PERFORMANCE BENCHMARKING OF DATABASE MANAGEMENT SYSTEMS [BSC]

Context: 

Data discovery comprises several tasks, such as finding joinable [3, 5] or unionable tables [6], finding columns or tables that fulfill a specific property, or answering a specific information need.
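
To illustrate what a single discovery query in such a benchmark might look like, the sketch below ranks columns by how many distinct values they share with a query column, a simple overlap-based notion of joinability. It assumes a hypothetical inverted-index table `all_cells(table_id, column_id, cell_value)` that stores every cell of the corpus once. This layout and the function name `top_k_joinable` are assumptions for illustration, not the schema or algorithm of [3] or [5]. SQLite stands in for the DBMSs under test only so that the example runs without a server.

```python
import sqlite3

# Hypothetical inverted-index layout: one row per (table, column, cell value) in the corpus.
SCHEMA = "CREATE TABLE all_cells (table_id INTEGER, column_id INTEGER, cell_value TEXT)"


def top_k_joinable(conn, query_values, k=3):
    """Rank corpus columns by how many distinct query values they contain."""
    values = sorted(set(query_values))
    if not values:
        return []
    placeholders = ", ".join("?" for _ in values)
    sql = f"""
        SELECT table_id, column_id, COUNT(DISTINCT cell_value) AS overlap
        FROM all_cells
        WHERE cell_value IN ({placeholders})
        GROUP BY table_id, column_id
        ORDER BY overlap DESC, table_id, column_id
        LIMIT ?
    """
    return conn.execute(sql, [*values, k]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO all_cells VALUES (?, ?, ?)",
    [
        (1, 0, "Berlin"), (1, 0, "Hannover"), (1, 0, "Munich"),
        (2, 1, "Berlin"), (2, 1, "Paris"),
        (3, 0, "Hannover"), (3, 0, "Berlin"), (3, 0, "Rome"),
    ],
)
print(top_k_joinable(conn, ["Berlin", "Hannover", "Munich"]))
# [(1, 0, 3), (3, 0, 2), (2, 1, 1)]
```

The same question can be phrased in several SQL forms (an `IN` list, a join against a table of query values, or a semi-join), which is exactly the kind of query-formulation variant a benchmark would compare.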

For each type of task and its corresponding queries, a different DBMS may be best suited in terms of response time and capabilities [1]. Response time is affected by the available index types, the types of queries, and the degree of parallelization that the storage system allows. Many DBMSs offer varying performance in different scenarios, and new DBMSs, such as graph databases [2], are introduced every year to address new requirements.
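
Whether an index is actually used can be checked in the query plan before any timing is done. The minimal sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in and shows the plan for the same lookup without and with an index. The table, the index name `idx_cell_value`, and the helper `plan` are hypothetical. PostgreSQL and Vertica offer their own `EXPLAIN` statements, whose output formats differ and are not shown here.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_cells (table_id INTEGER, column_id INTEGER, cell_value TEXT)")
conn.executemany(
    "INSERT INTO all_cells VALUES (?, ?, ?)",
    [(t, c, f"v{(t * 7 + c) % 50}") for t in range(200) for c in range(5)],
)

lookup = "SELECT table_id, column_id FROM all_cells WHERE cell_value = ?"

before = plan(conn, lookup, ("v3",))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_cell_value ON all_cells (cell_value)")
after = plan(conn, lookup, ("v3",))   # expect a search using idx_cell_value

print("without index:", before)
print("with index:   ", after)
```

Recording the plan next to each measured run makes it possible to tell whether a change in response time comes from a different access path or from other factors such as parallelization.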

Problem / Task:  

In this thesis, we would like to evaluate three major DBMSs, including Vertica [4] and Postgres, on data discovery tasks. The student is expected to define a benchmark with a wide variety of SQL queries representing different tasks and to compare the DBMSs on large data sets. The experiments should be designed to show the impact of relevant parameters, such as indexes, query formulation, and parallelization, on performance. The performance of the systems should be broken down into the components that make up the response time, including the query execution time and the fetch time. As the outcome of the thesis, we expect a detailed, in-depth review of the selected technologies, covering their potential and limitations for the given tasks.
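
One way to separate the response-time components named above is to time the execute and fetch phases of a Python DB-API cursor separately and to report medians over repeated runs. The sketch below does this for two formulations of the same lookup (an `IN` subquery and a join). The function name `time_query`, the table layout, and the repetition count are illustrative assumptions, and SQLite again stands in for the systems under test. Where the boundary between execution and fetching falls depends on the driver, so the split has to be interpreted per system.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repetitions=5):
    """Return (median execute seconds, median fetch seconds, row count) over several runs.

    The split depends on the driver: with psycopg2's default client-side cursor,
    execute() already transfers the whole result, whereas a server-side (named)
    cursor moves that cost into fetching. Streaming drivers such as sqlite3 may do
    part of the execution work while rows are being fetched.
    """
    exec_times, fetch_times, n_rows = [], [], 0
    for _ in range(repetitions):
        cur = conn.cursor()
        t0 = time.perf_counter()
        cur.execute(sql, params)
        t1 = time.perf_counter()
        rows = cur.fetchall()
        t2 = time.perf_counter()
        cur.close()
        exec_times.append(t1 - t0)
        fetch_times.append(t2 - t1)
        n_rows = len(rows)
    return statistics.median(exec_times), statistics.median(fetch_times), n_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_cells (table_id INTEGER, column_id INTEGER, cell_value TEXT)")
conn.executemany(
    "INSERT INTO all_cells VALUES (?, ?, ?)",
    [(t, c, f"v{(t * 7 + c) % 1000}") for t in range(2000) for c in range(5)],
)
conn.execute("CREATE TABLE query_values (v TEXT)")
conn.executemany("INSERT INTO query_values VALUES (?)", [(f"v{i}",) for i in range(0, 1000, 10)])

# Two formulations of the same question; both return the same rows.
formulations = {
    "IN subquery": "SELECT table_id, column_id FROM all_cells "
                   "WHERE cell_value IN (SELECT v FROM query_values)",
    "JOIN": "SELECT a.table_id, a.column_id FROM all_cells a "
            "JOIN query_values q ON a.cell_value = q.v",
}
for name, sql in formulations.items():
    ex, fe, n = time_query(conn, sql)
    print(f"{name:12s} execute={ex * 1e3:.2f} ms  fetch={fe * 1e3:.2f} ms  rows={n}")
```

Taking the median rather than the mean dampens the influence of a single cold-cache run; a full benchmark would additionally control warm-up runs, caching, and data size explicitly.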

Prerequisites:

  • Strong fundamentals in database concepts and ability to write complex SQL queries.
  • Programming experience in Python, Scala, or Java.
  • Interest in data integration and willingness to work with large data sets.

Related Work:

[1] Gupta, Adity, et al. "NoSQL databases: Critical analysis and comparison." 2017 International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN). IEEE, 2017.

[2] Pokorný, Jaroslav. "Graph databases: their power and limitations." IFIP International Conference on Computer Information Systems and Industrial Management. Springer, Cham, 2015.

[3] Esmailoghli, Mahdi, Jorge-Arnulfo Quiané-Ruiz, and Ziawasch Abedjan. "MATE: Multi-Attribute Table Extraction." arXiv preprint arXiv:2110.00318 (2021).

[4] Lamb, Andrew, et al. "The Vertica analytic database: C-Store 7 years later." arXiv preprint arXiv:1208.4173 (2012).

[5] Esmailoghli, Mahdi, Jorge-Arnulfo Quiané-Ruiz, and Ziawasch Abedjan. "COCOA: COrrelation COefficient-Aware Data Augmentation." EDBT. 2021.

[6] Nargesian, Fatemeh, et al. "Table union search on open data." Proceedings of the VLDB Endowment 11.7 (2018): 813-825. 

For a detailed introduction to the topic, please get in contact via email.

Advisor and Contact:

Mahdi Esmailoghli <esmailoghli@dbs.uni-hannover.de> (LUH)

Prof. Dr. Ziawasch Abedjan <abedjan@dbs.uni-hannover.de> (LUH)