Databases and Information Systems Group (Fachgebiet Datenbanken und Informationssysteme): Theses
Scalable Automated Feature Engineering [MSc]

Context: 

Feature engineering is critical to successful ML applications. It consists of two phases: feature construction and feature selection. Feature construction derives new features from the existing ones, for example by applying transformations such as logarithms or by combining pairs of features. Feature selection then reduces the resulting feature set to the subset that is most useful for the model. Because feature engineering is a cumbersome task, many systems have been proposed to automate it [1, 2]. Recent systems [2, 3] even incorporate additional quality dimensions, such as fairness, to construct and select features that are beneficial across these dimensions. However, a major drawback of automated feature construction methods is their long runtime.
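The two phases can be illustrated with a small sketch. It assumes a hypothetical transformation set (log1p(|x|) for each feature and products of all feature pairs) and uses a simple correlation filter for selection. Systems such as autofeat [1] use richer transformations and their own selection procedures, and scikit-learn's `SelectKBest` would be the idiomatic choice for the filter step; plain NumPy is used here only to keep the example dependency-light. The function names `construct_features` and `select_features` are hypothetical.

```python
import itertools

import numpy as np


def construct_features(X, names):
    """Feature construction: append unary and pairwise (binary) transformations.

    The transformation set (log1p of |x| and pairwise products) is a
    hypothetical choice for illustration only.
    """
    cols = [X[:, i] for i in range(X.shape[1])]
    new_names = list(names)
    # Unary transformation: log(1 + |x|) is defined for every real input.
    for i, name in enumerate(names):
        cols.append(np.log1p(np.abs(X[:, i])))
        new_names.append(f"log1p(|{name}|)")
    # Binary transformation: product of every unordered feature pair.
    for i, j in itertools.combinations(range(len(names)), 2):
        cols.append(X[:, i] * X[:, j])
        new_names.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), new_names


def select_features(X, y, names, k):
    """Feature selection: keep the k columns with the highest |Pearson correlation| to y."""
    scores = []
    for c in range(X.shape[1]):
        col = X[:, c]
        # A constant column carries no signal; score it 0 instead of dividing by zero.
        scores.append(0.0 if np.std(col) == 0 else abs(np.corrcoef(col, y)[0, 1]))
    top = np.argsort(scores)[::-1][:k]  # indices of the k best scores, best first
    return X[:, top], [names[c] for c in top]


# Usage: the target depends on the product a*b, which no original column captures.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)

X_new, all_names = construct_features(X, ["a", "b", "c"])
X_sel, selected = select_features(X_new, y, all_names, k=2)
print(X_new.shape, selected)  # the constructed feature a*b should rank first
```

Three input columns become nine candidates (3 originals, 3 unary, 3 pairwise), and the selection step recovers the constructed interaction that the original columns alone do not expose.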

Problem / Task:  

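The task is to develop an automated feature engineering system that scales well and is much faster than state-of-the-art approaches. The main challenge is the large number of constructed features: the number of candidates grows combinatorially with the number of input features and transformations, and evaluating every candidate is expensive. Therefore, one has to develop either a smarter search algorithm or an efficient pruning strategy.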
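The following sketch makes the scale problem concrete by counting the candidate pool over several construction rounds. It assumes a hypothetical operator set (two unary transformations per feature and four binary operators applied to every unordered pair, i.e. C(n, 2) pairs) and models pruning as keeping only the `beam` best features before the next round. Both the operator set and this beam-style pruning are assumptions for illustration, not the method the topic prescribes; the names `num_candidates` and `growth` are hypothetical.

```python
from math import comb


def num_candidates(n, n_unary=2, n_binary=4):
    """Pool size after one construction round over n input features:
    the n originals, every unary transform of each feature, and every
    binary operator applied to each unordered pair (hypothetical operator set)."""
    return n + n_unary * n + n_binary * comb(n, 2)


def growth(n, rounds, beam=None):
    """Candidate-pool size after each round. If beam is set, pruning keeps
    only the `beam` best features as input to the next round (hypothetical
    pruning strategy; scoring is not modeled, only the resulting pool size)."""
    sizes = []
    for _ in range(rounds):
        n = num_candidates(n)
        sizes.append(n)
        if beam is not None:
            n = min(n, beam)
    return sizes


print(growth(10, rounds=2))           # no pruning
print(growth(10, rounds=3, beam=50))  # prune to 50 features between rounds
```

Under these assumptions, 10 input features grow to 210 candidates after one round and 88,410 after two, whereas capping the pool at 50 features between rounds keeps every later round at 5,050 candidates. This quadratic blow-up per round is what a smarter search or pruning strategy has to avoid.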

Prerequisites:

  • Programming experience in Python (including scikit-learn)
  • Interest in data integration
  • Experience in machine learning and database technologies

References:

[1] Horn, F., Pack, R. and Rieger, M., 2019. The autofeat Python library for automated feature engineering and selection. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 111-120.

[2] Diaz, R., Neutatz, F. and Abedjan, Z., 2021. Automated Feature Engineering for Algorithmic Fairness. Proceedings of the VLDB Endowment, 14(9), pp. 1694-1702.

[3] Neutatz, F., Biessmann, F. and Abedjan, Z., 2021. Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD).

For a detailed introduction to the topic, please contact Felix Neutatz via email.

Advisor and Contact:

Felix Neutatz <neutatz@dbs.uni-hannover.de> (LUH)

Prof. Dr. Ziawasch Abedjan <abedjan@dbs.uni-hannover.de> (LUH)