Meta-learning fair ML pipeline components [MSc]

Context:  

Machine learning (ML) prediction accuracy is no longer the only quality dimension of ML applications. Fairness, in particular, has become a critical dimension: in a hiring application, for instance, the model predictions should be fair across groups defined by gender, religion, or other sensitive attributes. One way to measure fairness is to compare the classification accuracy across these groups. Perrone et al. [1] propose to optimize the hyperparameters of an ML pipeline, e.g., of the feature preprocessing and the ML model, to achieve high prediction accuracy while satisfying a fairness constraint. Each part of the pipeline can affect the final model predictions and therefore their fairness.
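A minimal sketch of such a group-wise accuracy comparison with scikit-learn is shown below; the synthetic data, the logistic regression model, and the binary sensitive attribute are only illustrative placeholders.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Placeholder data: X are the features, y the labels, and `group`
    # the (binary) sensitive attribute of each instance.
    rng = np.random.RandomState(0)
    X = rng.normal(size=(1000, 5))
    y = rng.randint(0, 2, size=1000)
    group = rng.randint(0, 2, size=1000)

    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, group, test_size=0.3, random_state=0)

    y_pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)

    # Fairness check: compare classification accuracy per group;
    # a large gap indicates unfair predictions under this metric.
    acc_per_group = {g: accuracy_score(y_te[g_te == g], y_pred[g_te == g])
                     for g in np.unique(g_te)}
    accuracy_gap = max(acc_per_group.values()) - min(acc_per_group.values())
    print(acc_per_group, accuracy_gap)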

Perrone et al. [1] frame this as a constrained Bayesian optimization problem. However, state-of-the-art constrained Bayesian optimization strategies do not scale to a large number of hyperparameters, whereas state-of-the-art AutoML systems expose hyperparameter spaces with hundreds of parameters [2].
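In rough terms, and only as a paraphrase of the setup in [1], the pipeline hyperparameters λ are tuned by solving

    max_{λ ∈ Λ}  Accuracy(λ)   subject to   Unfairness(λ) ≤ ε,

where ε is a user-chosen fairness threshold and Unfairness(λ) is a group fairness metric such as the accuracy gap sketched above.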

Problem / Task:

The goal of this thesis is to investigate whether meta-learning can be leveraged to learn offline whether certain ML pipeline components, e.g., feature preprocessors such as dimensionality reduction or nonlinear transformations, or the ML model itself, are more likely to yield fair predictions for a given ML task.

The main challenge in building the meta-learning approach is to design metadata features (meta-features) that also capture fairness-related information.
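As one possible starting point, the sketch below computes a few hypothetical fairness-related meta-features with pandas, e.g., the relative group sizes, the gap in positive base rates between groups, and the correlation between the sensitive attribute and the label; the column names `label` and `sensitive` are placeholders.

    import numpy as np
    import pandas as pd

    def fairness_meta_features(df: pd.DataFrame, label: str, sensitive: str) -> dict:
        """Hypothetical fairness-related meta-features of one dataset.

        Assumes a binary, numerically encoded label column; `label` and
        `sensitive` are placeholder column names.
        """
        groups = df.groupby(sensitive)
        group_sizes = groups.size() / len(df)   # relative group sizes
        base_rates = groups[label].mean()       # P(y = 1 | group)
        sensitive_codes = pd.factorize(df[sensitive])[0]
        return {
            "min_group_fraction": group_sizes.min(),
            "base_rate_gap": base_rates.max() - base_rates.min(),
            "sensitive_label_corr": abs(np.corrcoef(sensitive_codes, df[label])[0, 1]),
        }

A meta-learner could compute such features offline for every training dataset and relate them to the fairness achieved by different pipeline components on that dataset.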

To approach this problem, one can draw on the large repository of datasets available on OpenML. To assess the quality of the resulting system, one can compare its results against the approach proposed by Perrone et al. [1].
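A minimal sketch of fetching one dataset with the openml Python package is shown below; the dataset ID 1590 (the "adult" census data) is only an example.

    import openml

    # Example only: 1590 is the "adult" census dataset on OpenML; any
    # dataset with a plausible sensitive attribute could serve for meta-training.
    dataset = openml.datasets.get_dataset(1590)
    X, y, categorical_indicator, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute)
    print(dataset.name, X.shape)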

Prerequisites:

  • programming experience in Python (+ sklearn)

  • interest in data integration

  • experience in machine learning & database technologies

 

Related Work:

[1] Perrone, V., Donini, M., Zafar, M. B., Schmucker, R., Kenthapadi, K. and Archambeau, C., 2021. Fair Bayesian Optimization. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 854-863).

[2] Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M. and Hutter, F., 2015. Efficient and Robust Automated Machine Learning. Advances in Neural Information Processing Systems, 28.

For a detailed introduction to the topic, please get in contact via email with Felix Neutatz.

Advisor and Contact:

Felix Neutatz <neutatz@dbs.uni-hannover.de> 

Prof. Dr. Ziawasch Abedjan <abedjan@dbs.uni-hannover.de> (Leibniz Universität Hannover)