A unified data representation for few-shot learning [MSc]

Context:  

When developing ML applications, the main hurdles to fast, iterative development cycles are the long periods needed to train a model and to optimize its hyperparameters. If one could avoid both, an ML developer could concentrate on data engineering.

Problem / Task:

The idea is that if we can represent any instance, e.g. an apple, an airplane, or a credit assessment, in the same space, we could train one model to predict them all. Because mapping such different raw instances directly into one space is extremely difficult, we can work around this problem by extracting unified meta-information from all these instances. This unified representation then allows us to train one model. For instance, one could train a logistic regression model, a k-nearest-neighbors model, and an SVM on a small fraction of each dataset. Then, we could use the predictions of these models as features in the unified data representation. An example of such a unified representation could look as follows:
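A minimal sketch of this idea in Python with scikit-learn, not a definitive design. It assumes binary tasks with labels 0/1, so that every dataset yields a vector of the same length. The helper name `landmark_features`, the 10% subset size, and the synthetic datasets are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def landmark_features(X, y, fraction=0.1, seed=0):
    """Map every row of X to [logreg, knn, svm] predictions (hypothetical helper).

    Assumes a binary task with labels 0/1 so the output width is always 3.
    """
    # Draw a small stratified subset so both classes are present.
    X_sub, _, y_sub, _ = train_test_split(
        X, y, train_size=fraction, stratify=y, random_state=seed
    )
    landmarkers = [
        LogisticRegression(max_iter=1000),
        KNeighborsClassifier(n_neighbors=5),
        SVC(),
    ]
    columns = []
    for model in landmarkers:
        model.fit(X_sub, y_sub)
        # The predicted label of each landmarker becomes one feature.
        columns.append(model.predict(X))
    return np.column_stack(columns)


# Two synthetic datasets with different numbers of raw features.
X_a, y_a = make_classification(n_samples=300, n_features=5, random_state=0)
X_b, y_b = make_classification(n_samples=200, n_features=20, random_state=1)

Z_a = landmark_features(X_a, y_a)
Z_b = landmark_features(X_b, y_b)
print(Z_a.shape, Z_b.shape)  # both representations have 3 columns
```

Although the raw datasets have 5 and 20 columns, both are mapped into the same three-dimensional space. In this sketch the landmarkers also predict the instances they were trained on; a real design would have to decide how to treat those rows.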

So, given a repository of datasets, the task is to develop a unified data representation. Specifically, one needs to design a feature transformation that transforms any instance of any dataset into one unified representation. One can resort to the broad range of metadata features that have been proposed in the literature [1]. For instance, one can train landmarking classifiers on subsets and use their predictions as features, or use statistics about the data, such as the dataset shape or the class distribution. The main task is therefore to design a metadata feature vector that can be computed quickly but contains as much information as possible, so that the model reaches high classification accuracy.
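The dataset statistics mentioned above could be computed along the following lines. This is only a sketch: the function name `dataset_statistics`, the choice of five statistics, and the log scaling of the shape are assumptions made for illustration, not a prescribed feature set.

```python
import math
from collections import Counter


def dataset_statistics(X, y):
    """Return cheap dataset-level meta-features as a fixed-length list.

    X is a list of rows, y the list of class labels (hypothetical helper).
    """
    n_instances = len(X)
    n_features = len(X[0])
    counts = Counter(y)
    probs = [c / n_instances for c in counts.values()]
    # Shannon entropy (in bits) of the class distribution.
    entropy = -sum(p * math.log2(p) for p in probs)
    return [
        math.log10(n_instances),  # dataset shape: rows (log scale)
        math.log10(n_features),   # dataset shape: columns (log scale)
        float(len(counts)),       # number of classes
        max(probs),               # share of the majority class
        entropy,                  # class balance
    ]


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 0, 1, 1]
stats = dataset_statistics(X, y)
print(stats)
```

These values are identical for all instances of one dataset, so they would be concatenated with per-instance features such as landmarker predictions to form the final vector. All of them are computable in a single pass over the labels, which fits the requirement that the vector be quick to compute.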

Finally, one should evaluate the system against the state-of-the-art AutoML system Auto-sklearn 2.0 [2], comparing both the “time to prediction” and the classification accuracy.
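One possible way to measure both quantities with the same harness is sketched below. It assumes that "time to prediction" means the wall-clock time from the start of training until predictions for the test instances are available; that definition, the `evaluate` function, and the `majority_baseline` stand-in are assumptions for illustration. Any system, including an AutoML baseline, would be wrapped as a callable with the same signature.

```python
import time


def evaluate(fit_predict, X_train, y_train, X_test, y_test):
    """Return (seconds until predictions exist, classification accuracy)."""
    start = time.perf_counter()
    y_pred = fit_predict(X_train, y_train, X_test)
    elapsed = time.perf_counter() - start
    correct = sum(int(p == t) for p, t in zip(y_pred, y_test))
    return elapsed, correct / len(y_test)


def majority_baseline(X_train, y_train, X_test):
    """Trivial stand-in system: always predict the most frequent class."""
    labels = list(y_train)
    majority = max(set(labels), key=labels.count)
    return [majority] * len(X_test)


elapsed, accuracy = evaluate(
    majority_baseline,
    X_train=[[0], [1], [2]],
    y_train=[1, 1, 0],
    X_test=[[3], [4]],
    y_test=[1, 0],
)
print(f"time to prediction: {elapsed:.6f} s, accuracy: {accuracy:.2f}")
```

Timing training and prediction together keeps the comparison fair between a system that trains per dataset and one that reuses a single pre-trained model.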

Prerequisites:

  • programming experience in Python (+ sklearn)

  • interest in data integration

  • experience in machine learning & database technologies

 

Related Work:

[1] Vanschoren, J., 2018. Meta-learning: A survey. arXiv preprint arXiv:1810.03548.

[2] Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M. and Hutter, F., 2020. Auto-sklearn 2.0: Hands-free automl via meta-learning. arXiv preprint arXiv:2007.04074.

Advisor and Contact:

Felix Neutatz <neutatz@dbs.uni-hannover.de> 

Prof. Dr. Ziawasch Abedjan <abedjan@dbs.uni-hannover.de> (Leibniz Universität Hannover)