In this thesis, we address a problem posed in the Data Science Challenge at the BTW 2021 conference [1]. The task is to develop a system that predicts the performance and energy consumption of different products and workflows with high accuracy. In addition to this regression task, the system should be able to explain outlier predictions. Both prediction and explanation must be fast enough to be applied in a streaming-data environment.

The developed system is intended to participate in the BTW Data Science Challenge 2021.

Besides model accuracy, two main characteristics of the system distinguish the winner: explainability and scalability.

Research Questions:  

The trade-off between runtime and accuracy is a long-standing one in machine learning [2]. Simple models such as linear models are faster than more complex models, e.g., deep neural networks (DNNs), but they are typically less accurate.

This problem is aggravated when explainability is also taken into consideration [3]. Explainability refers to the ability of a model to describe why it predicted a specific label or value.

Complex models such as neural networks can be very accurate but are hard to explain [4]. Moreover, explaining predictions requires extra model evaluations, which increase the runtime.
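To illustrate why explanations add runtime, the following minimal sketch counts model evaluations for a simple occlusion-style attribution: each feature is replaced by a baseline value in turn, and the change in the prediction is taken as that feature's contribution. This is only one possible explanation technique and is not prescribed by the challenge. The names `toy_model`, `counting`, and `occlusion_attributions`, as well as the all-zero baseline, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def counting(model: Model) -> Tuple[Model, Dict[str, int]]:
    """Wrap a model so that every evaluation is counted."""
    calls = {"n": 0}

    def wrapped(x: Sequence[float]) -> float:
        calls["n"] += 1
        return model(x)

    return wrapped, calls


def occlusion_attributions(
    model: Model, x: Sequence[float], baseline: Sequence[float]
) -> Tuple[float, List[float]]:
    """Attribute a prediction to features by replacing one feature
    at a time with its baseline value (one extra evaluation each)."""
    full = model(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        attributions.append(full - model(perturbed))
    return full, attributions


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained regressor.
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]


model, calls = counting(toy_model)
x = [3.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input

# Prediction alone: a single model evaluation.
prediction = model(x)
calls_for_prediction = calls["n"]

# Explanation: one evaluation of x plus one per feature.
_, attributions = occlusion_attributions(model, x, baseline)
calls_for_explanation = calls["n"] - calls_for_prediction

print(prediction, attributions)
print(calls_for_prediction, calls_for_explanation)
```

In this sketch, explaining one prediction over d features costs d + 1 model evaluations instead of one, so the overhead grows with the number of features and with the cost of a single evaluation.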

This vicious circle prevents data scientists from training machine learning pipelines that are accurate, explainable, and fast at the same time. The problem becomes even harder when additional constraints apply, for instance, a limited amount of training data, streaming scenarios, or limited hardware.

We would like to answer a subset of the following questions in this thesis:

  • How can we adapt accurate ML models such as DNNs to be explainable?

  • Can ensemble models increase explainability?

  • How can we reduce the runtime overhead of explainability so that it is applicable to streaming scenarios?

  • How can we adapt a given model to further constraints, such as a limited number of available data points or runtime limits?


Prerequisites:

  • Programming experience in Python (+ sklearn)

  • Interest in data integration

  • Experience in machine learning & database technologies

  • Experience with AutoML systems is a plus

  • Motivation to be part of a real data science challenge

Related Work:


[2] Liu, Hongche, et al. "Accuracy vs efficiency trade-offs in optical flow algorithms." Computer Vision and Image Understanding 72.3 (1998): 271-286.

[3] London, Alex John. "Artificial intelligence and black‐box medical decisions: accuracy versus explainability." Hastings Center Report 49.1 (2019): 15-21.

[4] Xie, Ning, et al. "Explainable deep learning: A field guide for the uninitiated." arXiv preprint arXiv:2004.14545 (2020).

For a detailed introduction to the topic, please get in contact via email.

Advisor and Contact:

Mahdi Esmailoghli <> (LUH)

Prof. Dr. Ziawasch Abedjan <> (LUH)