Explainable Machine Learning for Precision Medicine of Patients with Infectious Diseases


This thesis presents machine learning applications for the development of precision medicine in patients with infectious diseases. It does so by proposing computational solutions to two major challenges in precision medicine: how to infer relevant host genetic factors in heterogeneous populations (Study I) and how to predict patient-specific risk while accounting for censored individuals (Study II). For both challenges, the implemented models are explained using domain knowledge of biological systems and disease aetiology, supported by model interpretability methods. This reflects the secondary aim of not only developing predictive models but also deepening the understanding of HIV host genomics and SARS-CoV-2 risk factors, respectively. The specific objectives of each study were:

Study I: Associations of functional HLA class I groups with HIV viral load in a heterogeneous cohort. To assess whether functional clustering of Human Leukocyte Antigen (HLA) alleles, the main host genetic factors involved in HIV control, based on their predicted binding affinities to HIV peptides, facilitates the study of HLA alleles in demographically heterogeneous cohorts.

Study II: Personalized survival probabilities for SARS-CoV-2 positive patients by explainable machine learning. To implement survival machine learning models that predict personalized 12-week mortality of SARS-CoV-2 positive patients from electronic health records, and to describe the temporal dynamics of relevant risk factors through model explainability.
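
To illustrate what accounting for censored individuals means in a survival setting such as Study II, the following is a minimal sketch of the classical Kaplan-Meier estimator. It is not the model used in the thesis, which implements survival machine learning models; it only shows how censored patients stay in the risk set until their last follow-up instead of being discarded. The function name `kaplan_meier` and the toy follow-up data are assumptions made up for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (illustrative sketch).

    times:  follow-up time per patient (e.g. weeks since a positive test)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each observed death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Censored patients still count in at_risk up to their own time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: 8 patients followed for at most 12 weeks.
times = [2, 3, 3, 5, 8, 12, 12, 12]
events = [1, 1, 0, 1, 0, 0, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"week {t}: S(t) = {s:.3f}")
```

Dropping the censored patients instead would shrink the risk set and overestimate mortality, which is why survival methods treat censoring explicitly.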

University of Copenhagen
Adrian G. Zucco
Postdoc in Complexity and Big Data