Development of a differential treatment selection model for depression on consolidated and transformed clinical trial datasets


This is a virtual seminar. For a Zoom link, please see "Venue". Please consider subscribing to the mailing list: web.maillist.ox.ac.uk/ox/subscribe/ai4mch

Major depressive disorder (MDD) is the leading cause of disability worldwide, yet treatment selection still proceeds by “trial and error”. Given the varied presentation of MDD and the heterogeneity of treatment response, machine learning, which can capture complex, non-linear relationships in data, may be key to personalizing treatment. Well-organized, structured data from clinical trials with standardized outcome measures are useful for training machine learning models; however, combining data across trials poses numerous challenges. There is also persistent concern that machine learning models can propagate harmful biases. We have created a methodology for organizing and preprocessing depression clinical trial data so that transformed variables, harmonized across disparate datasets, can be used as input for feature selection. Using this combined dataset of 5032 individuals and 6 drugs, we applied Bayesian optimization to identify an optimal multi-layer dense neural network that takes 21 clinical and sociodemographic features as input and performs differential treatment benefit prediction. The model generalized well to the held-out test set, producing similar accuracy metrics on the validation and test sets, with an AUC of 0.7 when predicting binary remission. To address the potential for bias propagation, we used a bias testing performance metric to evaluate the model for harmful biases related to ethnicity, age, or sex. We present a full pipeline, from data preprocessing to model validation, that was used to create the first differential treatment benefit prediction model for MDD covering 6 treatment options.
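To make the idea of differential treatment benefit prediction concrete, here is a minimal sketch. It is not the authors' code, and several assumptions are involved. The trained network is treated as an opaque callable that returns a predicted remission probability for a given patient and candidate drug. The "pick the drug with the highest predicted probability" rule is an illustrative assumption. `toy_model` and `TOY_WEIGHTS` are hypothetical stand-ins for the Bayesian-optimized dense network. The AUC function uses the standard rank (Mann-Whitney) formulation for a binary remission outcome.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: any trained model mapping (patient features,
# treatment index) to a predicted probability of remission.
RemissionModel = Callable[[Sequence[float], int], float]


def predict_benefit(model: RemissionModel, features: Sequence[float],
                    n_treatments: int) -> Dict[int, float]:
    """Predicted remission probability for each candidate treatment."""
    return {t: model(features, t) for t in range(n_treatments)}


def select_treatment(model: RemissionModel, features: Sequence[float],
                     n_treatments: int) -> Tuple[int, float]:
    """Assumed selection rule: the treatment with the highest predicted
    remission probability (the first one wins on ties)."""
    probs = predict_benefit(model, features, n_treatments)
    best = max(probs, key=probs.get)
    return best, probs[best]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random remitter (label 1) scores higher
    than a random non-remitter (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one remitter and one non-remitter")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy stand-in for the trained network: one logistic weight
# vector per drug (3 drugs, 2 features here; the talk's model uses 6 drugs
# and 21 features).
TOY_WEIGHTS = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]


def toy_model(features: Sequence[float], t: int) -> float:
    z = sum(w * x for w, x in zip(TOY_WEIGHTS[t], features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    drug, p = select_treatment(toy_model, [1.0, 2.0], 3)
    print(drug, round(p, 3))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))
```

Because the model is passed in as a callable, the selection and evaluation logic stays the same whatever architecture produced the predictions. In a real evaluation, the AUC would be computed on held-out remission labels rather than on the four illustrative values shown here.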