The trajectories of complex disease

The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnosis and enable personalised medicine, motivating efforts to identify disease commonalities and subtypes from patient comorbidity information and other modalities. We have developed an age-dependent topic-modelling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets, and we applied it to c. 300,000 individuals from UK Biobank and more than 200,000 individuals from the All of Us program. A surprisingly small number of disease trajectories (topics) captures both known and novel combinations of disorders that occur throughout life. These topics also identify subtypes of diseases that appear in multiple topics, and these subtypes have differential genetic risk profiles. Such stratification improves understanding of patient risk and heterogeneity, and could lead to better identification of genetic risk, characterisation of pathological pathways and the discovery of new therapeutic targets.
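The abstract does not spell out the model's form, so the following is only a rough illustration of what "age-dependent topics" can mean. It is a minimal sketch under these assumptions: each topic's loading on each disease is a low-degree polynomial in (rescaled) age, passed through a softmax across diseases. A patient's probability of a given diagnosis at a given age is then a mixture of topic loadings weighted by that patient's topic weights. The polynomial basis, the age rescaling constants and every function name here are hypothetical. The published ATM inference procedure is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical age rescaling (an assumption, not from the source): centre at
# 50 years and measure in decades so polynomial terms stay numerically small.
AGE_CENTRE = 50.0
AGE_SCALE = 10.0


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topic_loadings(coeffs: Sequence[Sequence[float]], age: float) -> List[float]:
    """Age-dependent disease distribution for ONE topic (assumed form).

    coeffs[d][p] is the hypothetical coefficient of (rescaled age)**p for
    disease d; the softmax across diseases makes the loadings sum to 1.
    """
    t = (age - AGE_CENTRE) / AGE_SCALE
    logits = [sum(c * t**p for p, c in enumerate(cd)) for cd in coeffs]
    return softmax(logits)


def diagnosis_prob(
    theta: Sequence[float],
    topic_coeffs: Sequence[Sequence[Sequence[float]]],
    disease: int,
    age: float,
) -> float:
    """P(disease | age, patient) = sum_k theta[k] * beta_k[disease](age)."""
    return sum(
        w * topic_loadings(coeffs, age)[disease]
        for w, coeffs in zip(theta, topic_coeffs)
    )


def log_likelihood(
    theta: Sequence[float],
    topic_coeffs: Sequence[Sequence[Sequence[float]]],
    records: Sequence[Tuple[int, float]],
) -> float:
    """Log-likelihood of one patient's (disease, age-at-diagnosis) records."""
    return sum(
        math.log(diagnosis_prob(theta, topic_coeffs, d, a)) for d, a in records
    )


if __name__ == "__main__":
    # Two toy topics over three diseases, linear in rescaled age.
    topic_coeffs = [
        [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],  # topic 0: disease 0 rises with age
        [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # topic 1: flat (uniform) loadings
    ]
    theta = [0.5, 0.5]  # one patient's topic weights (sum to 1)
    for age in (40.0, 50.0, 70.0):
        print(age, diagnosis_prob(theta, topic_coeffs, 0, age))
```

In this toy setup, the probability of disease 0 grows with age because topic 0's loading on it increases. The low-rank structure comes from describing every patient by a short weight vector `theta` over a few shared, age-varying topics rather than by hundreds of independent disease rates. Fitting `theta` and the topic coefficients to real records would require an inference step (for example variational or sampling-based), which this sketch omits.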