The trajectories of complex disease

Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer phylogenetic trees. However, these trees are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. A transmission tree can be inferred from a phylogeny, while accounting for within-host genetic diversity, by coloring the branches of the phylogeny according to the host in which each branch resided. We show how this approach can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of a partially observed transmission tree, and we demonstrate how to do this for a large class of epidemiological models. The resulting uncertainty about who infected whom can be high, and we explore two solutions to this problem: the use of several genomes per host, and the use of additional epidemiological data.
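The coloring idea can be illustrated with a minimal sketch. It is not the inference method itself: it assumes a coloring is already given, that colors are attached to nodes rather than to points along branches, and that every host is sampled. The function name `transmission_tree`, the node labels, and the host labels are all hypothetical. A change of color between a parent node and a child node is read as one transmission event.

```python
def transmission_tree(parent, host):
    """Read a who-infected-whom map off a colored phylogeny.

    parent: dict mapping each node to its parent node (None for the root).
    host:   dict mapping each node to the host (color) it resides in.
    Returns a dict mapping each host to its infector (None for the index case).
    Simplifying assumption: colors live on nodes, so a transmission event is
    any branch whose two end nodes have different colors.
    """
    roots = [node for node, par in parent.items() if par is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    # The host containing the root is the index case of the sampled outbreak.
    infector = {host[roots[0]]: None}

    for node, par in parent.items():
        if par is None or host[node] == host[par]:
            continue  # branch stays within a single host
        recipient = host[node]
        if recipient in infector:
            # A valid coloring lets each host be entered only once.
            raise ValueError(f"host {recipient!r} is infected more than once")
        infector[recipient] = host[par]
    return infector


# Toy phylogeny: root R splits into X and tip C; X splits into tips A and B.
parent = {"R": None, "X": "R", "C": "R", "A": "X", "B": "X"}
# Hypothetical coloring: the internal lineages R and X both sit in host h1.
host = {"R": "h1", "X": "h1", "A": "h1", "B": "h2", "C": "h3"}

result = transmission_tree(parent, host)
print(result)
```

In this toy coloring, both coalescent nodes lie inside host h1, so h1 is read as the infector of h2 and h3. A different coloring of the same phylogeny would give a different transmission tree, which is why the phylogeny alone does not determine who infected whom.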