Integrative analytics connecting genotype and phenotype for precision oncology

Understanding the molecular mechanisms that control the biology of health and disease requires models that traverse multiple scales of organisation, capturing the relationships among genes and their links to observable phenotypes. Measuring, parameterising and simulating in exhaustive detail the entire system that determines these phenotypes is typically impossible because of the underlying biological complexity, our limited knowledge and the paucity of available data. For example, approximately one third of human genes are poorly characterised, and most genes perform multiple functions that manifest according to the surrounding biochemical context. Indeed, new functions continue to emerge even for deeply studied genes. Simplifying abstractions, combined with empirical analysis of matched genome-scale and descriptive data, are therefore valuable strategies for filling the knowledge gaps relevant to a focused biomedical question or hypothesis.

Epithelial plasticity is a key driver of cancer progression and is associated with the most life-threatening phenotypes, specifically metastasis and drug resistance. Computational methods developed in my group enable modelling of the molecular control of important cancer phenotypes. We applied a machine learning approach for genome-wide, context-specific biochemical interaction network inference (CoSNI) to map gene function for the Epithelial to Mesenchymal Transition cell programme (EMT_MAP), predicting new mechanisms that control cancer invasion. Analysis of patient data with EMT_MAP and our NetNC algorithm [Cancers 2020;12:2823; github.com/overton-group/NetNC] enabled the discovery of candidate renal cancer prognostic markers, with clear advantages over standard statistical approaches. NetNC recovers the network-defined signal in noisy data, for example distinguishing functional EMT transcription factor targets from ‘neutral’ binding sites and defining biologically coherent modules in renal cancer drug-response time-course data. These and other approaches, including SynLeGG [Nucleic Acids Research 2021;49:W613-8; www.overton-lab.uk/synlegg] and an information-theoretic approach to causality (GABI), offer mechanistic insights and the opportunity to predict candidate cancer Achilles’ heels for drug discovery. Computational results were validated in follow-up experiments, progressing towards new clinical tools for precision oncology.
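The abstract does not describe how NetNC scores network coherence. The Python below is a minimal, hypothetical sketch of the general idea of recovering network-defined signal from a noisy gene list. It should not be read as the NetNC algorithm, whose actual implementation is at github.com/overton-group/NetNC. The sketch assumes a simple hypergeometric null on shared neighbours: an edge between two query genes is kept when they share more neighbours within the query-induced subgraph than expected by chance, and connected components of the kept edges are reported as candidate modules. The function names `hypergeom_sf` and `coherent_modules`, the `alpha` threshold and the toy gene names are illustrative assumptions.

```python
import math
from itertools import combinations


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def coherent_modules(adjacency: dict[str, set[str]], query: set[str],
                     alpha: float = 0.05) -> list[set[str]]:
    """Hypothetical illustration (not NetNC): keep query-gene edges whose shared
    neighbourhood is unexpectedly large, then return connected components."""
    # Induced subgraph: only query genes that are present in the network.
    nodes = set(query) & set(adjacency)
    sub: dict[str, set[str]] = {g: set() for g in nodes}
    for g in nodes:
        for h in adjacency[g]:
            if h in nodes and h != g:
                sub[g].add(h)
                sub[h].add(g)  # symmetrise in case adjacency is one-directional

    kept: dict[str, set[str]] = {g: set() for g in nodes}
    for u in nodes:
        for v in sub[u]:
            if u >= v:
                continue  # test each undirected edge once
            nu, nv = sub[u] - {v}, sub[v] - {u}
            shared = len(nu & nv)
            # Assumed null: v's other neighbours are drawn at random from the
            # remaining len(nodes) - 2 genes; "successes" are u's other neighbours.
            p = hypergeom_sf(shared, len(nodes) - 2, len(nu), len(nv))
            if p <= alpha:  # no multiple-testing correction in this sketch
                kept[u].add(v)
                kept[v].add(u)

    # Connected components over retained edges; genes with no kept edge are treated as noise.
    modules: list[set[str]] = []
    seen: set[str] = set()
    for g in sorted(nodes):
        if g in seen or not kept[g]:
            continue
        comp: set[str] = set()
        stack = [g]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(kept[x] - comp)
        seen |= comp
        modules.append(comp)
    return modules


# Toy example: two 5-gene cliques joined by one bridge edge (A1-B1), a network
# gene outside the query (C1) and a query gene absent from the network (Z9).
edges = list(combinations(["A1", "A2", "A3", "A4", "A5"], 2))
edges += list(combinations(["B1", "B2", "B3", "B4", "B5"], 2))
edges += [("A1", "B1"), ("A3", "C1")]
adjacency: dict[str, set[str]] = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
query = {"A1", "A2", "A3", "A4", "A5", "B1", "B2", "B3", "B4", "B5", "Z9"}
print(coherent_modules(adjacency, query, alpha=0.1))
```

With `alpha=0.1` the toy call should return the two cliques as separate modules and drop the bridge edge, whose endpoints share no neighbours (p = 1). The edges touching A1 and B1 get p = 4/56 ≈ 0.071, so they survive only because of the lenient threshold; at `alpha=0.05` those two genes fall away. A real analysis would also need a multiple-testing correction and a more principled null model, both of which this sketch omits.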