In this talk I will present, for the first time, a new analysis that compares the growing body of publicly available human variation data with the variation seen across all available protein sequences, regardless of species. The analysis confirms that expected patterns of variation in human are consistent with protein structural features, but it also highlights structurally and functionally important sites in around 15,000 human protein domains that are not found by conventional sequence analysis methods and would be hard to identify even by inspecting the three-dimensional structure of the protein. Some of the identified sites are well characterised and known to interact with ligands. The importance of the other identified sites was previously unrecognised, but our analysis suggests they are likely to have key roles in binding or domain stability.
I will explain the method and illustrate the new analysis with a number of examples, including the Nuclear Receptor Ligand Binding Domains and G-protein-coupled receptors (GPCRs), which are important therapeutic targets.
The new method shows promise in helping to guide the interpretation of non-synonymous SNPs (nsSNPs) in the context of disease studies, as well as contributing to a deeper understanding of protein function and specificity.