bioRxiv
New Results

Human Missense Variation is Constrained by Domain Structure and Highlights Functional and Pathogenic Residues

Stuart A. MacGowan, Fábio Madeira, Thiago Britto-Borges, Melanie S. Schmittner, Christian Cole, Geoffrey J. Barton
doi: https://doi.org/10.1101/127050
Stuart A. MacGowan
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
2Centre for Dermatology and Genetic Medicine, School of Life Sciences, University of Dundee, Dundee, U.K.
Fábio Madeira
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
Thiago Britto-Borges
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
Melanie S. Schmittner
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
Christian Cole
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
Geoffrey J. Barton
1Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
2Centre for Dermatology and Genetic Medicine, School of Life Sciences, University of Dundee, Dundee, U.K.

Abstract

Human genome sequencing has generated population variant datasets containing millions of variants from hundreds of thousands of individuals [1-3]. These datasets show that the genomic distribution of genetic variation is influenced on genic and sub-genic scales by gene essentiality [1,4,5], protein domain architecture [6] and the presence of genomic features such as splice donor/acceptor sites [2]. However, the variant data are still too sparse to provide a comparative picture of genetic variation between individual protein residues in the proteome [1,6]. Here, we overcome this sparsity for ∼25,000 human protein domains in 1,291 domain families by aggregating variants over equivalent positions (columns) in multiple sequence alignments of sequence-similar (paralogous) domains [7,8]. We then compare the resulting variation profiles from the human population to residue conservation across all species [9] and find that the same tertiary structural and functional pressures that affect amino acid conservation during domain evolution constrain missense variant distributions. Thus, depletion of missense variants at a position implies that it is structurally or functionally important. We find that such positions are enriched in known disease-associated variants (OR = 2.83, p ≈ 0), while positions that are both missense depleted and evolutionarily conserved are further enriched in disease-associated variants (OR = 1.85, p = 3.3×10⁻¹⁷) compared to those that are only evolutionarily conserved (OR = 1.29, p = 4.5×10⁻¹⁹). Unexpectedly, a subset of evolutionarily Unconserved positions are Missense Depleted in humans (UMD positions) and these are also enriched in pathogenic variants (OR = 1.74, p = 0.02). UMD positions are further differentiated from other unconserved residues in that they are enriched in ligand, DNA and protein binding interactions (OR = 1.59, p = 0.003), which suggests that this stratification can identify functionally important positions. A different class of positions that are Conserved and Missense Enriched (CME) shows an enrichment of ClinVar risk factor variants (OR = 2.27, p = 0.004). We illustrate these principles with the G-Protein Coupled Receptor (GPCR) family, the Nuclear Receptor Ligand Binding Domain family and In Between Ring-Finger (IBR) domains, and list a total of 343 UMD positions in 211 domain families. This study will have broad applications in: (a) focusing functional studies of specific proteins by mutagenesis; (b) refining pathogenicity prediction models; and (c) highlighting which residue interactions to target when refining the specificity of small-molecule drugs.
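
The column-wise aggregation and the odds ratios quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy alignment, the variant counts, the function names `column_missense_counts` and `odds_ratio`, and the 2x2 table values are all hypothetical, and the abstract does not specify the statistical test behind the reported p-values.

```python
from collections import Counter


def column_missense_counts(alignment, variants):
    """Sum missense variant counts over each alignment column.

    alignment: dict of sequence id -> aligned sequence ('-' marks a gap).
    variants:  dict of (sequence id, 1-based ungapped residue index) -> count.
    Both inputs are hypothetical stand-ins for real domain alignments and
    population variant calls.
    """
    counts = Counter()
    for seq_id, aligned in alignment.items():
        residue_index = 0
        for column, symbol in enumerate(aligned):
            if symbol == "-":
                continue  # a gap holds no residue, so it contributes nothing
            residue_index += 1
            counts[column] += variants.get((seq_id, residue_index), 0)
    width = len(next(iter(alignment.values())))
    return [counts[column] for column in range(width)]


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] (no zero-cell correction)."""
    return (a * d) / (b * c)


# Toy alignment of three paralogous domains (assumed data).
alignment = {"domA": "MK-LV", "domB": "MRTLV", "domC": "-KTLI"}
# Missense variants observed per residue (assumed data).
variants = {("domA", 1): 1, ("domA", 3): 2, ("domB", 3): 1, ("domC", 1): 1}

print(column_missense_counts(alignment, variants))  # [1, 1, 1, 2, 0]

# Hypothetical table: depleted positions with/without a disease variant (20, 80)
# versus all other positions with/without a disease variant (10, 90).
print(odds_ratio(20, 80, 10, 90))  # 2.25
```

Pooling counts by column is what lets sparse per-residue data from many paralogues be compared position by position; an odds ratio above 1 would then indicate that the chosen class of positions carries disease-associated variants more often than the remainder.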

Footnotes

  • The authors declare no competing financial interests.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 13, 2017.
bioRxiv 127050; doi: https://doi.org/10.1101/127050

Subject Area

  • Genetics