bioRxiv
New Results

The 3D spatial constraint on 6.1 million amino acid sites in the human proteome

Bian Li, Dan M. Roden, John A. Capra
doi: https://doi.org/10.1101/2021.09.15.460390
Bian Li
1Department of Biological Sciences, Vanderbilt University, Nashville, TN 37203, USA
2Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
Dan M. Roden
2Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
3Departments of Pharmacology and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
John A. Capra
1Department of Biological Sciences, Vanderbilt University, Nashville, TN 37203, USA
4Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
  • For correspondence: tony@capralab.org

Abstract

Quantification of the tolerance of protein-coding sites to genetic variation within human populations has become a cornerstone of the prediction of the function of genomic variants. We hypothesize that the constraint on missense variation at individual amino acid sites is largely shaped by direct 3D interactions with neighboring sites. To quantify the constraint on protein-coding genetic variation in 3D spatial neighborhoods, we introduce a new framework called COntact Set MISsense tolerance (COSMIS). Leveraging recent advances in computational structure prediction, large-scale sequencing data from gnomAD, and a mutation-spectrum-aware statistical model, we comprehensively map the landscape of 3D spatial constraint on 6.1 million amino acid sites covering >80% (16,533) of human proteins. We show that the human proteome is broadly under 3D spatial constraint and that the level of spatial constraint is strongly associated with disease relevance at both the individual site level and the protein level. We demonstrate that COSMIS performs significantly better at a range of variant interpretation tasks than other population-based constraint metrics while also providing biophysical insight into the potential functional roles of constrained sites. We make our constraint maps freely available and anticipate that the structural landscape of constrained sites identified by COSMIS will facilitate interpretation of protein-coding variation in human evolution and prioritization of sites for mechanistic or functional investigation.
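
The general idea of scoring constraint over a 3D contact set, rather than over a single site or a linear sequence window, can be illustrated with a minimal sketch. This is not the COSMIS implementation: the 8 Å distance cutoff, the function names, the per-site expected counts, and the Poisson-style normalization (variance taken equal to the mean) are all assumptions made here for illustration, standing in for the mutation-spectrum-aware statistical model that the abstract mentions but does not specify.

```python
import math

def contact_set(coords, i, cutoff=8.0):
    """Return indices of residues within `cutoff` angstroms of residue i.

    The residue itself is included. The 8.0 cutoff is an illustrative
    assumption, not a value taken from the abstract.
    """
    return [j for j, c in enumerate(coords) if math.dist(coords[i], c) <= cutoff]

def site_scores(coords, observed, expected, cutoff=8.0):
    """Score each site by comparing observed vs. expected missense counts
    summed over its 3D contact set.

    Assumption: counts are treated as Poisson-like, so the standard
    deviation is sqrt(expected). Negative scores mean fewer variants than
    expected, i.e. a more constrained neighborhood. Expected counts must
    be positive.
    """
    scores = []
    for i in range(len(coords)):
        members = contact_set(coords, i, cutoff)
        obs = sum(observed[j] for j in members)
        exp = sum(expected[j] for j in members)
        scores.append((obs - exp) / math.sqrt(exp))
    return scores

# Hypothetical toy input: three residues, two of them in contact.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
observed = [0, 0, 4]   # hypothetical observed missense counts per site
expected = [2, 2, 4]   # hypothetical expected counts under a neutral model
scores = site_scores(coords, observed, expected)
```

In this toy input the first two residues share a contact set that carries no observed variants against an expectation of four, so both receive a negative score, while the isolated third residue matches its expectation and scores zero.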

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/CapraLab/cosmis

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 16, 2021.
Subject Area

  • Genomics