Abstract
The spatial distribution of genetic variation within proteins is shaped by evolutionary constraint and thus can provide insights into the functional importance of protein regions and the potential pathogenicity of protein alterations. Here, we comprehensively evaluate the 3D spatial patterns of constraint on human germline and somatic variation in 4,568 solved protein structures. Different classes of coding variants have significantly different spatial distributions. Neutral missense variants exhibit a range of 3D constraint patterns, with a general trend of spatial dispersion driven by constraint on core residues. In contrast, germline and somatic disease-causing variants are significantly more likely to be clustered in protein structure space. We demonstrate that this difference in the spatial distributions of disease-associated and benign germline variants provides a signature for accurately classifying variants of unknown significance (VUS) that is complementary to current approaches for VUS classification. We further illustrate the clinical utility of our approach by classifying new mutations identified from patients with familial idiopathic pneumonia (FIP) that segregate with disease.