PT - JOURNAL ARTICLE
AU - Dennis Lal
AU - Patrick May
AU - Kaitlin E. Samocha
AU - Jack A. Kosmicki
AU - Elise B. Robinson
AU - Rikke S. Møller
AU - Roland Krause
AU - Peter Nüernberg
AU - Sarah Weckhuysen
AU - Peter De Jonghe
AU - Renzo Guerrini
AU - Lisa M. Neupert
AU - Juliana Du
AU - Eduardo Perez-Palma
AU - Carla Marini
AU - EuroEpinomics-RES Consortium
AU - James S. Ware
AU - Mitja Kurki
AU - Padhraig Gormley
AU - Sha Tang
AU - Sitao Wu
AU - Saskia Biskup
AU - Annapura Poduri
AU - Bernd A. Neubauer
AU - Bobby P. Koeleman
AU - Katherine L. Helbig
AU - Yvonne G. Weber
AU - Ingo Helbig
AU - Amit R. Majithia
AU - Aarno Palotie
AU - Mark J. Daly
TI - Gene family information facilitates variant interpretation and identification of disease-associated genes
AID - 10.1101/159780
DP - 2017 Jan 01
TA - bioRxiv
PG - 159780
4099 - http://biorxiv.org/content/early/2017/07/05/159780.short
4100 - http://biorxiv.org/content/early/2017/07/05/159780.full
AB - Differentiating risk-conferring from benign missense variants, and therefore optimally calculating gene-variant burden, represents a major challenge, in particular for rare and genetically heterogeneous disorders. While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes are paralogs and belong to gene families. How gene family information can be utilized for disease gene discovery and variant interpretation has not been thoroughly investigated. We developed a paralog conservation score to empirically evaluate whether paralog-conserved or non-conserved sites of in-human paralogs are important for protein function. Using this score, we demonstrate that disease-associated missense variants are significantly enriched at paralog-conserved sites across all disease groups and disease inheritance models tested. Next, we assessed whether gene family information could assist in discovering novel disease-associated genes.
We subsequently developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families, including 98 de novo variant-carrying genes, in more than 10,000 neurodevelopmental disorder patients. Thirty-three gene-family-enriched genes represent novel candidate genes that are brain-expressed and variant-constrained in neurodevelopmental disorders.