RT Journal Article
SR Electronic
T1 A statistical framework for mapping risk genes from de novo mutations in whole-genome sequencing studies
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 077578
DO 10.1101/077578
A1 Yuwen Liu
A1 A. Ercument Cicek
A1 Yanyu Liang
A1 Jinchen Li
A1 Rebecca Muhle
A1 Martina Krenzer
A1 Yue Mei
A1 Yan Wang
A1 Nicholas Knoblauch
A1 Jean Morrison
A1 Yi Jiang
A1 Evan Geller
A1 Zhongshan Li
A1 Iuliana Ionita-Laza
A1 Jinyu Wu
A1 Kun Xia
A1 James Noonan
A1 Zhong Sheng Sun
A1 Xin He
YR 2017
UL http://biorxiv.org/content/early/2017/09/13/077578.abstract
AB Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWAS) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is, however, challenging because the functional significance of non-coding mutations is difficult to predict. We propose a new statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance over the TADA method we developed earlier for DNM analysis in coding regions. TADA-A can incorporate many functional annotations, such as conservation and enhancer marks; learn from data which annotations are informative about pathogenic mutations; and combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ~300 autism family trios across five studies and discovered several new autism risk genes. The software is freely available for all research uses.