PT - JOURNAL ARTICLE AU - Daniel Backenroth AU - Zihuai He AU - Krzysztof Kiryluk AU - Valentina Boeva AU - Lynn Pethukova AU - Ekta Khurana AU - Angela Christiano AU - Joseph D. Buxbaum AU - Iuliana Ionita-Laza TI - FUN-LDA: A LATENT DIRICHLET ALLOCATION MODEL FOR PREDICTING TISSUE-SPECIFIC FUNCTIONAL EFFECTS OF NONCODING VARIATION AID - 10.1101/069229 DP - 2017 Jan 01 TA - bioRxiv PG - 069229 4099 - http://biorxiv.org/content/early/2017/08/02/069229.short 4100 - http://biorxiv.org/content/early/2017/08/02/069229.full AB - We describe here a new method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell type and tissue specific way (FUN-LDA) by integrating diverse epigenetic annotations for specific cell types and tissues from large scale epige-nomics projects such as ENCODE and Roadmap Epigenomics. Using this unsupervised approach we predict tissue-specific functional effects for every position in the human genome. We demonstrate the usefulness of our predictions using several validation experiments. Using eQTL data from several sources, including the Genotype-Tissue Expression project, the Geuvadis project and Twin-sUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used to derive the most likely cell/tissue type causally implicated for a complex trait using summary statistics from genome-wide association studies, and estimate a tissue-based correlation matrix of various complex traits. We find large enrichment of heritability in functional components of relevant tissues for various complex traits, with FUN-LDA yielding the highest enrichment estimates relative to existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA to state-of-the-art functional annotation methods such as GenoSky-line, ChromHMM, Segway, and IDEAS, and show that FUN-LDA has better prediction accuracy and higher resolution compared to these methods. In summary, we describe a new approach and perform rigorous comparisons with the most commonly used functional annotation methods, providing a valuable resource for the community interested in the functional annotation of noncoding variants. Scores for each position in the human genome and for each ENCODE/Roadmap tissue are available from http://www.columbia.edu/~ii2135/funlda.html.