PT - JOURNAL ARTICLE
AU - Fan Lin
AU - Jue Fan
AU - Seung Y. Rhee
TI - QTG-Finder: a machine-learning algorithm to prioritize causal genes of quantitative trait loci in Arabidopsis and rice
AID - 10.1101/484204
DP - 2018 Jan 01
TA - bioRxiv
PG - 484204
4099 - http://biorxiv.org/content/early/2018/11/30/484204.short
4100 - http://biorxiv.org/content/early/2018/11/30/484204.full
AB - Linkage mapping is one of the most commonly used methods to identify genetic loci that determine a trait. However, the loci identified by linkage mapping may contain hundreds of candidate genes and require a time-consuming and labor-intensive fine-mapping process to find the causal gene controlling the trait. With the availability of a rich assortment of genomic and functional genomic data, it is possible to develop a computational method to facilitate faster identification of causal genes. We developed QTG-Finder, a machine-learning algorithm that prioritizes causal genes by ranking the genes within a quantitative trait locus (QTL). Two predictive models were trained separately based on known causal genes in Arabidopsis and rice. With an independent validation analysis, we demonstrate that the models can correctly prioritize about 80% of Arabidopsis and 55% of rice causal genes, respectively, when the top 20% of ranked genes are considered. The models can prioritize causal genes for different types of traits, though with different efficiency. We also identified several important features of causal genes, including non-synonymous SNPs at conserved protein sequences, paralog copy number, and being a transporter. This work lays the foundation for systematically understanding characteristics of causal genes and establishes a pipeline to predict causal genes based on public data.
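The evaluation criterion mentioned in the abstract (a causal gene counts as correctly prioritized when it falls within the top 20% of ranked genes in its QTL) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, gene identifiers, and scores below are hypothetical, and the rule of rounding the cutoff up to at least one gene is an assumption made for the example.

```python
def causal_gene_in_top(scores, causal_gene, top_percent=20):
    """Return True if causal_gene ranks within the top `top_percent` of genes.

    scores: dict mapping gene ID -> model score (higher means more likely causal).
    The cutoff is rounded up and is at least 1 gene (an assumption of this sketch).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, (len(ranked) * top_percent + 99) // 100)  # integer ceiling
    return causal_gene in ranked[:cutoff]


def recall_at_top_percent(qtls, top_percent=20):
    """Fraction of QTLs whose known causal gene falls within the top slice.

    qtls: list of (scores, causal_gene) pairs, one pair per QTL.
    """
    hits = sum(causal_gene_in_top(s, g, top_percent) for s, g in qtls)
    return hits / len(qtls)


# Hypothetical QTLs with made-up scores, for illustration only.
example_qtls = [
    ({"gene_a": 0.91, "gene_b": 0.40, "gene_c": 0.35, "gene_d": 0.22, "gene_e": 0.10}, "gene_a"),
    ({"gene_f": 0.80, "gene_g": 0.75, "gene_h": 0.50, "gene_i": 0.30, "gene_j": 0.05}, "gene_h"),
]

print(recall_at_top_percent(example_qtls))  # 0.5
```

With five genes per QTL, the top 20% is a single gene, so only the first hypothetical QTL counts as a hit and the fraction is 0.5.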