RT Journal Article
SR Electronic
T1 Rfpred: A Random Forest Approach for Prediction of Missense Variants in Human Exome
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 037127
DO 10.1101/037127
A1 Jabot-Hanin, Fabienne
A1 Varet, Hugo
A1 Tores, Frederic
A1 Alcaïs, Alexandre
A1 Jaïs, Jean-Philippe
YR 2016
UL http://biorxiv.org/content/early/2016/01/19/037127.abstract
AB Exome sequencing is becoming a standard tool for gene mapping of genetic diseases. Given the vast amount of data generated by Next Generation Sequencing techniques, identifying disease-causing variants is like finding a needle in a haystack. Impact assessment and prioritization of potentially pathogenic variants are expected to reduce the work of biological validation, which is long and costly. One possible approach to determining the most probably deleterious variants in individual exomes is to predict the alteration of protein function. In this paper we propose to use a machine learning approach, the random forest, to build a new meta-score based on five previously described scores (SIFT, Polyphen2, LRT, PhyloP and MutationTaster) compiled in the dbNSFP database. The functional meta-score was trained on a dataset of 61 500 non-synonymous Single Nucleotide Polymorphisms (SNPs). The random forest method (rfPred) appears to be globally better than each of the classifiers taken separately or combined in a logistic regression model, and better than a newly described score (CADD) on independent validation sets. RfPred scores have been pre-calculated for all possible non-synonymous SNPs of the human exome and are freely accessible on the web server http://www.sbim.fr/rfPred/