TY  - JOUR
T1  - Classification of non-coding variants with high pathogenic impact
JF  - bioRxiv
DO  - 10.1101/2021.05.03.442347
SP  - 2021.05.03.442347
AU  - Lambert Moyon
AU  - Camille Berthelot
AU  - Alexandra Louis
AU  - Nga Thi Thuy Nguyen
AU  - Hugues Roest Crollius
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/05/03/2021.05.03.442347.abstract
N2  - Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants remain a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing to control optimisation during training. In addition to ranking candidate variants, FINSURF also delivers diagnostic information on the functional consequences of mutations. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant among the top ten hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings. Competing Interest Statement: The authors have declared no competing interest.
ER  - 