Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Our software system AMELIE (Automatic Mendelian Literature Evaluation) automates much of this process. AMELIE parses hundreds of thousands of full-text articles to find an underlying diagnosis that explains a patient’s phenotypes given the patient’s exome, and it prioritizes the patient’s candidate genes by their likelihood of causing those phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario. We tested AMELIE’s gene ranking method on 215 singleton Mendelian patients with a clinical diagnosis. AMELIE ranked the causal gene among the top 2 in the majority (63%) of cases. Examining AMELIE’s top 10 genes, which amount to 8% of 124 candidate genes with rare functional variants per patient, results in a diagnosis for 95% of cases. Strikingly, training only on gene pathogenicity knowledge from 2011 yields performance identical to that obtained by training on current data. An accompanying analysis web portal has launched at AMELIE.stanford.edu.
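The top-2 and top-10 figures above are top-k fractions: the share of patients whose causal gene appears at rank k or better in the prioritized list. The following is a minimal sketch of that arithmetic, not AMELIE's code. The function name `top_k_fraction` and the example ranks are hypothetical and are not data from the study.

```python
def top_k_fraction(causal_ranks, k):
    """Return the fraction of patients whose causal gene is ranked k or better.

    causal_ranks: 1-based rank of the known causal gene for each patient.
    """
    if not causal_ranks:
        raise ValueError("need at least one patient")
    hits = sum(1 for rank in causal_ranks if rank <= k)
    return hits / len(causal_ranks)


# Hypothetical ranks for eight patients (illustrative only, not study data).
example_ranks = [1, 2, 1, 7, 15, 3, 1, 10]

print(top_k_fraction(example_ranks, 2))   # share of patients solved within the top 2
print(top_k_fraction(example_ranks, 10))  # share of patients solved within the top 10

# Share of the candidate list reviewed when examining 10 of 124 genes.
print(round(10 / 124 * 100))  # 8 (percent)
```

The last line reproduces the 8% figure: reviewing 10 of 124 candidate genes covers about 8% of the list.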