Abstract
Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform “N-of-1” analyses on individual patients with ultra-rare diseases. The increasing sizes of ultra-rare disease cohorts internationally now enable cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development.1,2 The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers across the United States under the same umbrella to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole-genome sequencing data from UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes based on de novo recurrence and compound heterozygosity. We also detect pathways enriched for candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, that enables automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by joint genomic analysis across cohorts.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
† Email: shamil_sunyaev{at}hms.harvard.edu
We refocused the manuscript around the methodology for easy analysis across cohorts, and addressed the critiques by improving the methods, applying much more stringent p-value and FDR cutoffs, and restructuring the text. We specifically note that the rare disease field has accumulated many individually small (but collectively large) cohorts that have never been included in any joint analysis. Our publicly released code provides a way to analyze any cohort vis-à-vis our data. Unlike popular matchmaking analyses (e.g., Matchmaker Exchange), our approach does not require a prior hypothesis about a specific gene. It relies on closed-form statistical solutions that operate on summary statistics and do not require sharing variant-level genomic patient data. We offer three methods: (1) A new method to detect significant recurrence of de novo mutations, which now combines multiple variant prediction scores, including the most recent AI/ML approaches. The standalone software can be easily run across cohorts. Applying this method to the UDN cohort recovered 11 known diagnoses and identified five new disease associations; (2) The first well-calibrated method for the analysis of recessive compound heterozygous cases. We did not detect any recurrent recessive genes in our cohort, but we highlight clinically validated findings based on single high-scoring compound heterozygous patients; (3) The first pathway-level statistical approach for ultra-rare diseases, which yields intriguing candidates.
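To make concrete what a closed-form recurrence test on summary statistics can look like, here is a minimal sketch of a textbook Poisson test for de novo recurrence. It is not the RaMeDiES method: it ignores variant prediction scores entirely. It assumes each gene has a known per-chromosome, per-generation de novo mutation rate, and the gene names, rates, and function names are hypothetical placeholders chosen for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def recurrence_pvalues(observed, mu_per_gene, n_trios):
    """Per-gene p-value for seeing at least the observed number of de novos.

    observed:    gene -> number of de novo variants seen in the cohort
    mu_per_gene: gene -> assumed per-chromosome, per-generation mutation rate
    n_trios:     number of sequenced parent-child trios
    """
    pvals = {}
    for gene, mu in mu_per_gene.items():
        expected = 2 * n_trios * mu  # two transmitted chromosomes per child
        pvals[gene] = poisson_sf(observed.get(gene, 0), expected)
    return pvals


def bonferroni_hits(pvals, alpha=0.05):
    """Genes whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(pvals)
    return sorted(gene for gene, p in pvals.items() if p < threshold)


# Hypothetical inputs: only per-gene counts and rates, no variant-level data.
observed = {"GENE_A": 3, "GENE_B": 0}
mu_per_gene = {"GENE_A": 1e-5, "GENE_B": 2e-5}
pvals = recurrence_pvalues(observed, mu_per_gene, n_trios=1000)
hits = bonferroni_hits(pvals)
```

Because the inputs are only per-gene counts and expected rates, counts from several cohorts could in principle be pooled before testing without exchanging patient genotypes, which is the property the paragraph above emphasizes.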