RT Journal Article SR Electronic T1 Advances in computer-assisted syndrome recognition and differentiation in a set of metabolic disorders JF bioRxiv FD Cold Spring Harbor Laboratory SP 219394 DO 10.1101/219394 A1 Jean Tori Pantel A1 Max Zhao A1 Martin Atta Mensah A1 Nurulhuda Hajjir A1 Tzung-Chien Hsieh A1 Yair Hanani A1 Nicole Fleischer A1 Tom Kamphans A1 Stefan Mundlos A1 Yaron Gurovich A1 Peter M. Krawitz YR 2017 UL http://biorxiv.org/content/early/2017/11/14/219394.abstract AB Significant improvements in automated image analysis have been achieved in recent years, and such tools are increasingly being used in computer-assisted syndromology. However, the recognizability of the facial gestalt might depend on the syndrome and may also be confounded by severity of phenotype, size of available training sets, ethnicity, age, and sex. Therefore, benchmarking and comparing the performance of deep-learned classification processes is inherently difficult. For a systematic analysis of these influencing factors we chose the lysosomal storage diseases Mucolipidosis as well as Mucopolysaccharidosis type I and II, which are known for their wide and overlapping phenotypic spectra. For a dysmorphic comparison we used Smith-Lemli-Opitz syndrome as a metabolic disease and Nicolaides-Baraitser syndrome as another disorder that is also characterized by coarse facies. A classifier that was trained on these five cohorts, comprising 288 patients in total, achieved a mean accuracy of 62%. The performance of automated image analysis is not only significantly higher than randomly expected but also better than in previous approaches. In part this might be explained by our large training sets. We therefore set up a simulation pipeline that is suited to analyze the effect of different potential confounders, such as cohort size, age, sex, or ethnic background, on the recognizability of phenotypes.
We found that the true positive rate increases for all analyzed disorders for growing cohorts (n=[10…40]) while ethnicity and sex have no significant influence. The dynamics of the accuracies strongly suggest that the maximum recognizability is a phenotype-specific value that has not been reached yet for any of the studied disorders. This should also be a motivation to further intensify data sharing efforts, as computer-assisted syndrome classification can still be improved by enlarging the available training sets. Availability: software for classification: https://app.face2gene.com/research. Abbreviations: DDx, Differential Diagnoses; DPDL, Deep Phenotyping for Deep Learning; DS, Down syndrome; FDNA, Facial Dysmorphology Novel Analysis; FNR, False Negative Rate; FPR, False Positive Rate; GAG, Glycosaminoglycan; HPO, Human Phenotype Ontology; LSD, Lysosomal Storage Disease; ML, Mucolipidosis; MPS I, Mucopolysaccharidosis type I; MPS II, Mucopolysaccharidosis type II; NCBRS, Nicolaides-Baraitser Syndrome; ROC, Receiver Operating Characteristics; SLOS, Smith-Lemli-Opitz Syndrome; TPR, True Positive Rate.