RT Journal Article
SR Electronic
T1 A cross-organism framework for supervised enhancer prediction with epigenetic pattern recognition and targeted validation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 385237
DO 10.1101/385237
A1 Anurag Sethi
A1 Mengting Gu
A1 Emrah Gumusgoz
A1 Landon Chan
A1 Koon-Kiu Yan
A1 Joel Rozowsky
A1 Iros Barozzi
A1 Veena Afzal
A1 Jennifer Akiyama
A1 Ingrid Plajzer-Frick
A1 Chengfei Yan
A1 Catherine Pickle
A1 Momoe Kato
A1 Tyler Garvin
A1 Quan Pham
A1 Anne Harrington
A1 Brandon Mannion
A1 Elizabeth Lee
A1 Yoko Fukuda-Yuzawa
A1 Axel Visel
A1 Diane E. Dickel
A1 Kevin Yip
A1 Richard Sutton
A1 Len A. Pennacchio
A1 Mark Gerstein
YR 2018
UL http://biorxiv.org/content/early/2018/08/05/385237.abstract
AB Enhancers are important noncoding elements, but they have been traditionally hard to characterize experimentally. Only a few mammalian enhancers have been validated, making it difficult to train statistical models for their identification properly. Instead, postulated patterns of genomic features have been used heuristically for identification. The development of massively parallel assays allows for the characterization of large numbers of enhancers for the first time. Here, we developed a framework that uses Drosophila STARR-seq data to create shape-matching filters based on enhancer-associated meta-profiles of epigenetic features. We combined these features with supervised machine learning algorithms (e.g., support vector machines) to predict enhancers. We demonstrated that our model could be applied to predict enhancers in mammalian species (i.e., mouse and human). We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mouse and transduction-based reporter assays in human cell lines. Overall, the validations involved 153 enhancers in 6 mouse tissues and 4 human cell lines. The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers and promoters in human cell lines. We demonstrated that these patterns enable the construction of a secondary model effectively discriminating between enhancers and promoters.