Abstract
Purpose Prior studies demonstrate the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we use a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants.
Methods We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence putative cis-regulatory elements (CREs). Eighty percent of these elements were used to train a machine learning model based on a gapped k-mer support vector machine. In silico saturation mutagenesis and variant scoring were applied to predict the functional impact of all potential single nucleotide variants within cis-regulatory elements. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay (MPRA) data.
Results We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants (SNVs) were scored. Variants with negative impact scores correlated with reduced population allele frequency, higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter assay expression.
Conclusions We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative transcription factor binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease.
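The in silico saturation mutagenesis and variant scoring steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical table of per-k-mer weights (`weights`), uses contiguous rather than gapped k-mers for brevity, ignores reverse complements, and scores each variant as the alternate-minus-reference sum of weights over the k-mers that overlap the variant position. All function names are illustrative.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def saturation_mutagenesis(seq: str) -> List[Tuple[int, str, str]]:
    """Enumerate every possible SNV as (0-based position, ref, alt)."""
    seq = seq.upper()
    return [(i, ref, alt) for i, ref in enumerate(seq) for alt in BASES if alt != ref]


def kmer_score(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum the (hypothetical) weights of all contiguous k-mers in seq."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def impact_score(seq: str, pos: int, alt: str, weights: Dict[str, float], k: int) -> float:
    """Alt-minus-ref score over the window of k-mers overlapping the variant.

    A negative value means the variant is predicted to reduce the
    sequence's regulatory score under the assumed weights.
    """
    seq = seq.upper()
    lo = max(0, pos - k + 1)          # first base of the leftmost overlapping k-mer
    hi = min(len(seq), pos + k)       # one past the last base of the rightmost one
    ref_window = seq[lo:hi]
    alt_window = ref_window[:pos - lo] + alt + ref_window[pos - lo + 1:]
    return kmer_score(alt_window, weights, k) - kmer_score(ref_window, weights, k)


if __name__ == "__main__":
    # Toy weights, invented for illustration only.
    toy_weights = {"CGT": 1.0, "GTA": 0.5}
    toy_seq = "ACGTAC"
    for pos, ref, alt in saturation_mutagenesis(toy_seq):
        print(pos, ref, alt, impact_score(toy_seq, pos, alt, toy_weights, k=3))
```

In practice the weights would come from the trained model rather than a hand-written dictionary, and a gapped k-mer scheme would let a single feature tolerate mismatches at some positions.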
Competing Interest Statement
Dr. Cherry reports grants and non-financial support from Moderna and consulting work with Vedere Bio, both unrelated to this work. Dr. Lee reports grants from Santen, personal fees from Genentech, personal fees from US FDA, personal fees from Johnson and Johnson, grants from Carl Zeiss Meditec, personal fees from Topcon, personal fees from Gyroscope, non-financial support from Microsoft, and grants from Regeneron, all outside the submitted work. This article does not reflect the views of the US FDA.