PT - JOURNAL ARTICLE AU - Paolo Devanna AU - Xiaowei Sylvia Chen AU - Joses Ho AU - Dario Gajewski AU - Alessandro Gialluisi AU - Clyde Francks AU - Simon E. Fisher AU - Dianne Newbury AU - Sonja C. Vernes TI - Finding functional disease-associated non-coding variation using next-generation sequencing AID - 10.1101/060285 DP - 2016 Jan 01 TA - bioRxiv PG - 060285 4099 - http://biorxiv.org/content/early/2016/06/22/060285.short 4100 - http://biorxiv.org/content/early/2016/06/22/060285.full AB - Next generation sequencing has opened the way for the large scale interrogation of cohorts at the whole exome, or whole genome level. Currently, the field largely focuses on potential disease causing variants that fall within coding sequences and that are predicted to cause protein sequence changes, generally discarding non-coding variants. However non-coding DNA makes up ~98% of the genome and contains a range of sequences essential for controlling the expression of protein coding genes. Thus, potentially causative non-coding variation is currently being overlooked. To address this, we have designed an approach to assess variation in one class of non-coding regulatory DNA; the 3′UTRome. Variants in the 3'UTR region of genes are of particular interest because 3'UTRs are responsible for modulating protein expression levels via their interactions with microRNAs. Furthermore they are amenable to large scale analysis as 3′UTR-microRNA interactions are based on complementary base pairing and as such can be predicted in silico at the genome-wide level. We report a strategy for identifying and functionally testing variants in microRNA binding sites within the 3'UTRome and demonstrate the efficacy of this pipeline in a cohort of language impaired children. Using whole exome sequence data from 43 probands, we extracted variants that lay within 3'UTR microRNA binding sites. We identified a common variant (SNP) in a microRNA binding site and found this SNP to be associated with an endophenotype of language impairment (non-word repetition). We showed that this variant disrupted microRNA regulation in cells and was linked to altered gene expression in the brain, suggesting it may represent a risk factor contributing to SLI. This work demonstrates that biologically relevant variants are currently being under-investigated despite the wealth of next-generation sequencing data available and presents a simple strategy for interrogating non-coding regions of the genome. We propose that this strategy should be routinely applied to whole exome and whole genome sequence data in order to broaden our understanding of how non-coding genetic variation underlies complex phenotypes such as neurodevelopmental disorders.