RT Journal Article
SR Electronic
T1 Widespread population variability of intron size in evolutionary old genes: implications for gene expression variability
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 171165
DO 10.1101/171165
A1 Maria Rigau
A1 David Juan
A1 Alfonso Valencia
A1 Daniel Rico
YR 2017
UL http://biorxiv.org/content/early/2017/08/01/171165.abstract
AB Introns were originally thought to be “junk DNA” without function, but accumulating evidence has shown that they can have important functions in the regulation of gene expression. In humans and other mammals, introns can be extraordinarily large, and together they account for the majority of the sequence in human protein-coding loci. However, little is known about their structural variation in human populations and the potential functional impact of this genomic variation. To address this, we have studied how copy number variants (CNVs) differentially affect exonic and intronic sequences of protein-coding genes. Using five different CNV maps, we found that CNV gains and losses are consistently underrepresented in coding regions. However, we found purely intronic losses in protein-coding genes more frequently than expected by chance, even in essential genes. Following a phylogenetic approach, we dissected how CNV losses differentially affect genes depending on their evolutionary age. Evolutionarily young genes frequently overlap with deletions that partially or entirely eliminate their coding sequence, while in evolutionarily ancient genes losses of intronic DNA are the most frequent CNV type. A detailed characterisation of these events showed that the loss of intronic sequence can be associated with significant differences in gene length and expression levels in the population. In summary, we show that genomic variation is shaping gene evolution in different ways depending on the age and function of genes.
CNVs affecting introns can play an important role in maintaining the variability of gene expression in human populations, a variability that could be related to human adaptation.
Author summary
Most human genes have introns that have to be removed after a gene is transcribed from DNA to RNA because they do not encode information for translating RNA into proteins. As mutations in introns do not affect protein sequences, they are usually ignored when looking for normal or pathogenic genomic variation. However, introns comprise about half of the human genome and they can have important regulatory roles. We show that deletions of intronic regions appear to be more frequent than previously expected in the healthy population, with a significant proportion of genes with evolutionarily ancient and essential functions carrying them. This finding was very surprising, as ancient genes tend to have high conservation of their coding sequence. However, we show that deletions of their non-coding intronic sequence can produce considerable changes in gene length, cause significant drops in GC content that could affect splicing, or occur in introns harboring regulatory elements. Finally, we found that a significant number of these intronic deletions are associated with under- or over-expression of the affected genes, showing that intronic deletions can be responsible for gene expression variability in ancient genes with highly conserved protein sequences. Our data suggest that the frequent gene length variation in ancient genes resulting from intronic CNVs might have an important role in the fine-tuning of their regulation in different individuals.