Abstract
De novo genes emerge from non-coding regions of genomes via a succession of mutations. Among other effects, such mutations activate transcription and create a new open reading frame (ORF). Although the mechanisms underlying ORF emergence are well documented, relatively little is known about the mechanisms that enable new transcription events. Yet in many species, a continuum between absent and very prominent transcription has been reported for essentially all regions of the genome.
In this study, we searched for de novo transcripts using newly assembled genomes and transcriptomes of seven inbred lines of Drosophila melanogaster, originating from six European populations and one African population. This setup allowed us to detect line-specific de novo transcripts and to compare them with their homologous non-transcribed regions in other lines, as well as with genic and intergenic control sequences. We studied the association of de novo transcripts with transposable elements (TEs) and the enrichment of transcription factor motifs upstream of them, and compared these upstream regions with regulatory elements.
We found that de novo transcripts overlap with TEs more often than expected by chance. The emergence of new transcripts correlates with regions rich in CpG islands and with regions of TE activity. Moreover, upstream regions of de novo transcripts are highly enriched in regulatory motifs. Such motifs abound upstream of new transcripts that overlap with TEs, particularly DNA TEs, and are more conserved upstream of de novo transcripts than upstream of their non-transcribed homologs. Overall, our study demonstrates that TE insertion is important for transcript emergence, partly by introducing new regulatory motifs from DNA TE families.
Competing Interest Statement
The authors have declared no competing interest.