RT Journal Article
SR Electronic
T1 TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 648667
DO 10.1101/648667
A1 Fabio CP Navarro
A1 Jacob Hoops
A1 Lauren Bellfy
A1 Eliza Cerveira
A1 Qihui Zhu
A1 Chengsheng Zhang
A1 Charles Lee
A1 Mark B. Gerstein
YR 2019
UL http://biorxiv.org/content/early/2019/05/24/648667.abstract
AB Long interspersed nuclear element 1 (LINE-1) is a primary source of genetic variation in humans and other mammals. Despite its importance, LINE-1 activity remains difficult to study because of its highly repetitive nature. Here, we developed and validated a method called TeXP to gauge LINE-1 activity accurately. TeXP builds mappability signatures from LINE-1 subfamilies to deconvolve the effect of pervasive transcription from autonomous LINE-1 activity. In particular, it apportions the multiple reads aligned to the many LINE-1 instances in the genome into these two categories. Using our method, we evaluated well-established cell lines, cell-line compartments and healthy tissues and found that the vast majority (91.7%) of transcriptome reads overlapping LINE-1 derive from pervasive transcription. We validated TeXP by independently estimating the levels of LINE-1 autonomous transcription using ddPCR, finding high concordance. Next, we applied our method to comprehensively measure LINE-1 activity across healthy somatic cells, while backing out the effect of pervasive transcription. Unexpectedly, we found that LINE-1 activity is present in many normal somatic cells. This finding contrasts with earlier studies showing that LINE-1 has limited activity in healthy somatic tissues, except for neuroprogenitor cells. Interestingly, we found that the amount of LINE-1 activity was associated with the amount of cell turnover, with tissues with low cell turnover rates (e.g. the adult central nervous system) showing lower LINE-1 activity.
Altogether, our results show how accounting for pervasive transcription is critical to accurately quantify the activity of highly repetitive regions of the human genome.
Author Summary
Repetitive sequences, such as LINEs, comprise more than half of the human genome. Because of their repetitive nature, LINEs are difficult to study. In particular, we find that pervasive transcription is a major confounding factor in transcriptome data. We observe that, on average, more than 90% of LINE signal derives from pervasive transcription. To investigate this issue, we developed and validated a new method called TeXP. TeXP accounts for and removes the effects of pervasive transcription when quantifying LINE activity. Our method uses the broad distribution of LINEs to estimate the effects of pervasive transcription. Using TeXP, we processed thousands of transcriptome datasets to measure LINE-1 activity uniformly and without bias across healthy somatic cells. By removing the pervasive transcription component, we find that (1) LINE-1 is broadly expressed in healthy somatic tissues; (2) adult brain shows low levels of LINE transcription; and (3) LINE-1 transcription level is correlated with tissue cell turnover. Our method thus offers insights into how repetitive sequences are influenced by pervasive transcription. Moreover, we uncover the activity of LINE-1 in somatic tissues at an unmatched scale.
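
The idea of apportioning LINE-1 reads between pervasive and autonomous transcription using mappability signatures can be illustrated with a toy calculation. The sketch below is not TeXP's implementation. It assumes each source has a known signature (the fraction of its reads expected to align to each subfamily) and fits non-negative read counts per source by least squares, using simple multiplicative updates. The subfamily labels, signature values, read counts, and the function name `deconvolve` are all hypothetical and chosen only for illustration.

```python
# Toy signature deconvolution. All numbers are made up for illustration
# and are NOT taken from the TeXP paper or its software.

SUBFAMILIES = ["L1HS", "L1PA2", "L1PA3", "L1PA4"]

# Hypothetical mappability signatures: the fraction of reads from each
# source expected to align to each subfamily (each list sums to 1).
SIGNATURES = {
    "pervasive":       [0.05, 0.15, 0.30, 0.50],
    "L1HS_autonomous": [0.70, 0.20, 0.07, 0.03],
}

# Synthetic observed read counts per subfamily, built as
# 900 * pervasive + 100 * L1HS_autonomous.
OBSERVED = [115, 155, 277, 453]


def deconvolve(signatures, observed, iterations=500):
    """Estimate non-negative read counts per source.

    Minimizes the squared error between the observed counts and the
    weighted sum of signatures, using multiplicative updates that keep
    every weight non-negative (an assumed solver, picked for brevity).
    """
    sources = list(signatures)
    n = len(observed)
    # Start from an equal split of the total read count.
    weights = {s: sum(observed) / len(sources) for s in sources}
    for _ in range(iterations):
        # Counts predicted by the current weights.
        fitted = [
            sum(weights[s] * signatures[s][i] for s in sources)
            for i in range(n)
        ]
        updated = {}
        for s in sources:
            numer = sum(signatures[s][i] * observed[i] for i in range(n))
            denom = sum(signatures[s][i] * fitted[i] for i in range(n))
            updated[s] = weights[s] * numer / denom
        weights = updated
    return weights


estimates = deconvolve(SIGNATURES, OBSERVED)
total = sum(estimates.values())
for source, reads in estimates.items():
    print(f"{source}: {reads:.1f} reads ({100 * reads / total:.1f}%)")
```

Because the synthetic counts were constructed from the two signatures exactly, the fit recovers roughly 900 pervasive and 100 autonomous reads. Real data would be noisy and involve more signatures, so a regularized or library-based solver would likely be a better choice there.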