PT - JOURNAL ARTICLE
AU - Mahoko Takahashi Ueda
AU - Kirill Kryukov
AU - Satomi Mitsuhashi
AU - Hiroaki Mitsuhashi
AU - Tadashi Imanishi
AU - So Nakagawa
TI - Comprehensive genomic analysis reveals dynamic evolution of endogenous retroviruses that code for retroviral-like protein domains
AID - 10.1101/628875
DP - 2019 Jan 01
TA - bioRxiv
PG - 628875
4099 - http://biorxiv.org/content/early/2019/11/12/628875.short
4100 - http://biorxiv.org/content/early/2019/11/12/628875.full
AB - Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections of mammalian germline cells. Many ERVs have lost their open reading frames (ORFs), whereas others retain them and have been exapted by the host species. However, it remains unclear what proportion of ERVs possess ORFs (ERV-ORFs), are transcribed, and serve as candidates for co-opted genes. We therefore investigated the characteristics of 176,401 ERV-ORFs containing retroviral-like protein domains (gag, pro, pol, and env) in 19 mammalian genomes. The fraction of ERVs possessing ORFs was small overall (∼0.15%), although it varied by domain type and by species. The divergence of ERV-ORFs from their consensus sequences suggested that a large proportion of them were inserted into mammalian genomes either recently or anciently. In contrast, very few ERVs lacking ORFs exhibited similar divergence patterns. To identify transcribed ERV-ORFs that may be expressed as proteins, we compared ERV-ORFs with multi-omics data from 2,834 cell types, including transcriptome data, histone H3 lysine 36 trimethylation (H3K36me3), and transcription initiation sites. We identified 408 and 752 ERV-ORFs with high transcriptional potential in humans and mice, respectively, accounting for 2–3% of all ERV-ORFs. Moreover, many of these ERV-ORFs with transcriptional potential were lineage-specific sequences exhibiting tissue-specific expression.
These results point to the possibility that uncharacterized functional genes containing ERV-ORFs are expressed but remain hidden within mammalian genomes. Together, our analyses suggest that more ERV-ORFs than currently recognized may have been co-opted in a host-species-specific manner and are likely to have contributed to mammalian evolution and diversification.
Abbreviations: ERV, endogenous retrovirus; ERV-ORF, endogenous retrovirus open reading frame; TSS, transcription start site; TE, transposable element; HMM, hidden Markov model; CAGE, cap analysis of gene expression; PCA, principal component analysis; GTF, gene transfer format; SRA, Sequence Read Archive; GTEx, Genotype-Tissue Expression; CHESS, comprehensive human expressed sequence; ES, embryonic stem; EB, embryoid body; iPS, induced pluripotent stem