Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human-Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs

Genome Biol Evol. 2015 May 7;7(6):1432-54. doi: 10.1093/gbe/evv081.

Abstract

Despite significant progress in the structural and functional characterization of the human genome, understanding of the mechanisms underlying the genetic basis of human phenotypic uniqueness remains limited. Here, I report that transposable element-derived sequences, most notably LTR7/HERV-H, LTR5_Hs, and L1HS, harbor 99.8% of the candidate human-specific regulatory loci (HSRL) with putative transcription factor-binding sites in the genome of human embryonic stem cells (hESC). A total of 4,094 candidate HSRL display selective and site-specific binding of critical regulators (NANOG [Nanog homeobox], POU5F1 [POU class 5 homeobox 1], CCCTC-binding factor [CTCF], Lamin B1), and are preferentially located within the matrix of transcriptionally active DNA segments that are hypermethylated in hESC. hESC-specific NANOG-binding sites are enriched near the protein-coding genes regulating brain size, pluripotency long noncoding RNAs, hESC enhancers, and 5-hydroxymethylcytosine-harboring regions immediately adjacent to binding sites. Sequences of only 4.3% of hESC-specific NANOG-binding sites are present in the Neanderthal genome, suggesting that a majority of these regulatory elements emerged in Modern Humans. Comparisons of estimated creation rates of novel TF-binding sites revealed a 49.7-fold acceleration in the creation rate of NANOG-binding sites in the chimpanzee genome compared with the mouse genome, and a further 5.7-fold acceleration in the genome of Modern Humans compared with the chimpanzee genome. Preliminary estimates suggest that the emergence of one novel NANOG-binding site detectable in hESC required 466 years of evolution.
Pathway analysis of coding genes that have hESC-specific NANOG-binding sites within gene bodies or near gene boundaries revealed their association with physiological development and functions of the nervous and cardiovascular systems, embryonic development, and behavior, as well as with the development of a diverse spectrum of pathological conditions, including cancer, diseases of the cardiovascular and reproductive systems, metabolic diseases, and multiple neurological and psychological disorders. A proximity placement model is proposed to explain how a 33-47% excess of NANOG, CTCF, and POU5F1 proteins immobilized on a DNA scaffold may play a functional role at distal regulatory elements.
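
The creation-rate figures quoted in the abstract can be related to one another with a few lines of arithmetic. The sketch below is illustrative only and is not the author's method: it assumes that the two fold-accelerations compound multiplicatively, and that the 466-years-per-site estimate refers to the Modern Human lineage. The function and constant names are hypothetical, and the derived values are implications of those assumptions, not results reported in the paper.

```python
# Figures quoted in the abstract.
FOLD_CHIMP_VS_MOUSE = 49.7   # chimpanzee vs. mouse creation-rate ratio
FOLD_HUMAN_VS_CHIMP = 5.7    # Modern Human vs. chimpanzee creation-rate ratio
# ASSUMPTION: the 466-year figure is taken to describe the Modern Human lineage.
YEARS_PER_SITE_HUMAN = 466.0


def compound_fold(*folds: float) -> float:
    """Multiply successive fold-changes (assumes they compound multiplicatively)."""
    result = 1.0
    for fold in folds:
        result *= fold
    return result


def sites_per_million_years(years_per_site: float) -> float:
    """Convert 'years needed per new site' into 'new sites per million years'."""
    return 1_000_000 / years_per_site


def implied_years_per_site(reference_years_per_site: float, fold_slower: float) -> float:
    """Years per site in a lineage whose creation rate is `fold_slower` times lower."""
    return reference_years_per_site * fold_slower


# Implied Modern Human vs. mouse acceleration under the multiplicative assumption.
human_vs_mouse = compound_fold(FOLD_CHIMP_VS_MOUSE, FOLD_HUMAN_VS_CHIMP)

# Implied creation rate and per-lineage waiting times under the stated assumptions.
human_rate = sites_per_million_years(YEARS_PER_SITE_HUMAN)
chimp_years_per_site = implied_years_per_site(YEARS_PER_SITE_HUMAN, FOLD_HUMAN_VS_CHIMP)

print(f"Implied human vs. mouse fold-acceleration: {human_vs_mouse:.2f}")
print(f"Implied new sites per million years (human): {human_rate:.1f}")
print(f"Implied years per site (chimpanzee): {chimp_years_per_site:.1f}")
```

Expressing each estimate as a rate (sites per unit time) or its reciprocal (years per site) keeps the fold-changes directly comparable across lineages; the underlying site counts and divergence times would need to be taken from the full paper.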

Keywords: CTCF; L1 retrotransposition; LINE; LTR; LTR7 RNAs; LTR7/HERVH; NANOG; POU5F1 (OCT4); evolution of Modern Humans; human ESC; methyl-cytosine deamination; pluripotent state regulators; primate evolution; repetitive elements; retrotransposons.

MeSH terms

  • Animals
  • Binding Sites
  • Brain / metabolism
  • Cell Differentiation
  • Chromatin / metabolism
  • DNA Methylation*
  • Embryonic Stem Cells / metabolism*
  • Enhancer Elements, Genetic*
  • Evolution, Molecular
  • Genome, Human
  • Homeodomain Proteins / metabolism
  • Humans
  • Mice
  • Mutation
  • Nanog Homeobox Protein
  • Nuclear Lamina / genetics
  • Primates
  • RNA, Long Noncoding / genetics
  • RNA, Untranslated / genetics*
  • Regulatory Elements, Transcriptional*
  • Retroelements*
  • Transcription Factors / metabolism
  • Transcription, Genetic

Substances

  • Chromatin
  • Homeodomain Proteins
  • NANOG protein, human
  • Nanog Homeobox Protein
  • RNA, Long Noncoding
  • RNA, Untranslated
  • Retroelements
  • Transcription Factors