Abstract
Thousands of candidate human-specific genomic regulatory loci (HSGRL) have been identified, supporting the idea that unique-to-human phenotypes result from human-specific changes to genomic regulatory networks (GRNs). A notable common feature of HSGRL is their predominant location within non-protein-coding sequences. A significant gap is the lack of a genome-wide view of diverse families of HSGRL within the context of the principal regulatory structures of interphase chromatin, namely topologically associating domains (TADs) and specific sub-TAD structures termed super-enhancer domains (SEDs). Genome-wide proximity placement analysis of 10,598 HSGRL revealed that, depending on the HSGRL family, 0.8%-10.3% of TADs contain more than half of HSGRL. Of the 3,127 TADs in the hESC genome, 24 (0.8%), 53 (1.7%), 259 (8.3%), and 322 (10.3%) harbor 1,110 (52.4%), 1,936 (50.9%), 1,151 (59.6%), and 1,601 (58.3%) HSGRL sequences from four distinct families, respectively. TADs that are enriched for HSGRL, termed rapidly-evolving in humans TADs (revTADs), manifest distinct correlation patterns between HSGRL placements and recombination rates. There is significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (p < 0.0001 in all instances). In the hESC genome, 331 of 504 (66%) SE-harboring TADs contain HSGRL and 68% of SEs co-localize with HSGRL, suggesting that HSGRL rewired SE-driven GRNs within revTADs by inserting novel and/or erasing existing regulatory sequences. Consequently, markedly distinct features of chromatin structures evolved in hESC compared to mouse: the SE quantity is 3-fold higher and the median SE size is significantly larger; concomitantly, the TAD number is increased by 42% while the median TAD size is decreased (p = 9.11E-37).
The present analyses revealed a global role for HSGRL in increasing both the quantity and the size of SEs and in increasing the number and reducing the size of TADs, which may facilitate a convergence of TAD and SED architectures of interphase chromatin and define a trend of increasing regulatory complexity during evolution of GRNs.
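The TAD fractions quoted above follow directly from the reported counts, and the arithmetic can be rechecked with a minimal sketch. All counts are taken from the abstract; the family labels (`family_1` to `family_4`) are placeholders, since the four HSGRL families are not named here. The back-calculated family sizes are only approximate because they are derived from rounded percentages.

```python
# Sanity check of the fractions quoted in the abstract.
# All counts come from the abstract; the family labels are placeholders
# (an assumption), because the abstract does not name the four families.

TOTAL_HESC_TADS = 3127

# label -> (number of TADs, HSGRL sequences in those TADs,
#           reported percent of the family's HSGRL)
FAMILIES = {
    "family_1": (24, 1110, 52.4),
    "family_2": (53, 1936, 50.9),
    "family_3": (259, 1151, 59.6),
    "family_4": (322, 1601, 58.3),
}


def percent(part, whole, ndigits=1):
    """Return part/whole as a percentage rounded to ndigits decimals."""
    return round(100.0 * part / whole, ndigits)


# Share of all hESC TADs that holds the majority of each family's HSGRL.
tad_share = {
    name: percent(tads, TOTAL_HESC_TADS)
    for name, (tads, _, _) in FAMILIES.items()
}

# Family sizes implied by the reported counts and percentages.
# Approximate only: the reported percentages are already rounded.
implied_family_size = {
    name: round(100.0 * hsgrl / pct)
    for name, (_, hsgrl, pct) in FAMILIES.items()
}
implied_total = sum(implied_family_size.values())

# Share of SE-harboring TADs that contain HSGRL (331 of 504).
se_tad_share = percent(331, 504, ndigits=0)

if __name__ == "__main__":
    for name in FAMILIES:
        print(name, tad_share[name], implied_family_size[name])
    print("implied total HSGRL:", implied_total)
    print("SE-harboring TADs with HSGRL (%):", se_tad_share)
```

The implied total lands within a few loci of the 10,598 HSGRL analyzed, which is the agreement expected when percentages are rounded to one decimal place.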
List of abbreviations
- 5hmC: 5-Hydroxymethylcytosine
- CTCF: CCCTC-binding factor
- DHS: DNase hypersensitivity sites
- FHSRR: fixed human-specific regulatory regions
- GRNs: genomic regulatory networks
- HAR: human accelerated regions
- hCONDEL: human-specific conserved deletions
- hESC: human embryonic stem cells
- HSGRL: human-specific genomic regulatory loci
- HSNBS: human-specific NANOG-binding sites
- HSTFBS: human-specific transcription factor-binding sites
- LAD: lamina-associated domain
- LINE: long interspersed nuclear element
- lncRNA: long non-coding RNA
- LTR: long terminal repeat
- MADE: methylation-associated DNA editing
- mC: methylcytosine
- mESC: mouse embryonic stem cells
- NANOG: Nanog homeobox
- nt: nucleotide
- POU5F1: POU class 5 homeobox 1
- PSDS: partial strand displacement state
- TAD: topologically associating domains
- TE: transposable elements
- TF: transcription factor
- TSC: triple-stranded complex
- TSS: transcription start sites
- SE: super-enhancers
- SED: super-enhancer domains
- sncRNA: small non-coding RNA