Abstract
Transcription factor (TF) proteins play a critical role in the regulation of eukaryotic gene expression through sequence-specific binding to genomic locations known as transcription factor binding sites.
Here we present the TFBSFootprinter tool, which combines transcription-relevant data from six large empirical datasets (Ensembl, JASPAR, FANTOM5, ENCODE, GTEx, and GTRD) to more accurately predict functional sites. A complete analysis integrating all experimental datasets can be performed on genes in the human genome, and a limited analysis can be performed on a total of 125 vertebrate species.
As a use case, we used TFBSFootprinter to study sites of genomic variation between modern human and Neanderthal promoters. We found significant differences in binding affinity for 110 transcription factors, which are enriched for homeobox family members and for brain-associated factors. Analysis of single-cell data shows that a subset of these (CUX1, CUX2, ESRRG, FOXP1, FOXP2, MEF2C, POU6F2, PRRX1, and RORA) co-occur as marker genes in L4 glutamatergic neurons.
Differential binding sites for these transcription factors were found in 74 target genes, the largest number of which were found in the bidirectional promoter of key mitochondrial-function genes FARS2 and LYRM4.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Analysis of ChIP-Seq data was used to redefine target promoter regions (old: -900 to +100; new: -2,500 to +2,500; relative to the TSS). Subsequently, all analyses of modern human vs. Neanderthal SNPs in promoters were re-run and the results re-analyzed. Details regarding the genes with differential binding in their promoters were added. Analyses of the enrichment of the differentially binding (DB) TFs in various cell and tissue types were added. The Discussion has been expanded.