Abstract
Background Sequence logos can effectively visualize the position-specific base preferences evident in a collection of binding sites of a transcription factor. However, those preferences usually fall far short of fully explaining binding specificity. Interestingly, some transcription factors bind sites of potentially methylated DNA. For example, MYC binds CpG sites. This may increase binding specificity, as such sites 1) are highly under-represented in the genome and 2) offer additional, tissue-specific information in the form of hypo- or hypermethylation. Fortunately, bisulfite sequencing data suitable for investigating this possibility are readily available.
Method We developed MethylSeqLogo, an extension of sequence logos that adds DNA methylation information. MethylSeqLogo includes new elements that indicate DNA methylation and under-represented dimers at each position of a set of aligned binding sites. Our method displays information from both DNA strands. To properly assess the expected background dimer frequency and methylation level, it takes into account the sequence context (CpG or other) and the genome region (promoter versus whole genome).
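To make "expected background dimer frequency" concrete, the sketch below computes a common observed/expected ratio for a dinucleotide such as CpG. It assumes independent bases, so the expected dimer frequency is the product of the two single-base frequencies. This is an illustrative estimate, not necessarily the exact background model MethylSeqLogo uses. The function name `dimer_obs_exp` is hypothetical. Running it separately on promoter sequences and on whole-genome sequence would give the region-specific backgrounds the method distinguishes.

```python
import math
from collections import Counter


def dimer_obs_exp(seq: str, dimer: str = "CG") -> float:
    """Observed/expected ratio of a dinucleotide on one strand of seq.

    observed = occurrences of dimer / number of overlapping 2-mers
    expected = p(first base) * p(second base), assuming independent bases
    Values well below 1 indicate an under-represented dimer (e.g. CpG).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return math.nan
    base_counts = Counter(seq)
    # Count overlapping occurrences of the dimer along the sequence.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dimer) / (n - 1)
    expected = (base_counts[dimer[0]] / n) * (base_counts[dimer[1]] / n)
    return observed / expected if expected > 0 else math.nan
```

CG is its own reverse complement, so counting one strand suffices for CpG. Dimers that are not palindromic would also need their reverse complement counted to cover both strands.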
When designing MethylSeqLogo, we took care to preserve the usual sequence logo meaning of heights. The relative height of nucleotides within a column represents their proportion in the binding sites. The absolute height of each column represents information (relative entropy). The heights of all columns added together represent the total information.
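The height conventions above can be sketched as follows. The sketch assumes a column's information is its relative entropy, in bits, with respect to a background base distribution. Each letter's height is its proportion multiplied by that information, and the logo's total information is the sum over columns. The names `column_heights` and `logo_total_information` are hypothetical. No small-sample correction is applied, and MethylSeqLogo's own computation (for example its promoter versus whole-genome backgrounds) may differ in detail.

```python
import math


def column_heights(counts, background):
    """Return (column information in bits, per-base letter heights).

    counts: base -> number of binding sites with that base at this position
    background: base -> expected background frequency (sums to 1)
    """
    total = sum(counts.values())
    probs = {b: c / total for b, c in counts.items() if c > 0}
    # Relative entropy of the observed column against the background.
    info = sum(p * math.log2(p / background[b]) for b, p in probs.items())
    # Letter heights: proportion within the column times column information.
    heights = {b: p * info for b, p in probs.items()}
    return info, heights


def logo_total_information(columns, background):
    """Total information of a logo: the sum of its column heights."""
    return sum(column_heights(col, background)[0] for col in columns)
```

With a uniform background, a fully conserved column contributes 2 bits, and an even A/T split contributes 1 bit divided equally between the two letters.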
Results We present several figures illustrating the utility of MethylSeqLogo for summarizing data from CpG-binding transcription factors. The logos show that unmethylated CpG binding sites are a feature of transcription factors such as MYC and ZBTB33, while some other CpG-binding transcription factors, such as CEBPB, appear methylation-neutral. We also compare MethylSeqLogo with two previously reported ways of creating methylation-aware sequence logos.
Conclusions Our freely available software enables users to explore large-scale bisulfite and ChIP sequencing data sets and, in the process, obtain publication-quality figures.
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- H1-hESC: Human Embryonic Stem Cell line H1
- TF: Transcription Factor
- TFBS: Transcription Factor Binding Sites
- WGBS: Whole Genome Bisulfite Sequencing