RT Journal Article
SR Electronic
T1 ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2024.07.25.605219
DO 10.1101/2024.07.25.605219
A1 Lee, Seohyun
A1 Lin, Che
A1 Chen, Chien-Yu
A1 Nakato, Ryuichiro
YR 2024
UL http://biorxiv.org/content/early/2024/07/26/2024.07.25.605219.abstract
AB Chromatin states, fundamental to gene regulation and cellular identity, are defined by distinct combinations of histone post-translational modifications. Despite their importance, comprehensive patterns within chromatin state sequences, which could provide insights into key biological functions, remain largely unexplored. In this study, we introduce ChromBERT, a BERT-based model specifically designed to detect distinct patterns in sequences of chromatin state annotations. Notably, ChromBERT was pre-trained on promoter regions across a diverse range of epigenomes and subsequently fine-tuned on a dataset from multiple cell lines for which RNA-seq data were available, highlighting the model's ability to discern conserved chromatin state patterns within these regions. In addition to its predictive power across tasks, evidenced by high AUC scores, ChromBERT supports further analysis by incorporating motif clustering based on Dynamic Time Warping (DTW). This method enhances the model's ability to dissect chromatin state sequence motifs, which typically involve transcription and enhancer sites. Integrating DTW-based motif clustering into ChromBERT's workflow is poised to facilitate the discovery of genomic regions linked to novel biological functions, deepening our understanding of chromatin state dynamics. Competing Interest Statement: The authors have declared no competing interest.
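The abstract names Dynamic Time Warping (DTW) as the basis of ChromBERT's motif clustering but gives no further detail. As a minimal sketch only, assuming that motifs are sequences of integer chromatin state labels, that aligning two positions costs 0 when the labels match and 1 otherwise, and that clusters are formed by single-linkage grouping under a distance threshold (all assumptions, not ChromBERT's actual implementation), DTW-based motif clustering could look like this. The names `dtw_distance` and `cluster_motifs` are hypothetical.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Classic DTW distance between two chromatin state label sequences.

    Assumed cost: 0 if the two state labels match, 1 otherwise.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = minimal cost of aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Warping: extend from a match, an insertion, or a deletion step
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def cluster_motifs(motifs: List[Sequence[int]], threshold: float) -> List[List[int]]:
    """Group motif indices whose pairwise DTW distance is <= threshold.

    Single-linkage grouping implemented with union-find (an assumed choice).
    """
    parent = list(range(len(motifs)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(motifs)):
        for j in range(i + 1, len(motifs)):
            if dtw_distance(motifs[i], motifs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(motifs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical motifs; the integers are arbitrary state labels.
motifs = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 2], [7, 7, 8], [7, 8, 8]]
print(cluster_motifs(motifs, threshold=0))  # → [[0, 1, 2], [3, 4]]
```

DTW lets motifs of different lengths that pass through the same ordered series of states (for example, a state run stretched over more bins) fall into one cluster, which a position-by-position comparison would not allow.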