PT - JOURNAL ARTICLE
AU - Sereshki, Saleh
AU - Lonardi, Stefano
TI - Predicting Differentially Methylated Cytosines in TET and DNMT3 Knockout Mutants via a Large Language Model
AID - 10.1101/2024.05.02.592257
DP - 2024 Jan 01
TA - bioRxiv
PG - 2024.05.02.592257
4099 - http://biorxiv.org/content/early/2024/09/04/2024.05.02.592257.short
4100 - http://biorxiv.org/content/early/2024/09/04/2024.05.02.592257.full
AB - DNA cytosine methylation is an epigenetic marker that regulates many cellular processes. Mammalian genomes typically maintain consistent methylation patterns over time, except in specific regulatory regions such as promoters and certain types of enhancers. The dynamics of DNA methylation are controlled by a complex cellular machinery in which the enzymes DNMT3 and TET play a major role. This study explores the identification of differentially methylated cytosines (DMCs) in TET and DNMT3 knockout mutants in mouse and human embryonic stem cells. We investigate (i) whether a large language model can be trained to recognize DMCs in human and mouse from the sequence surrounding the cytosine of interest, (ii) whether a classifier trained on human knockout data can predict DMCs in the mouse genome (and vice versa), and (iii) whether a classifier trained on DNMT3 knockout data can predict DMCs for TET knockout (and vice versa). Our study identifies statistically significant motifs associated with the prediction of DMCs in each mutant, shedding new light on DNA methylation dynamics in stem cells.
Our software tool is available at https://github.com/ucrbioinfo/dmc_prediction.
Competing Interest Statement: The authors have declared no competing interest.
Abbreviations: 5mC = 5-methylcytosine; DMC = differentially methylated cytosines; LLM = large language model; L-MAP = language model-based methyltransferases activity predictor; AUC = area under the curve; TET = ten-eleven translocation (enzyme); DNMT = DNA methyltransferase (enzyme); ESC = embryonic stem cells; ISC = intestinal stem cells