RT Journal Article
SR Electronic
T1 Notos - a Galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 180463
DO 10.1101/180463
A1 Ingo Bulla
A1 Benoît Aliaga
A1 Virginia Lacal
A1 Jan Bulla
A1 Christoph Grunau
A1 Cristian Chaparro
YR 2017
UL http://biorxiv.org/content/early/2017/08/25/180463.abstract
AB Background: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with detecting DNA methylation, however, have biased methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types based on the higher mutability of methylated CpG.
Results: Predominantly model-based approaches, essentially founded on mixtures of Gaussian distributions, are currently used to investigate the number and position of the modes of CpG o/e ratios. These approaches require the selection of an appropriate criterion for determining the best model and will fail if empirical distributions are complex or even merely moderately skewed. We use a kernel density estimation (KDE)-based technique for robust and precise characterization of complex CpN o/e distributions without a priori assumptions about the underlying distributions.
Conclusions: We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool, called Notos and available in the Galaxy ToolShed, that calculates these ratios for input FASTA files and fits a density to their empirical distribution.
Based on the estimated density, the number and shape of the modes of the distribution are determined, providing a rationale for predicting the number and types of different methylation classes. Notos is written in R and Perl.
Abbreviations
KDE: kernel density estimation
CpN o/e: observed to expected ratio of dinucleotides composed of a cytosine followed by any nucleotide in the 5’-3’ direction
CpG o/e: observed to expected ratio of dinucleotides composed of a cytosine followed by a guanine in the 5’-3’ direction
AIC: Akaike information criterion
BIC: Bayesian information criterion
ICL: Integrated Completed Likelihood
dbEST: database of Expressed Sequence Tags
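The abstract describes two computational steps: computing a CpN o/e ratio per sequence, then locating the modes of a kernel density estimate fitted to those ratios. The Python sketch below illustrates both under stated assumptions. The o/e formula is the common Gardiner-Garden and Frommer style definition, count(CpN) × L / (count(C) × count(N)); the kernel is Gaussian; and the bandwidth follows Silverman's rule of thumb. Notos's own formula, kernel, and bandwidth selection may differ. The function names (`cpn_oe`, `silverman_bandwidth`, `gaussian_kde`, `density_modes`) are hypothetical and are not part of Notos.

```python
import math
import statistics


def cpn_oe(seq: str, second: str = "G") -> float:
    """CpN observed/expected ratio of one sequence (hypothetical helper).

    Assumed definition (Gardiner-Garden & Frommer style):
        o/e = count(C followed by N) * L / (count(C) * count(N))
    Notos may apply a different length correction.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_n = s.count(second)
    if n_c == 0 or n_n == 0:
        return float("nan")  # ratio undefined without any C or any N
    # Scan every position (5'->3') so overlapping pairs such as CpC in "CCC" count twice.
    n_cn = sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == second)
    return n_cn * len(s) / (n_c * n_n)


def silverman_bandwidth(xs):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    h = 0.9 * spread * len(xs) ** (-0.2)
    if h <= 0:
        raise ValueError("need at least two distinct ratios to estimate a density")
    return h


def gaussian_kde(xs, grid, h):
    """Gaussian kernel density estimate of xs, evaluated at each grid point."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs) for x in grid]


def density_modes(xs, points=512):
    """Return (positions of the KDE's local maxima, bandwidth) for the ratios xs."""
    xs = [x for x in xs if not math.isnan(x)]  # drop undefined ratios
    h = silverman_bandwidth(xs)
    # Extend the grid 3 bandwidths past the data so boundary modes are interior points.
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    dens = gaussian_kde(xs, grid, h)
    # A mode is a grid point higher than its left neighbour and not lower than its right one.
    modes = [grid[i] for i in range(1, points - 1)
             if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]
    return modes, h


if __name__ == "__main__":
    print(cpn_oe("CGCG"))  # → 2.0
    # Illustrative (made-up) ratios forming two well-separated groups.
    ratios = [0.25, 0.28, 0.30, 0.31, 0.33, 0.35, 0.85, 0.88, 0.90, 0.92, 0.95]
    modes, h = density_modes(ratios)
    print(len(modes), [round(m, 2) for m in modes], round(h, 3))
```

Counting local maxima on a fine grid is the simplest way to read modes off a KDE. The bandwidth controls the result: a much smaller bandwidth on the same data would split the groups into more, possibly spurious, modes, so in practice the bandwidth choice deserves as much attention as the model-selection criterion (AIC, BIC, ICL) does in the mixture-model approaches.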