PT - JOURNAL ARTICLE
AU - Burkhard Morgenstern
TI - Sequence Comparison without Alignment: The SpaM approaches
AID - 10.1101/2019.12.16.878314
DP - 2019 Jan 01
TA - bioRxiv
PG - 2019.12.16.878314
4099 - http://biorxiv.org/content/early/2019/12/17/2019.12.16.878314.short
4100 - http://biorxiv.org/content/early/2019/12/17/2019.12.16.878314.full
AB - Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes now produced by massively parallel sequencing technologies, however, pairwise and multiple alignment methods have become too slow for many data-analysis tasks. Fast alignment-free approaches to sequence comparison have therefore become popular in recent years. Most of these approaches are based on word frequencies, for words of a fixed length, or on word-matching statistics. Other approaches are based on the length of maximal word matches. While these methods are very fast, most of them rely on ad-hoc measures of sequence similarity or dissimilarity that are often hard to interpret. In this review article, I describe a number of alignment-free methods that we developed in recent years. Our approaches are based on spaced word matches (‘SpaM’), i.e. on inexact word matches that may contain mismatches at certain predefined positions. Unlike most previous alignment-free approaches, our approaches can accurately estimate phylogenetic distances between DNA or protein sequences based on stochastic models of molecular evolution.
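
The notion of a spaced word match from the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes a binary pattern string in which '1' marks a position where the two sequences must agree and '0' marks a position where a mismatch is allowed. The function names `spaced_word_matches` and `dont_care_mismatches`, the pattern "1101", and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def spaced_word_matches(seq_a, seq_b, pattern):
    """Return all (i, j) such that seq_a at offset i and seq_b at offset j
    agree at every '1' position of the pattern (hypothetical helper)."""
    match_pos = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)

    # Index seq_b by the characters found at the match positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - span + 1):
        key = "".join(seq_b[j + k] for k in match_pos)
        index[key].append(j)

    # Look up every window of seq_a in that index.
    hits = []
    for i in range(len(seq_a) - span + 1):
        key = "".join(seq_a[i + k] for k in match_pos)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


def dont_care_mismatches(seq_a, seq_b, pattern, i, j):
    """Count mismatches at the '0' positions of one spaced word match."""
    return sum(
        1
        for k, c in enumerate(pattern)
        if c == "0" and seq_a[i + k] != seq_b[j + k]
    )


if __name__ == "__main__":
    a, b, p = "ACGTAC", "ACCTAC", "1101"
    for i, j in spaced_word_matches(a, b, p):
        print(i, j, dont_care_mismatches(a, b, p, i, j))
```

With the toy input, the windows "ACGT" and "ACCT" agree at the three '1' positions and differ at the single '0' position, so they form a spaced word match with one mismatch. How such mismatch counts are turned into distance estimates is described in the article itself and is not reproduced here.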