%0 Journal Article %A Kiyoshi Ezawa %A Dan Graur %A Giddy Landan %T Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part I: Theoretical basis %D 2015 %R 10.1101/023598 %J bioRxiv %P 023598 %X Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. Recent probabilistic models of this kind are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. These models, however, have two fundamental problems: (1) it is unclear how they are related to any genuine evolutionary model, which describes the stochastic evolution of an entire sequence along the time axis; and (2) they cannot fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions. Results: Here, we theoretically tackle the ab initio calculation of the probability of a given sequence alignment under a genuine evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model allows general indel rate parameters, including length distributions, but does not impose any unrealistic restrictions on indels. Using techniques from perturbation theory in physics, we expand the probability into a series over different numbers of indels. 
This perturbation expansion provides a concise version of Feller’s theorem (1940), which underpins the authenticity of the widely used stochastic evolutionary simulation method by Gillespie (1977). We find a sufficient and nearly necessary set of conditions under which the probability can be expressed as the product of an overall factor and the contributions from regions separated by gapless columns of the alignment. The indel models satisfying these conditions include those with some kind of rate variation across regions, as well as space-homogeneous models. We also prove, with a caveat, that pairwise probabilities calculated by the method of Miklós et al. (2004) are equivalent to those calculated by our ab initio formulation, at least under a space-homogeneous model. Conclusions: Our ab initio perturbative formulation provides a firm theoretical ground on which other indel models can rest. [This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.] %U https://www.biorxiv.org/content/biorxiv/early/2015/07/31/023598.full.pdf