Generation of the Exact Distribution and Simulation of Matched Nucleotide Sequences on a Phylogenetic Tree

Ababneh, Faisal; Jermiin, Lars S.; Robinson, John

doi:10.1007/s10852-005-9017-y

Published: 28 February 2006

Volume 5, pages 291–308 (2006)

Journal of Mathematical Modelling and Algorithms

Faisal Ababneh¹, Lars S. Jermiin²,³,⁴ & John Robinson¹

Abstract

Nucleotide sequences are often generated by Monte Carlo simulations to address complex evolutionary or analytic questions, but the simulations are rarely described in sufficient detail to allow the research to be replicated. Here we briefly review the Markov processes of substitution in a pair of matching (homologous) nucleotide sequences and then extend them to k matching nucleotide sequences. We describe the calculation of the joint distribution of nucleotides of two matching sequences. Based on this distribution, we give a method for simulation of the divergence matrix for n sites using the multinomial distribution. This is then extended to the joint distribution for k nucleotide sequences and the corresponding 4^k divergence array, generalizing Felsenstein (Journal of Molecular Evolution, 17, 368–376, 1981), who considered stationary, homogeneous and reversible processes on trees. We give a second method to generate matched sequences that begins with a random ancestral sequence and applies a continuous-time Markov process to each nucleotide site as in Rambaut and Grassly (Computer Applications in the Biosciences, 13, 235–238, 1997); further, we relate this to an equivalent approach based on an embedded Markov chain. Finally, we describe an approximate method that was recently implemented in a program developed by Jermiin et al. (Applied Bioinformatics, 2, 159–163, 2003). The three methods presented here cater for different computational and mathematical limitations and are shown in an example to produce results close to those expected on theoretical grounds. All methods are implemented using functions in the S-plus or R languages.
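
As an illustration of the first method for a pair of sequences, the following Python sketch computes a joint distribution and draws a divergence matrix from the multinomial distribution. It is not the authors' S-plus or R code. It assumes that both sequences descend from one ancestor with nucleotide frequencies f0, that each edge has its own rate matrix Q, and that the joint distribution is F_ij = sum_a f0_a P1_ai P2_aj with P = exp(Qt). The frequencies, rate matrices, edge lengths and all function names are hypothetical choices made for illustration.

```python
import random

def make_rate_matrix(off_diag):
    """Copy a 4x4 table of off-diagonal rates; set diagonals so rows sum to 0."""
    q = [list(row) for row in off_diag]
    for i in range(4):
        q[i][i] = 0.0
        q[i][i] = -sum(q[i])
    return q

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_exp(q, t, squarings=10, order=6):
    """Approximate exp(q * t) by a truncated Taylor series with scaling and squaring."""
    scale = 2.0 ** squarings
    m = [[q[i][j] * t / scale for j in range(4)] for i in range(4)]
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for n in range(1, order + 1):
        # After this step, term holds m^n / n!
        term = [[x / n for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def joint_distribution(f0, p1, p2):
    """F[i][j] = sum over ancestral states a of f0[a] * p1[a][i] * p2[a][j]."""
    return [[sum(f0[a] * p1[a][i] * p2[a][j] for a in range(4))
             for j in range(4)] for i in range(4)]

def simulate_divergence_matrix(joint, n_sites, rng):
    """Draw a 4x4 table of site counts: one multinomial sample over the 16 cells."""
    cells = [(i, j) for i in range(4) for j in range(4)]
    weights = [joint[i][j] for i, j in cells]
    counts = [[0] * 4 for _ in range(4)]
    for i, j in rng.choices(cells, weights=weights, k=n_sites):
        counts[i][j] += 1
    return counts

# Hypothetical inputs: ancestral frequencies, one rate matrix per edge, edge lengths.
f0 = [0.1, 0.2, 0.3, 0.4]
q1 = make_rate_matrix([[1.0] * 4 for _ in range(4)])  # equal rates (Jukes-Cantor-like)
q2 = make_rate_matrix([[0.0, 0.2, 0.5, 0.3],
                       [0.1, 0.0, 0.4, 0.2],
                       [0.3, 0.1, 0.0, 0.6],
                       [0.2, 0.5, 0.1, 0.0]])
joint = joint_distribution(f0, mat_exp(q1, 0.1), mat_exp(q2, 0.3))
counts = simulate_divergence_matrix(joint, 1000, random.Random(1))
for row in counts:
    print(row)
```

Because the two edges use different rate matrices and f0 need not be stationary for either of them, the sketch is not restricted to the stationary, homogeneous and reversible case. Under the same assumptions, the extension to k sequences would replace the 4 x 4 table by a 4^k array.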

References

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A. and Wheeler, D. L.: GenBank, Nucleic Acids Res. 28 (2000), 15–18.
Conant, G. C. and Lewis, P. O.: Effects of nucleotide compositional bias in the success of the parsimony criterion in phylogenetic inference, Mol. Biol. Evol. 18 (2001), 1024–1033.
Felsenstein, J.: Evolutionary trees from DNA sequences: A maximum likelihood approach, J. Mol. Evol. 17 (1981), 368–376.
Felsenstein, J.: Inferring Phylogenies, Sinauer, Sunderland, Massachusetts, USA, 2004.
Felsenstein, J.: PHYLIP (Phylogeny Inference Package), version 3.62, Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, 2004.
Gaut, B. S. and Lewis, P. O.: Success of maximum likelihood phylogeny inference in the four-taxon case, Mol. Biol. Evol. 12 (1995), 152–162.
Ho, S. Y. W. and Jermiin, L. S.: Tracing the decay of the historical signal in biological sequence data, Syst. Biol. 53 (2004), 623–637.
Jermiin, L. S., Ho, S. Y. W., Ababneh, F., Robinson, J. and Larkum, A. W. D.: Hetero: A program to simulate the evolution of DNA on a four-taxon tree, Appl. Bioinformatics 2 (2003), 159–163.
Jermiin, L. S., Ho, S. Y. W., Ababneh, F., Robinson, J. and Larkum, A. W. D.: The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated, Syst. Biol. 53 (2004), 638–643.
Lake, J. A.: Reconstructing evolutionary trees from DNA and protein sequences: Paralinear distances, Proc. Natl. Acad. Sci. USA 91 (1994), 1155–1159.
Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D.: Recovering evolutionary trees under a more realistic model of sequence evolution, Mol. Biol. Evol. 11 (1994), 605–612.
Rambaut, A. and Grassly, N. C.: Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, Comput. Appl. Biosci. 13 (1997), 235–238.
Swofford, D. L., Olsen, G. J., Waddell, P. J. and Hillis, D. M.: Phylogenetic inference, in D. M. Hillis, D. Moritz and B. K. Mable (eds), Molecular Systematics, 2nd Edn., Sinauer, Sunderland, Massachusetts, USA, 1996, pp. 407–514.
Tavaré, S.: Some probabilistic and statistical problems on the analysis of DNA sequences, Lect. Math. Life Sci. 17 (1986), 57–86.
Van Den Bussche, R. A., Baker, R. J., Huelsenbeck, J. P. and Hillis, D. M.: Base compositional bias and phylogenetic analyses: A test of the “flying DNA” hypothesis, Mol. Phylogenet. Evol. 10 (1998), 408–416.
Zharkikh, A.: Estimation of evolutionary distances between nucleotide sequences, J. Mol. Evol. 39 (1994), 315–329.

Author information

Authors and Affiliations

School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
Faisal Ababneh & John Robinson
School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
Lars S. Jermiin
Sydney University Biological Informatics and Technology Centre, University of Sydney, Sydney, NSW 2006, Australia
Lars S. Jermiin
Unité de Biologie Moléculaire de Gène chez les Extrêmophiles, Institut Pasteur, 75724, Paris Cedex, France
Lars S. Jermiin

Corresponding author

Correspondence to Lars S. Jermiin.

About this article

Cite this article

Ababneh, F., Jermiin, L.S. & Robinson, J. Generation of the Exact Distribution and Simulation of Matched Nucleotide Sequences on a Phylogenetic Tree. J Math Model Algor 5, 291–308 (2006). https://doi.org/10.1007/s10852-005-9017-y

Issue Date: September 2006

Mathematics Subject Classifications (2000):

62P10
