Identifying Optimal Models of Evolution

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1525))

Abstract

Most phylogenetic methods are model-based and depend on models of evolution designed to approximate the evolutionary processes. Several methods have been developed to identify suitable models of evolution for phylogenetic analysis of alignments of nucleotide or amino acid sequences and some of these methods are now firmly embedded in the phylogenetic protocol. However, in a disturbingly large number of cases, it appears that these models were used without acknowledgement of their inherent shortcomings. In this chapter, we discuss the problem of model selection and show how some of the inherent shortcomings may be identified and overcome.

Author information

Corresponding author

Correspondence to Lars S. Jermiin.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Jermiin, L.S., Jayaswal, V., Ababneh, F.M., Robinson, J. (2017). Identifying Optimal Models of Evolution. In: Keith, J. (ed) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_15

  • DOI: https://doi.org/10.1007/978-1-4939-6622-6_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6620-2

  • Online ISBN: 978-1-4939-6622-6

  • eBook Packages: Springer Protocols
