Genome-scale coestimation of species and gene trees

  Vincent Daubin1

  1. Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France;
  2. Department of Integrative Biology, UC Berkeley, Berkeley, California 94720-3140, USA;
  3. INRIA Rhône-Alpes, Montbonnot F-38322, France

    Abstract

    Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historic record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.

    Footnotes

    • 4 Corresponding author

      E-mail bastien.boussau{at}univ-lyon1.fr

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.141978.112.

      Freely available online through the Genome Research Open Access option.

    • Received April 19, 2012.
    • Accepted October 22, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
