Abstract
We present a phylogenetic approach, rooted in population genetics, that more realistically models the evolution of protein-coding DNA under the assumption of stabilizing selection for a gene-specific, optimal amino acid sequence. In addition to being consistent with the fundamental principles of population genetics, our new models, which we collectively call SelAC, fit phylogenetic data far better than popular models, suggesting strong potential for more accurate inference of phylogenetic trees and branch lengths. SelAC also demonstrates that a large amount of biologically meaningful information becomes accessible when using a nested set of mechanistic models. For example, for each position in a sequence, SelAC provides a probabilistic estimate of any given amino acid being optimal. Because SelAC assumes that the strength of selection is proportional to the expression level of a gene, it also provides gene-specific estimates of protein synthesis rates. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, it can be expanded or simplified as needed.