TY  - JOUR
T1  - A comparison of one-rate and two-rate inference frameworks for site-specific dN/dS estimation
JF  - bioRxiv
DO  - 10.1101/032805
SP  - 032805
AU  - Stephanie J. Spielman
AU  - Suyang Wan
AU  - Claus O. Wilke
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/04/22/032805.abstract
N2  - Two broad paradigms exist for inferring dN/dS, the ratio of nonsynonymous to synonymous substitution rates, from coding sequences: i) a one-rate approach, where dN/dS is represented with a single parameter, or ii) a two-rate approach, where dN and dS are estimated separately. These paradigms have been well-studied for positive-selection (dN/dS > 1) inference. By contrast, their relative merits for the specific purpose of dN/dS point estimation at individual sites remain largely untested. Here, we use sequence simulation to systematically compare how accurately each paradigm infers site-specific dN/dS ratios. In particular, we simulate alignments with mutation–selection models rather than with dN/dS-based models, thus addressing the reliability of dN/dS estimation when the simulation and inference models differ, i.e., when the inference model is mathematically misspecified. We find that one-rate frameworks universally infer more accurate dN/dS values. Surprisingly, we recover this result even when dS varies among sites. Therefore, even when extensive dS variation exists, modeling this variation substantially reduces accuracy. We attribute this finding to the increased statistical challenge of estimating dS relative to dN, which in turn is a natural result of the structure of the genetic code. A randomly chosen mutation is more likely to result in a nonsynonymous than a synonymous change, and thus sequences are more informative for dN than for dS estimation. We additionally find that high levels of divergence among sequences are more critical than the number of sequences in the alignment for obtaining precise point estimates.
ER  - 