Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests

Ruud Wetzels; Dora Matzke; Michael D Lee; Jeffrey N Rouder; Geoffrey J Iverson; Eric-Jan Wagenmakers

doi:10.1177/1745691611406923

Perspect Psychol Sci. 2011 May;6(3):291-8. doi: 10.1177/1745691611406923.

Authors

Ruud Wetzels¹, Dora Matzke², Michael D Lee³, Jeffrey N Rouder⁴, Geoffrey J Iverson³, Eric-Jan Wagenmakers²

Affiliations

¹ Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands (wetzels.ruud@gmail.com).
² Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands.
³ Department of Cognitive Sciences, University of California, Irvine.
⁴ Department of Psychological Sciences, University of Missouri-Columbia.

PMID: 26168519
DOI: 10.1177/1745691611406923

Abstract

Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about which hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence to p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.

Keywords: t test; p value; Bayes factor; effect size; hypothesis testing.
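
Illustrative sketch (not part of the published record): the three measures compared in the abstract can be computed side by side for a single one-sample t test. The code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the "default Bayes factor" is the JZS Bayes factor with a Cauchy prior scale of r = 1, evaluated by simple numerical integration; it assumes a one-sample design, so that Cohen's d = t / sqrt(n); and it assumes "anecdotal" means a Bayes factor between 1/3 and 3. All function names are hypothetical.

```python
import math

def _simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def two_sided_p(t, df):
    """Two-sided p value, obtained by integrating the Student t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    density = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    return max(0.0, 1.0 - 2.0 * _simpson(density, 0.0, abs(t)))

def cohens_d(t, n):
    """Effect size for a one-sample design (assumption): d = t / sqrt(n)."""
    return t / math.sqrt(n)

def jzs_bf10(t, n):
    """JZS Bayes factor BF10 for a one-sample t test (assumed prior scale r = 1)."""
    df = n - 1
    # Marginal likelihood under the null, up to a constant shared with the alternative.
    null_like = (1 + t * t / df) ** (-(df + 1) / 2)

    def integrand(x):
        g = math.exp(x)  # integrate over x = log(g) to cover g in (0, inf)
        # Inverse-gamma(1/2, 1/2) prior on g, which yields a Cauchy prior on effect size.
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        like = ((1 + n * g) ** -0.5
                * (1 + t * t / ((1 + n * g) * df)) ** (-(df + 1) / 2))
        return like * prior * g  # trailing g is the Jacobian of g = exp(x)

    alt_like = _simpson(integrand, -10.0, 40.0)
    return alt_like / null_like

def is_anecdotal(bf10):
    """Assumed convention: evidence is 'anecdotal' when 1/3 < BF10 < 3."""
    return 1 / 3 < bf10 < 3

# Hypothetical example values, not taken from the 855 t tests in the paper.
t_value, sample_size = 2.2, 40
p = two_sided_p(t_value, sample_size - 1)
d = cohens_d(t_value, sample_size)
bf = jzs_bf10(t_value, sample_size)
print(f"p = {p:.3f}, d = {d:.2f}, BF10 = {bf:.2f}, anecdotal = {is_anecdotal(bf)}")
```

With these hypothetical inputs the p value falls between .01 and .05 while the Bayes factor stays inside the assumed anecdotal band, which is the pattern of disagreement the abstract describes. Two-sample designs would need an effective sample size and different degrees of freedom, which this sketch does not cover.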

Publication types

Review