Abstract
Current attempts at methodological reform in the sciences come in response to an overall lack of rigor in methodological and scientific practices in the experimental sciences. However, most methodological reform attempts suffer from mistakes and over-generalizations similar to those they aim to address. We argue that this can be attributed, in part, to a lack of formalism and first principles. Considering the costs of allowing false claims to become canonized, we argue for formal statistical rigor and scientific nuance in methodological reform. To attain this rigor and nuance, we propose a five-step formal approach for solving methodological problems. To illustrate the use and benefits of such formalism, we present a formal statistical analysis of three popular claims in the metascientific literature: (a) that reproducibility is the cornerstone of science; (b) that data must not be used twice in any analysis; and (c) that exploratory projects imply poor statistical practice. We show how our formal approach can inform and shape debates about such methodological claims.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
1 As an indication of their impact on the scientific literature, we looked up Google Scholar citation counts for some of the key articles from which these claims originate, the oldest of which was published 8 years ago. By the time the current manuscript was last revised, Begley and Ioannidis (2015) had 686 citations; Nosek et al. (2012) had 1045; Nosek and Lakens (2014) had 473; Nosek et al. (2018) had 574; Open Science Collaboration (2012) had 529; Open Science Collaboration (2015) had 4807; Pashler and Wagenmakers (2012) had 1182; Wagenmakers et al. (2012) had 704; and Zwaan et al. (2018) had 244.
2 Here we use reproducibility in the sense of "the extent to which consistent results are observed when scientific studies are repeated" (Open Science Collaboration, 2012, p.657). In Appendix 1 we provide a technical definition of reproducibility, which we use in obtaining our results. We limit our discussion to the statistical reproducibility of results (similar to results reproducibility in Goodman et al., 2016), and exclude other types such as computational or methods reproducibility, that is, whether the materials, methods, procedures, algorithms, and analyses used in an original study are reported in a sufficiently detailed and transparent way that enables others to carry the study out again.
3 Some exceptions are as follows: Patil et al. (2016) use the overlap in prediction intervals from original and replication studies to define a statistical measure of reproducibility. Gorroochurn et al. (2007) investigate the relationship between reproducibility and p-values in the context of associations between variables. Pauli (2019) develops a Bayesian model to evaluate the results of replication studies and estimate a reproducibility rate. Hedges and Schauer (2019) offer a principled way of evaluating replication studies within a meta-analytic framework. In contrast to purely statistical approaches, Fanelli (2020) takes a meta-analytic approach to studying reproducibility and uses an information-theoretic framework to quantify it. We acknowledge and endorse the formal approach undertaken by these articles to address practical problems of evaluating and quantifying the results of replication experiments.
4 An epistemic claim that well-confirmed scientific theories and models capture (approximate) truths about the world is an example of scientific realism. The arguments for and against scientific realism (e.g., positivism) are beyond the scope of this paper. Interested readers may follow up on discussions in the philosophical literature (e.g., Hacking et al., 1983; Chakravartty, 2017).
5 sufficient statistic
6 complete sufficient statistic
7 ancillary statistic
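A brief formal reminder of the standard definitions behind footnotes 5-7, stated in conventional notation (our wording, not the authors'): a statistic $T(X)$ is sufficient for a parameter $\theta$ if the conditional distribution of the data $X$ given $T(X)$ does not depend on $\theta$; a sufficient statistic $T(X)$ is complete if $E_\theta[g(T(X))] = 0$ for all $\theta$ implies $P_\theta(g(T(X)) = 0) = 1$ for all $\theta$; and a statistic $A(X)$ is ancillary if its marginal distribution does not depend on $\theta$.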
8 Testing hypotheses with no theory to motivate them is a fishing expedition regardless of methodological rigor. See Gervais (2020); Guest and Martin (2020); Fried (2020); MacEachern and Van Zandt (2019); Muthukrishna and Henrich (2019); Oberauer and Lewandowsky (2019); Szollosi and Donkin (2019); Szollosi et al. (2019); van Rooij (2019); van Rooij and Baggio (2020a) and van Rooij and Baggio (2020b) for discussions on scientific theory.
9 Prediction here is not used in a statistical sense but refers to "the acquisition of data to test ideas about what will occur" (Nosek et al., 2018, p.2600). To clarify, statistics uses sample quantities (observables) to perform inference on population quantities (unobservables). Inference, therefore, is about unobservables. Statistical prediction, on the other hand, is defined as predicting a yet unobserved value of an observable and is therefore about observables. The quote refers to a procedure about unobservables, and hence "prediction" is not used in a statistical sense. Instead, it is used to demarcate the timing of hypothesis setting and analytical planning with regard to data collection or observation. The authors also specifically refer to the null hypothesis significance testing procedure as the standard tool for statistical inference referenced in this quote. While the statement itself can be misleading because of these local definitions and assumptions, our aim is to critique the intended meaning, not the idiosyncratic use of statistical terminology.
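As a minimal illustration of this distinction (our example, not from the cited texts, assuming an i.i.d. normal model for concreteness): given observations $y_1, \ldots, y_n \sim N(\mu, \sigma^2)$ with sample mean $\bar{y}$ and sample standard deviation $s$, inference targets the unobservable parameter $\mu$, for instance via the confidence interval $\bar{y} \pm t_{n-1,\,0.975}\, s/\sqrt{n}$, whereas statistical prediction targets the yet unobserved observable $y_{n+1}$, for instance via the prediction interval $\bar{y} \pm t_{n-1,\,0.975}\, s\sqrt{1 + 1/n}$.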
10 While not part of our core argument, this particular slogan is underspecified. It is not clear how the argument that preregistration is necessary for statistically valid inference should be reconciled with the proposed flexibility of preregistrations. In any case, this line of thinking is moot from our perspective, since the underlying premise itself does not hold.
11 Abductive inference involves the process of making an inference to the best explanation based on a set of candidate hypotheses (Haig, 2009).