Abstract
Reproducibility, the ability to replicate analytical findings, is a prerequisite for both scientific discovery and clinical utility. Troublingly, we are in the midst of a reproducibility crisis, in which many investigations fail to replicate. Although many believe that these failings are due to misunderstanding or misapplication of statistical inference (e.g., p-values or the dichotomization of “statistically significant”), we believe the shortcomings arise much earlier in the data science workflow, at the level of measurement, including data acquisition and reconstruction. A key to reproducibility is that multiple measurements of the same item (e.g., experimental sample or clinical participant) are similar to one another, while they are dissimilar from measurements of other items. The intra-class correlation coefficient (ICC) quantifies reproducibility in this way, but only for univariate (one-dimensional) data, and it relies on Gaussian assumptions for validity. In contrast, big data is multivariate (high-dimensional), non-Gaussian, and often non-Euclidean (including text, images, speech, and networks), rendering ICC inadequate. We propose a novel statistic, discriminability, which quantifies the degree to which repeated measurements of the same item are similar to one another relative to measurements of other items, without restricting the data to be univariate, Gaussian, or even Euclidean. We then introduce the possibility of optimizing experimental design by increasing discriminability. We prove that optimizing discriminability yields an improved ability to use the data for subsequent inference tasks, without specifying the inference task a priori. We then apply this approach to three different datasets: a brain imaging dataset built by the “Consortium for Reliability and Reproducibility”, which consists of 28 disparate magnetic resonance imaging datasets, and two genomics datasets.
Discriminability is the only statistic for which optimization improves performance on all subsequent inference tasks for each dataset, even though those tasks were not considered in the optimization. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the reproducibility crisis.
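The idea that repeated measurements of the same item should be closer to one another than to measurements of other items can be illustrated with a short sketch. The abstract does not state the estimator, so the following is only one plausible reading and not necessarily the authors' definition: it reports the fraction of comparisons in which a within-item distance is smaller than a between-item distance. The function name `discriminability_sketch`, the choice of Euclidean distance, and the half-weight given to ties are all assumptions made for illustration; any other dissimilarity could be substituted for the distance matrix.

```python
import numpy as np


def discriminability_sketch(X, labels):
    """Illustrative (assumed) estimator: fraction of comparisons in which
    a within-item distance is smaller than a between-item distance.

    X      : array of shape (n_measurements, n_features)
    labels : item identifier for each measurement
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances (assumption: any dissimilarity would do).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    hits, total = 0.0, 0
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the measurement itself
        other = labels != labels[i]
        between = D[i, other]                # distances to other items
        for j in np.flatnonzero(same):
            d_within = D[i, j]               # distance to a repeat of the same item
            # Ties count as half a success (an assumption of this sketch).
            hits += np.sum(d_within < between) + 0.5 * np.sum(d_within == between)
            total += between.size

    if total == 0:
        raise ValueError("need repeated measurements and at least two items")
    return hits / total


# Two items, two measurements each; repeats lie close together.
X_demo = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels_demo = [0, 0, 1, 1]
print(discriminability_sketch(X_demo, labels_demo))
```

A value near 1 indicates that repeats of each item are consistently closer to one another than to other items. Because the sketch depends on the data only through the distance matrix, it makes no univariate, Gaussian, or Euclidean assumption beyond the particular distance chosen.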
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Added genomics experiments, renamed several features of the discriminability framework (one- and two-sample testing to GOF and comparison tests), added simulation cases, and added several comparison algorithms (FPI, HSIC, DISCO).