TY - JOUR
T1 - Is N-Hacking Ever OK? A simulation-based study
JF - bioRxiv
DO - 10.1101/2019.12.12.868489
SP - 2019.12.12.868489
AU - Reinagel, Pamela
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/12/16/2019.12.12.868489.abstract
N2 - After an experiment has been completed and analyzed, a trend may be observed that is “not quite significant”. Sometimes in this situation, researchers incrementally grow their sample size N in an effort to achieve statistical significance. This is especially tempting in situations when samples are very costly or time-consuming to collect, such that collecting an entirely new sample larger than N (the statistically sanctioned alternative) would be prohibitive. Such post-hoc sampling or “N-hacking” is condemned, however, because it leads to an excess of false positive results. Here Monte-Carlo simulations are used to show why and how incremental sampling causes false positives, but also to challenge the claim that it necessarily produces alarmingly high false positive rates. In a parameter regime that would be representative of practice in many research fields, simulations show that the inflation of the false positive rate is modest and easily bounded. But the effect on false positive rate is only half the story. What many researchers really want to know is the effect N-hacking would have on the likelihood that a positive result is a real effect that will be replicable. This question has not been considered in the reproducibility literature. The answer depends on the effect size and the prior probability of an effect. Although in practice these values are not known, simulations show that for a wide range of values, the positive predictive value (PPV) of results obtained by N-hacking is in fact higher than that of non-incremented experiments of the same sample size and statistical power. This is because the increase in false positives is more than offset by the increase in true positives. Therefore in many situations, adding a few samples to shore up a nearly-significant result is in fact statistically beneficial. It is true that uncorrected N-hacking elevates false positives, but in some common situations this does not reduce PPV, which has not been shown previously. In conclusion, if samples are added after an initial hypothesis test this should be disclosed, and if a false positive rate is stated it should be corrected. But, contrary to widespread belief, collecting additional samples to resolve a borderline P value is not invalid, and can confer previously unappreciated advantages for efficiency and positive predictive value.
ER -