PT  - JOURNAL ARTICLE
AU  - Pamela Reinagel
TI  - Is N-Hacking Ever OK? A simulation-based study
AID - 10.1101/2019.12.12.868489
DP  - 2019 Jan 01
TA  - bioRxiv
PG  - 2019.12.12.868489
4099 - http://biorxiv.org/content/early/2019/12/27/2019.12.12.868489.short
4100 - http://biorxiv.org/content/early/2019/12/27/2019.12.12.868489.full
AB  - After an experiment has been completed and analyzed, a trend may be observed that is “not quite significant”. Sometimes in this situation, researchers incrementally grow their sample size N in an effort to achieve statistical significance. This is especially tempting in situations where samples are very costly or time-consuming to collect, such that collecting an entirely new sample larger than N (the statistically sanctioned alternative) would be prohibitive. Such post-hoc sampling or “N-hacking” is condemned, however, because it leads to an excess of false positive results. Here Monte-Carlo simulations are used to show why and how incremental sampling causes false positives, but also to challenge the claim that it necessarily produces alarmingly high false positive rates. In a parameter regime that would be representative of practice in many research fields, simulations show that the inflation of the false positive rate is modest and easily bounded. But the effect on false positive rate is only half the story. What many researchers really want to know is the effect N-hacking would have on the likelihood that a positive result is a real effect that will be replicable: the positive predictive value (PPV). This question has not been considered in the reproducibility literature. The answer depends on the effect size and the prior probability of an effect. Although in practice these values are not known, simulations show that for a wide range of values, the PPV of results obtained by N-hacking is in fact higher than that of non-incremented experiments of the same sample size and statistical power. This is because the increase in false positives is more than offset by the increase in true positives. Therefore, in many situations, adding a few samples to shore up a nearly-significant result is in fact statistically beneficial. In conclusion, if samples are added after an initial hypothesis test this should be disclosed, and if a p value is reported it should be corrected. But, contrary to widespread belief, collecting additional samples to resolve a borderline p value is not invalid, and can confer previously unappreciated advantages for efficiency and positive predictive value.
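
The abstract describes a Monte-Carlo procedure that can be sketched directly: simulate an experiment, and if the result is "not quite significant", add a few samples and retest, then measure the realized false positive rate and PPV. Below is a minimal illustrative sketch in Python; the specific settings (initial N, increment size, the "promising" p-value window, the cap on N, the prior, and the effect size) are assumptions for illustration, not the paper's actual simulation parameters.

    # Minimal Monte-Carlo sketch of incremental sampling ("N-hacking").
    # All numeric parameters below are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    ALPHA = 0.05        # nominal significance threshold
    P_PROMISING = 0.10  # retest only "not quite significant" results (assumed window)
    N_INIT = 12         # initial sample size per group (assumed)
    N_STEP = 4          # samples added per increment (assumed)
    N_MAX = 32          # hard cap on per-group sample size (assumed)
    N_SIMS = 100_000    # number of simulated experiments

    def run_experiment(effect=0.0):
        """Simulate one experiment under incremental sampling.

        Returns True if the final two-sample t-test is significant at ALPHA.
        With effect=0.0 (null), the fraction of True outcomes across many
        runs is the realized false positive rate of the procedure.
        """
        a = rng.normal(0.0, 1.0, N_INIT)
        b = rng.normal(effect, 1.0, N_INIT)
        while True:
            p = stats.ttest_ind(a, b).pvalue
            if p < ALPHA:
                return True   # declared significant
            if p > P_PROMISING or len(a) >= N_MAX:
                return False  # clearly null, or sample cap reached: give up
            # "not quite significant": add a few samples and retest
            a = np.concatenate([a, rng.normal(0.0, 1.0, N_STEP)])
            b = np.concatenate([b, rng.normal(effect, 1.0, N_STEP)])

    # Realized false positive rate under the null
    false_pos = sum(run_experiment(effect=0.0) for _ in range(N_SIMS)) / N_SIMS
    print(f"Realized false positive rate under N-hacking: {false_pos:.4f}")
    print(f"Nominal alpha for a single fixed-N test:      {ALPHA:.4f}")

    # PPV under an assumed prior probability of a real effect and an
    # assumed true effect size (both illustrative)
    PRIOR = 0.1   # prior probability that a real effect exists (assumed)
    EFFECT = 1.0  # true effect size in SD units when present (assumed)

    true_pos = sum(run_experiment(effect=EFFECT) for _ in range(N_SIMS)) / N_SIMS
    ppv = (PRIOR * true_pos) / (PRIOR * true_pos + (1 - PRIOR) * false_pos)
    print(f"Power of the incremental procedure:  {true_pos:.4f}")
    print(f"PPV given the assumed prior/effect:  {ppv:.4f}")

Because the same data are tested repeatedly, the realized false positive rate exceeds the nominal alpha; a sketch like this makes it possible to measure by how much under a given stopping rule, and to check the abstract's claim that the accompanying gain in true positives can raise the PPV relative to a fixed-N design.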