## Abstract

We wish to answer this question: if you observe a “significant” *P* value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by *P* values between 0.01 and 0.05 is explored by exact calculations of false positive risks.

When you observe *P* = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the *P* value. And if you want to limit the false positive risk to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done.

If you observe *P* = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive, but the false positive risk would still be 8% if the prior probability of a real effect were only 0.1. And, in this case, if you wanted to achieve a false positive risk of 5% you would need to observe *P* = 0.00045.
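The arithmetic linking likelihood ratio, prior probability and false positive risk can be sketched with Bayes' theorem in its odds form. This is a minimal illustration, not the exact calculation behind the figures quoted above: the function names `false_positive_risk` and `prior_needed` are hypothetical, and the likelihood ratios passed in are the rounded values from the text (about 3:1 and about 100:1), taken as given rather than derived from the underlying test.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Probability of no real effect, given the observed result.

    likelihood_ratio: odds in favour of a real effect supplied by the data.
    prior: probability of a real effect before the experiment (0 < prior < 1).
    """
    prior_odds = prior / (1.0 - prior)
    # Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds.
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds on a real effect into the probability of no real effect.
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio, target_fpr):
    """Prior probability of a real effect needed to reach target_fpr."""
    # Posterior odds on a real effect that correspond to the target risk.
    required_posterior_odds = (1.0 - target_fpr) / target_fpr
    required_prior_odds = required_posterior_odds / likelihood_ratio
    return required_prior_odds / (1.0 + required_prior_odds)


# Rounded likelihood ratios from the text (assumed inputs, not recomputed).
print(false_positive_risk(100, 0.1))
print(prior_needed(3, 0.05))
```

With these rounded inputs the first call gives roughly 0.08, matching the 8% quoted for *P* = 0.001 with a prior of 0.1. The second gives roughly 0.86, close to the 87% quoted for *P* = 0.05; the small gap arises because 3:1 is itself a rounded likelihood ratio.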

It is recommended that the terms “significant” and “non-significant” should never be used. Rather, *P* values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive risk. It may also be helpful to specify the minimum false positive risk associated with the observed *P* value.

Despite decades of warnings, many areas of science still insist on labelling a result of *P* < 0.05 as “statistically significant”. This practice must contribute to the lack of reproducibility in some areas of science. This is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and *P*-hacking. Precise inductive inference is impossible and replication is the only way to be sure.

Science is endangered by statistical misunderstanding, and by senior people who impose perverse incentives on scientists.