How many digits in a mean are worth reporting?

Most bioscientists need to report mean values, yet many have little idea of how many digits are significant, and at what point further digits are mere random junk. Thus a recent report that the mean of 17 values was 3.863 with a standard error of the mean (SEM) of 2.162 revealed only that none of the seven authors understood the limitations of their work. The simple rule derived here by experiment for restricting a mean value to its significant digits (sig-digs) is this: the last sig-dig in the mean value is at the same decimal decade as the first sig-dig (the first non-zero) in the SEM. An extended rule for the mean, and a different rule for the SEM itself are also derived. For the example above the reported values should be a mean of 4 with SEM 2.2. Routine application of these simple rules will often show that a result is not as compelling as one had hoped.


Introduction
During the last five years, as referee (reviewer) or editor of 53 bioscience articles, I found that 29 (55 %) reported too many, even ridiculously too many, digits in mean values or SEMs or both. For example, one article reported 3.863 ± 2.162 for a sample of 17, for which the sig-digs are really 4 ± 2.2. A scan of 50 articles in a variety of bioscience journals showed that 32 % made this mistake once or many times. It is not the total number of digits, or where the decimal point falls, that matters: the critical feature is the relation between the mean and its SEM. Thus the frequency of a transition of a trapped and laser-cooled, lone ion of ⁸⁸Sr⁺ was reported¹, correctly according to the extended rule in Table 3, to 16 significant digits as 444 779 044 095 484.6 Hz, with a SEM of 1.5 Hz. Equally correct, though less clear, would have been 444 779 044.095 484 6 MHz with SEM 0.000 001 5 MHz.
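The arithmetic of the simple rule is mechanical enough to automate. A minimal Python sketch (the function name and interface are mine, not the paper's):

```python
import math

def significant_mean(mean, sem):
    """Round a mean so that its last digit falls in the decade of the
    SEM's first non-zero digit (the simple rule)."""
    decade = math.floor(math.log10(abs(sem)))  # SEM 2.162 -> 0, the 1's decade
    return round(mean, -decade)

# The example above: 3.863 with SEM 2.162 reports as 4
print(significant_mean(3.863, 2.162))  # -> 4.0
```

Note that `round` works from the decade of the SEM's first digit, never from a fixed number of decimal places.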
The problem seems to be that there is no published logical analysis of where to stop. Here I derive, by experiment, simple rules that allow one to restrict a mean and SEM to their sig-digs: that is, to those digits that do mean something and are not just random junk.

Experiments and their results
From a population with mean 39.615 and SEM 1.33, 8000 instances were drawn at random by a computer program (8000 is an arbitrary large number). Table 1 below shows the frequency of the ten digits in successive decades. The digit '3' in the 10's decade is clearly meaningful, and so is '9' in the 1's, but in the 0.1's decade the target digit '6', though the most frequent in its decade, is barely better than random: a mean of '39' is worth reporting, but '39.6' is overdoing things. Table 2 below derives from the same mean but with a SEM 100 times smaller, at 0.0133. This supports a mean of 39.61, but in the next, '0.00x', decade the target digit, '5', is not even the most frequent in its decade. A simple rule, then, is to stop the mean at the same decade as that of the first significant (non-zero) digit in the SEM. Note that the rule uses the SEM to show where to stop: it makes no use whatever of the position of the decimal point. In Table 1 the counts in the '0.1's decade are near random, but if we were to decrease the SEM gradually the totals for each digit in a decade would become more and more unequal as peaks emerged and grew from the slowly sinking hummocky plain, indicating that another sig-dig would soon be justified. In a report the number of sig-digs must be an integer, but to understand the trends we need a sig-dig index, D_M, that is at least semi-continuous. Such an index is derived in the Appendix.
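The experiment is easy to reproduce. A sketch in Python, assuming draws from a normal distribution with the stated mean and SEM (the exact counts will differ from Table 1's, but the pattern is the same):

```python
import random
from collections import Counter

random.seed(1)
# 8000 instances of a mean from a population with mean 39.615, SEM 1.33
draws = [random.gauss(39.615, 1.33) for _ in range(8000)]

def digit_at(x, decade):
    """The digit of x in a given decade: 1 -> 10's, 0 -> 1's, -1 -> 0.1's."""
    return int(abs(x) / 10**decade) % 10

for decade, label in [(1, "10's"), (0, "1's"), (-1, "0.1's")]:
    counts = Counter(digit_at(x, decade) for x in draws)
    digit, freq = counts.most_common(1)[0]
    print(f"{label} decade: digit {digit} occurs {freq} times of 8000")
```

Running this shows '3' dominating the 10's decade and '9' the most frequent in the 1's, while the ten digits of the 0.1's decade approach a uniform 800 each.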
The points in Figure 1 below show how D_M depends experimentally on C, the quotient mean / SEM, in experiments similar to those outlined in Tables 1 and 2. The fitted line is D_M = log10(C). Taking the ceiling of these values (equivalent to truncating and adding 1) to get an integer gives the broken stepped line in Figure 1, which translates into the simple rule already given. But at the back of each step this gives values that are only just sufficient, while at the front of the step the values are well into the meaningless random zone, almost a digit too many. A more complex extension to the simple rule (Table 3) shifts the steps about half a decade to the left (log10(3) ≈ 0.5) and spreads the overshoot into the meaningless region more evenly: the overshoot is 0.5 to 1.5 random digits. The unbroken staircase in Figure 1 shows this extended rule; the broken staircase shows the simple rule for integer (1, 2, 3 and so on) sig-digs. Both simple and extended rules are shown in Table 3.
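Both rules for the mean reduce to a few lines of code. A sketch, assuming the Table 3 extension is triggered exactly when the first significant digit of C is '3' to '9' (the function names are mine):

```python
import math

def sig_digs_simple(mean, sem):
    """Simple rule: digits in the mean = truncate(log10(C)) + 1,
    where C = mean / SEM."""
    C = abs(mean) / sem
    return math.floor(math.log10(C)) + 1

def sig_digs_extended(mean, sem):
    """Extended rule: one extra digit when the first significant
    digit of C is '3' to '9'."""
    C = abs(mean) / sem
    first = int(C / 10**math.floor(math.log10(C)))  # first significant digit of C
    return sig_digs_simple(mean, sem) + (1 if first >= 3 else 0)

print(sig_digs_simple(3.863, 2.162))    # C ~ 1.8  -> 1 digit: report 4
print(sig_digs_simple(39.615, 0.0133))  # C ~ 2979 -> 4 digits: report 39.61
```

Both worked examples from the text fall on the plateau of a step, where the simple and extended rules agree.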
Rule 2 for D_SEM is simpler but its origin is more complicated. Figure 2 below shows, for a fixed mean and standard deviation (SD), how D_SEM depends, in experiments similar to those in Tables 1 and 2, on the number of items, N_S, in the calculation of a SEM. Points for two such experiments, with the same mean but different SDs, are shown. Over a range of 100 in N_S the value of D_SEM rises with a slope of 1 on the log-linear scales shown: D_SEM = log10(N_S) + c. But eventually it falls over a cliff, from a first significant digit of '1' to a '9' a decade further down (for example, from 1.001 to 0.999). The overall slope of this saw-toothed progression (0.5) is half that of the teeth themselves, reflecting the fact that the SEM depends on √N_S.
The exact position of the saw-tooth depends on the numerical value of the SEM, and to accommodate this the bounding line D_SEM = log10(N_S) / 2 + 1 is shown. The steps show Rule 2. The offset for N_S ≤ 6 accommodates the fact that at small N_S the bounding line curves downwards, though this is not shown in detail in Figure 2. Rule 2 appears in Table 3, taking account of behaviour at small N_S.
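Rule 2's bounding line can likewise be sketched in code, under the assumption that the reported digits are the ceiling of the line's value; the small-N_S offset from Table 3 is not reproduced here:

```python
import math

def sig_digs_sem(n_s):
    """Digits worth reporting in a SEM computed from n_s items, from
    the bounding line D_SEM = log10(N_S)/2 + 1 (a sketch that ignores
    the small-N_S offset in Table 3)."""
    return math.ceil(math.log10(n_s) / 2 + 1)

print(sig_digs_sem(17))  # the abstract's sample of 17 -> 2 digits: SEM 2.2
```

Because of the √N_S dependence, each extra SEM digit costs roughly a hundredfold increase in N_S.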
The full rules for sig-digs in a mean and in a SEM are summarised below in Table 3.

Discussion
Does it matter if random junk is reported? For those who understand these matters, no. They can adjust the values themselves. For an author's reputation, yes, it does matter. Gross over-reporting of values is one of the clearest warning signs to readers that the author does not understand what he or she is doing.
The big general-purpose journals were not interested in this article, yet (I suggest) those who need it most are the least likely to come across it in a specialist education or statistics journal. This version was therefore archived as: Clymo RS (2012) How many digits in a mean are worth reporting? http://arXiv.org/abs/1301.1034. But that location is unlikely to be scanned by bioscientists. 'Duplicate publication' is strongly deprecated in formal journals, for obvious reasons, but is allowed in preprint archives, so this is the second source for this article.

Rule 1: the last significant digit in the mean is in the same decade as the first non-zero digit in the SEM. Rule 1 extended (you may ignore this supplement without serious error): if the first significant digit in C = mean / SEM is '3' to '9', then one more digit is significant in the mean.

Rule 3: for counts as percentages
For fewer than 100 observations the two digits in a percentage overstate the significance.
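One way to see why (my illustration, not the paper's derivation): treat the count as binomial, so a percentage from k of n carries a SEM of 100·√(p(1−p)/n) percentage points, and for n below 100 that SEM usually sits in the 1's decade or above, leaving little room for a second digit:

```python
import math

def percent_with_sem(k, n):
    """A percentage from a count k of n, with its binomial SEM in
    percentage points (an illustration of why two digits overstate
    the significance when n < 100)."""
    p = k / n
    return 100 * p, 100 * math.sqrt(p * (1 - p) / n)

pct, sem = percent_with_sem(13, 40)
print(f"{pct:.1f} % with SEM {sem:.1f} points")  # 32.5 % with SEM 7.4 points
```

With the SEM's first non-zero digit in the 1's decade, Rule 1 stops the percentage at the 1's decade as well.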