PlotsOfDifferences – a web app for the quantitative comparison of unpaired data

The quantitative comparison of data acquired under different conditions is an important aspect of experimental science. The most widely used statistic for quantitative comparisons is the p-value. However, p-values suffer from several shortcomings. The shortcoming most relevant for quantitative comparisons is that p-values fail to convey the magnitude of differences. The differences between conditions are best quantified by determining the effect size. To democratize the calculation of effect size, we have developed a web-based tool. The tool uses bootstrapping to resample mean or median values for each of the conditions, and these values are used to calculate the effect size and its compatibility interval. The web tool generates a graphical output that shows the bootstrap distribution of the difference next to the actual data for optimal interpretation. A tabular output with statistics and effect sizes is also generated, and the table can be supplemented with p-values that are calculated with a randomization test. The app that we report here is dubbed PlotsOfDifferences and is available at: https://huygens.science.uva.nl/PlotsOfDifferences


Data input and structure
The data can be supplied in different ways, similar to the input for PlotsOfData (Postma and Goedhart, 2018). Both wide and tidy (Wickham, 2014) data structures are accepted. The wide format is used as a default, but it can be changed to tidy by using an alternative hyperlink: https://huygens.science.uva.nl/PlotsOfDifferences/?data=4;T

Plotting the data
A detailed description of the options for plotting the data is reported in the paper on PlotsOfData (Postma and Goedhart, 2018).
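
The wide and tidy structures mentioned under 'Data input and structure' can be illustrated with a minimal Python sketch. It is not part of the app, which is written in R. The example values and the column names 'Condition' and 'Value' are assumptions chosen for illustration only.

```python
# Hypothetical example data in wide format: one column per condition.
wide = {
    "Control": [1.0, 2.0, 3.0],
    "TIAM": [4.0, 5.0],
}


def wide_to_tidy(wide_table):
    """Return one record per measurement: a condition label plus a value.

    The keys 'Condition' and 'Value' are illustrative names, not names
    required by PlotsOfDifferences.
    """
    return [
        {"Condition": condition, "Value": value}
        for condition, values in wide_table.items()
        for value in values
    ]


def tidy_to_wide(records):
    """Group tidy records back into one list of values per condition."""
    table = {}
    for record in records:
        table.setdefault(record["Condition"], []).append(record["Value"])
    return table


tidy = wide_to_tidy(wide)
for record in tidy:
    print(record["Condition"], record["Value"])
```

In the tidy structure every row holds a single observation, so conditions with different numbers of measurements need no padding.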

Plotting the effect size
A bootstrap method is used to calculate the effect size and the compatibility interval (CI). To this end, each condition is resampled 1000 times with replacement to obtain a distribution of mean or median values. The user selects a reference condition, and the difference between the bootstrapped mean or median values of each condition and those of the reference is calculated, resulting in a new distribution of differences. This distribution is plotted. The 2.5th and 97.5th percentiles of the distribution are used to determine a 95% confidence interval around the difference. We refer to this interval as the compatibility interval (CI), i.e. the range of values most compatible with the data. The CI is displayed as a black line under the distribution. The effect size plot is displayed either next to the data plot (landscape orientation) or below the data plot (portrait orientation). Two file formats are available for downloading the figure, PDF and PNG. The PNG format is lossless and can be readily converted to other bitmap-type formats that are suitable for presentation or incorporation into (multi-panel) figures. The PDF format is vector-based and can be imported into any software package that handles vector-based graphics for further adjustment of the layout.
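
The bootstrap procedure for the effect size can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R code of the app. The function name, the example data, the index-wise pairing of resampled values and the interpolation method used for the percentiles are all assumptions.

```python
import random
import statistics


def bootstrap_difference(reference, condition, stat=statistics.median,
                         n_resamples=1000, seed=None):
    """Bootstrap the difference (condition - reference) of a statistic.

    Assumption: the i-th resampled value of the condition is paired with
    the i-th resampled value of the reference to form the differences.
    """
    rng = random.Random(seed)

    def resampled_stats(values):
        # Resample with replacement, keeping the original sample size.
        return [stat(rng.choices(values, k=len(values)))
                for _ in range(n_resamples)]

    ref_stats = resampled_stats(reference)
    cond_stats = resampled_stats(condition)
    differences = [c - r for c, r in zip(cond_stats, ref_stats)]

    # With n=40 the first and last cut points are the 2.5th and 97.5th
    # percentiles, which bound the 95% compatibility interval.
    cuts = statistics.quantiles(differences, n=40, method="inclusive")
    observed = stat(condition) - stat(reference)
    return observed, (cuts[0], cuts[-1]), differences


# Hypothetical measurements, for illustration only.
control = [10.0, 12.0, 11.0, 13.0, 9.0, 12.5, 11.5, 10.5, 12.0, 11.0]
treated = [15.0, 17.0, 16.0, 18.0, 14.0, 17.5, 16.5, 15.5, 17.0, 16.0]

observed, (ci_low, ci_high), diffs = bootstrap_difference(
    control, treated, seed=1)
print(f"difference of medians: {observed}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

Passing `stat=statistics.mean` gives the difference between means instead. Fixing `seed` makes the result reproducible. Without a seed, repeated runs on the same data give slightly different intervals.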

Randomization test
To calculate a p-value for each condition relative to the reference, a randomization test is used. Briefly, the data of a condition and the reference are combined (since the null hypothesis implies that the data are sampled from the same population). The combined data are resampled without replacement and a new (null) distribution of the difference between means or medians is obtained. To derive a p-value for a two-tailed test, the absolute difference between the observed means or medians is compared with the absolute differences that compose the null distribution.
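
The randomization test can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used by the app. The function name, the number of permutations (1000) and the definition of the p-value as the fraction of null differences that are at least as large as the observed difference are assumptions, as is the example data.

```python
import random
import statistics


def randomization_test(reference, condition, stat=statistics.median,
                       n_permutations=1000, seed=None):
    """Two-tailed randomization test for a difference in means or medians.

    Assumption: the p-value is the fraction of reshuffled absolute
    differences that are at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(stat(condition) - stat(reference))

    # Under the null hypothesis both samples come from one population,
    # so the data are pooled before being reassigned to two groups.
    pooled = list(reference) + list(condition)
    n_ref = len(reference)

    n_extreme = 0
    for _ in range(n_permutations):
        # Sampling the full pool without replacement is a random shuffle.
        shuffled = rng.sample(pooled, k=len(pooled))
        new_ref, new_cond = shuffled[:n_ref], shuffled[n_ref:]
        if abs(stat(new_cond) - stat(new_ref)) >= observed:
            n_extreme += 1
    return n_extreme / n_permutations


# Hypothetical measurements, for illustration only.
control = [10.0, 12.0, 11.0, 13.0, 9.0, 12.5, 11.5, 10.5, 12.0, 11.0]
treated = [15.0, 17.0, 16.0, 18.0, 14.0, 17.5, 16.5, 15.5, 17.0, 16.0]

p_value = randomization_test(control, treated, seed=1)
print(f"p-value: {p_value}")
```

Because the group labels are reassigned at random, the p-value varies slightly between runs unless the seed is fixed, and its resolution is limited by the number of permutations.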

Table with Statistics
A customizable table with statistics is generated, as described previously (Postma and Goedhart, 2018). In addition, a table with effect sizes and their CI is generated. The tables can be exported in different formats (CSV, XLS and PDF).

Application
To illustrate the output of the web tool, we used data from cell area measurements. The area of cells was measured under a reference, unperturbed condition and two conditions in which a Guanine Exchange Factor (GEF) was overexpressed. Plots of the data show that the median values of the perturbed conditions differ from the reference (indicated as 'Control'). Quantification of the difference between the reference and the other two conditions is shown in the right panel of the figure. The table with values is also presented. The median area of the 'TIAM' condition is larger by 215 µm² and the median area of the 'LARG' condition is smaller by 196 µm². The distribution of differences from the bootstrap procedure is shown, and the compatibility interval (CI) that is based on the distribution is indicated with the horizontal black bar. The CI of the effect of TIAM [40 µm² to 454 µm²] is the range of values that is most compatible with the data. The CI of the effect of LARG [-349 µm² to 53 µm²] does include zero (suggesting no effect), but any effect size in the compatibility interval is conceivable. Therefore, it can neither be concluded nor excluded that LARG has an effect on cell area. The effect of LARG on cell area can only be evaluated after acquiring more data, thereby decreasing the uncertainty of the effect size. The p-values from the randomization test are listed in the table. The p-value for the difference between medians is 0.026 for the TIAM condition and 0.015 for the LARG condition. Both indicate that the observed difference is unlikely given the hypothesis that the samples (reference and perturbed condition) were acquired from the same population. For TIAM the p-value is compatible with the effect size, while the p-value for LARG seems on the low side, given that the CI of the effect size includes 0. This stresses the need for careful evaluation of the outcome of statistical tests in relation to the actual data that are acquired.
Figure 2: Example tabular output of PlotsOfDifferences for the analysis of the difference between median cell areas. The difference is determined relative to the 'Control' condition.

Conclusion
A Shiny-based web tool that uses R without the need for coding skills was generated to democratize quantitative data comparison by calculating absolute differences. The calculated difference is an absolute effect size that is a good alternative (or supplement) for null-hypothesis significance tests and p-values (Halsey et al., 2015; Wasserstein and Lazar, 2016; Claridge-Chang and Assam, 2016; Drummond and Tom, 2011).
A limitation of the bootstrap approach is that it will not be valid for small sample sizes (n<10). The premise of bootstrapping is that the sample reflects the population. This is difficult to ensure for low n, and the data will better reflect the population for high n. We propose a cut-off at a sample size of 10. The user will receive a warning for n<10, but the web tool will still calculate the effect size and p-value for educational purposes. A feature of the random resampling that is used for the calculation of the effect size and for the randomization test is that the results will vary slightly between repeated calculations on the same data. To conclude, we anticipate that the high-quality plots created with PlotsOfDifferences will facilitate quantitative comparisons and improve transparent communication of scientific data, which will be beneficial for both researchers and their audience.