PT  - JOURNAL ARTICLE
AU  - Ulrich Knief
AU  - Wolfgang Forstmeier
TI  - Violating the normality assumption may be the lesser of two evils
AID  - 10.1101/498931
DP  - 2020 Jan 01
TA  - bioRxiv
PG  - 498931
4099  - http://biorxiv.org/content/early/2020/05/05/498931.short
4100  - http://biorxiv.org/content/early/2020/05/05/498931.full
AB  - When data are not normally distributed (e.g. skewed, zero-inflated, binomial, or count data) researchers are often uncertain whether it may be legitimate to use tests that assume Gaussian errors (e.g. regression, t-test, ANOVA, Gaussian mixed models), or whether one has to either model a more specific error structure or use randomization techniques.Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation.We find that Gaussian models are remarkably robust to non-normality over a wide range of conditions, meaning that P-values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also perform well in terms of power and they can be useful for parameter estimation but usually not for extrapolation. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data.Overall, we argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and difficult to check during peer review. Hence, as long as scientists and reviewers are not fully aware of the risks, science might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data in a transparent way.Tweetable abstract Gaussian models are remarkably robust to even dramatic violations of the normality assumption.Competing Interest StatementThe authors have declared no competing interest.