Abstract
The reproducibility and translation of neuroscience research are assumed to be undermined by introducing environmental complexity and heterogeneity. Rearing laboratory animals with minimal (if any) environmental stimulation is thought to control for biological variability but may not adequately test the robustness of our animal models. Standard laboratory housing is associated with reduced demonstrations of species-typical behaviors and changes in neurophysiology that may impact the translation of research results. Moreover, modest increases in environmental enrichment (EE) mitigate insults used to induce animal models of disease, which directly calls into question the translatability of our work and may in part underlie the disconnect between preclinical and clinical research findings. Enhancing environmental stimulation for our model organisms promotes ethologically natural behaviors but may simultaneously increase phenotypic trait variability. To test this assumption, we conducted a systematic review and evaluated coefficients of variation between EE and control housed animals. Overall, animals housed in enrichment were not more variable than controls. Therefore, environmental heterogeneity introduced into the laboratory does not compromise data integrity.
Highlights
-Environmental complexity is thought to increase phenotypic variability, undermining research translation
-We conducted a systematic review to compare between environmentally enriched and control housed animals
-Coefficients of variation were evaluated across several phenotypic traits
-Animals housed in environmental enrichment were not more variable than controls
-We also provide a methodological reporting table for environmental enrichment use in neuroscience research
1. Introduction
Contributions to phenotypic variation are thought to derive not only from genotype but also from multiple environmental factors, ranging from feeding and microbiology to variables as seemingly simple as housing condition. In experimental research, scientists attempt to control factors presumed to have an impact on biological variation and, consequently, the reproducibility of their data. One way to control for phenotypic variability in the laboratory is to standardize animal caging systems and limit environmental complexity. Environmental enrichment (EE) is one form of complexity that includes physical, sensory, cognitive, and/or social stimulation, which provides an enhanced living experience to laboratory animals relative to standard housing conditions. The use of EE has become prominent in neuroscience due to substantial evidence that EE influences structural and functional changes in the brain, in addition to engendering enduring effects on behavior (Kempermann, 2019; Nithianantharajah & Hannan, 2006). The provisioning of supplementary resources to animals not only maintains their welfare but also promotes more naturalistic, species-typical behavioral repertoires (Bloomsmith et al., 2018). Moreover, this enhanced rearing condition has been used to study the mitigative potential of the environment in a variety of animal disease models (Nithianantharajah & Hannan, 2006).
Regardless of the purpose of its use, there are questions about potential within- and between-experiment variability that may accompany the addition of environmental complexity to animal laboratory cages (Kempermann, 2019; Bayne & Würbel, 2014; Grimm, 2018; Toth, 2015; Toth et al., 2011). It is thought that the diverse phenotypes promoted by EE may lead to data variation within a study. Moreover, the variety in enrichment protocols used may create data variability between studies and laboratories, compromising data reproducibility. Together, these concerns foster arguments to maintain barren cages as the ‘gold’ standard housing condition in the study of brain and behavior (Bayne & Würbel, 2014; Voelkl et al., 2020). Importantly, similar justifications (of increased variation) have been used to support the exclusion of females from research, due to hormonal fluctuations across the reproductive cycle. However, scientific evidence has since shown this perspective to be incorrect (Becker et al., 2016; Beery, 2018).
Given the shifting attention of the scientific community to the topic of rigor and reproducibility (Toth, 2015; Voelkl et al., 2020), this is the perfect time to reconsider our assumptions about variation due to environmental complexity. Standardization of the environment intuitively falls in line with the scientific method. Parsing out contributors of extraneous variation (Phenotype (P) = Gene × Environment interactions; G × E) is thought to increase statistical power and reproducibility between experiments. On the other hand, such standardization leads to homogeneity in a population and may undermine the robustness of the potential treatment being studied (Kentner et al., 2018; see Voelkl et al., 2020 for an excellent recent review), a crucial concern given the disconnect between preclinical and clinical research outcomes (Berk, 2012; Hyman, 2012; Munos, 2013).
Still, to control for potential variability, attempts to standardize the environment continue. These efforts have been complicated by varying definitions of what is enriching to animals of each species, strain and sex (Simpson & Kelly, 2011; Toth, 2015; Toth et al., 2011), even for standard laboratory housing where only minimal EE is recommended or required. Moreover, a lack of reporting on what types of enrichment protocols are used (e.g. shelters, nesting materials, cage mates, music, food/treats; Toth, 2015) makes this task even more difficult. Overall, the differential implementation of EE in experimental design has provoked discussion over the inconsistent definitions and reporting methodology of enrichment use in the neuroscience literature, and whether standardization and minimization of laboratory caging is necessary to prevent further extraneous biological variation (Bayne & Würbel, 2014; Toth, 2015).
Outside of theoretical debates, the data on whether EE may contribute to the replication crisis, by increasing phenotypic variability and undermining research findings, are rather mixed (Toth, 2015; Toth et al., 2011; Walsh & Cummins, 1979; Wolfer et al., 2004; Würbel, 2007), and so concerns about its use persist (Grimm, 2018). Recently, there has been a call to action suggesting that the question of biological variation and its impact on rigor and reproducibility be extended to the diversification of environmental conditions or “controlled heterogenization” (Voelkl et al., 2020). For example, diversification may be implemented by using different sexes, animal strains, ages, and even housing conditions (e.g. EE) within a study. One way to address the question of variability due to the implementation of EE is to utilize the methods of others who have conducted large scale evaluations comparing male and female animals (Becker et al., 2016) and inbred versus outbred strains of mice (Tuttle et al., 2018). Indeed, it has been noted that the EE literature has typically focused on mean differences between groups, rather than on evaluations of whether EE increases variability specifically (Kempermann, 2019). The small subset of studies that have examined variation directly (e.g. Wolfer et al., 2004; Würbel, 2007; André et al., 2018) have focused on mice and on a limited number of strains within the confines of their own experiments. To our knowledge, there has been no systematic literature-wide evaluation of multiple traits comparing EE to control groups across species.
2. Method
To evaluate whether EE housed rats or mice display increased phenotypic variability in neuroscience research, we conducted a systematic review and compared the coefficient of variation (CV), a measure of trait-specific variability, extracted from data where EE animals were directly compared to a control housed condition on the same trait. First, to determine the general scientific interest in EE protocols, the proportion of articles published each year using the search term “environmental enrichment” was identified in PubMed (Sperr, 2016).
2.1. Search Strategy
Both PubMed and EMBASE were searched for the period from January 1, 2013 to September 5, 2018 using the search terms (1) environmental enrichment AND (2) electrophysiology OR (3) brain OR (4) behavior OR (5) “nervous system physiological phenomena”, which yielded 3,650 articles (Figure 1). A detailed list of search terms is included in Supplementary Methods A.
2.2. Study Selection
After duplicates were removed, evaluators independently identified studies eligible for inclusion in a 2-step process. First, we conducted an abstract and title search. If insufficient details were provided in the titles and abstracts, then the study was selected for full text review. Eligibility was based on (1) article relevance to the subject matter of interest (EE), (2) studies using any animal species including humans, (3) observational and experimental studies, and (4) English-written articles only. Exclusion criteria consisted of reviews, meta-analyses, case studies, conference abstracts, protocols, editorials, comments, and non-English articles. The selected articles are listed in Supplementary Table 1.
2.3. Data Extraction
Of the 963 articles identified as using EE in any species, a subset of 681 articles was identified as using mice or rats. These were further evaluated on several methodological variables, including sex, the types of enrichment devices employed, the EE social structure, and the composition of the control conditions used (e.g. running wheel, isolated, social/group housing). Phenotypic variability was also evaluated in the rat and mouse studies identified as using traditional EE caging systems (Figure 2A). For these analyses, 281 studies were evaluated because they met the inclusion criteria of providing means and standard deviations (or standard errors) that could be extracted from the article, together with sample sizes, for at least one EE and one control group (Figure 1). We also identified whether EE and control groups were naïve or ‘treated/manipulated’ (e.g. drug treated, knockout models, surgery). Studies with parental exposure to EE were excluded to control for potential confounds of parental care (Connors et al., 2015), as were studies where it was unclear whether control animals were singly or socially housed. To avoid oversampling (Tuttle et al., 2018), we limited data collection to the first three clearly reported measures (e.g. where error bars did not overlap, such as in line graphs). Each measure was categorized similarly to others (Becker et al., 2016; Tuttle et al., 2018) by using anatomy, behavior/CNS, behavior/other, immune system, molecules, organ function, and electrophysiology as traits. In total, 1130 direct comparisons of CVs between EE and control animals were included (618 naïve pair comparisons and 512 manipulated/treated pair comparisons). Data were extracted from graphs provided in digital PDF articles (using http://rhig.physics.yale.edu/~ullrich/software/xyscan/) or directly from tables. Graphical data extractions were performed by two trained researchers. Inter-rater reliability was assessed, and Pearson r correlations ranged from 0.912 to 0.997.
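To illustrate how an extracted mean and error value might be turned into a CV, the following is a minimal Python sketch rather than the procedure actually used in the study. The function names and example numbers are hypothetical, and the conversion assumes the standard relationship SD = SEM × √n for studies that report standard errors.

```python
import math


def sd_from_sem(sem, n):
    """Convert a standard error of the mean to a standard deviation.

    Assumes SEM = SD / sqrt(n), where n is the group sample size.
    """
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sem * math.sqrt(n)


def coefficient_of_variation(mean, sd):
    """Return CV = SD / mean (undefined when the mean is zero)."""
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return sd / mean


# Hypothetical values, as if read from a graph for one EE group.
ee_mean, ee_sem, ee_n = 12.0, 1.5, 9
ee_sd = sd_from_sem(ee_sem, ee_n)  # 1.5 * sqrt(9) = 4.5
ee_cv = coefficient_of_variation(ee_mean, ee_sd)
print(round(ee_cv, 3))  # prints 0.375
```

The same two steps would be repeated for the matched control group so that each extracted measure yields one EE CV and one control CV.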
2.4. Statistical Analyses
CVs were calculated as the standard deviation divided by the mean and compared using paired t-tests (for individual trait evaluations) or ANOVA (for multiple trait evaluations). Pairwise comparisons were done using Tukey’s multiple comparisons test (Becker et al., 2016; Howell, 2001). To determine whether the distribution of variation differed by environmental complexity, we calculated EE to control ratios of CV as CV_EE / (CV_EE + CV_control). CV ratios for each trait were tested as a function of housing complexity against the theoretical mean of 0.5 by t-test (Becker et al., 2016; Beery, 2018). Data were considered significant if p < 0.05.
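The CV ratio and the two t-tests described here can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the function names and the example CV values are hypothetical, and only the t statistic and degrees of freedom are computed. A p-value would be obtained from a t distribution in a statistics package, which is omitted here to keep the example within the standard library.

```python
import math
from statistics import mean, stdev


def cv_ratio(cv_ee, cv_control):
    """EE-to-control CV ratio: CV_EE / (CV_EE + CV_control).

    A value of 0.5 means the two housing conditions are equally variable;
    values above 0.5 mean the EE group is more variable.
    """
    return cv_ee / (cv_ee + cv_control)


def one_sample_t(values, mu):
    """t statistic and degrees of freedom for a test of mean(values) against mu."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - mu) / standard_error, n - 1


def paired_t(a, b):
    """Paired t statistic: a one-sample test of the pairwise differences against 0."""
    differences = [x - y for x, y in zip(a, b)]
    return one_sample_t(differences, 0.0)


# Hypothetical paired CVs for one trait (one EE/control pair per comparison).
cv_ee = [0.30, 0.55, 0.20, 0.42]
cv_control = [0.35, 0.50, 0.28, 0.40]

ratios = [cv_ratio(e, c) for e, c in zip(cv_ee, cv_control)]
t_ratio, df_ratio = one_sample_t(ratios, 0.5)   # ratios vs. theoretical mean 0.5
t_pair, df_pair = paired_t(cv_ee, cv_control)   # EE vs. control CVs
print(t_ratio, df_ratio, t_pair, df_pair)
```

Because the ratio is bounded between 0 and 1 and centered on 0.5 under equal variability, it puts traits measured on very different scales on a common footing before testing.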
3. Results
Using the term “environmental enrichment” we identified the proportion of articles indexed in PubMed each year from 1998 to 2019 (Sperr, 2016). This search, including both review and empirical research articles, highlights an increasing interest in this topic relative to the number of total articles published (Figure 2B).
The results of our analyses demonstrate patterns of experimental biases, specifically a heavy reliance on the use of rats and mice over other laboratory species (Figure 2C), and the continued exclusion of females in EE research (Figure 2D; Simpson & Kelly, 2011). Our findings also show a range in the definition of EE used across laboratories in that the frequency of enrichment types, timing, and the social structures implemented varied widely (Figure 3A-F).
One issue that arose was a significant lack of reporting on several variables. This prompted us to develop a reporting table for describing key aspects of enrichment use in research (Supplementary Table 2), following suit with other initiatives to improve animal model reporting (Kentner et al., 2019).
Using paired t-tests, we found no differences between EE and control mice or rats on CVs across traits (p > 0.05), regardless of control housing type (e.g. running wheel, isolated, social/group housing) or whether animals were naïve or manipulated/treated (e.g. drug treated, knockout models, surgery). Therefore, we collapsed and analyzed both species together. When species were combined, the treated/manipulated social/group housed controls (0.65 ± 0.073) were more variable than their manipulated/treated EE counterparts (0.59 ± 0.050; t(46) = 2.211, p = 0.032) on the “behavior other – non CNS” trait only. Isolated control animals (0.24 ± 0.079) had higher CVs than manipulated/treated EE animals (0.019 ± 0.072; t(4) = 4.720, p = 0.009) on the anatomy trait. However, for the anatomy trait, the comparison between these two groups was not sufficiently powered (n = 5 comparisons based on 3 articles). In general, we did not find EE to increase trait variability compared to any control housing type in either naïve or manipulated/treated animals (p > 0.05).
To increase the power in our analyses, we collapsed the control group types together and analyzed across species and traits, both separately and together. Again, we found that EE does not make animals more variable than controls (p > 0.05; Figure 4A-D, see also Supplementary Table 3 for statistical breakdown). When species were combined, we found that controls were more variable (had higher CVs) than EE housed animals under treated/manipulated conditions. However, this was only found on the “overall behavior” (main effect of housing: t(290) = 2.120, p = 0.035; Control CV: 0.67 ± 0.06/EE CV: 0.56 ± 0.04) and “behavior other – non CNS” traits (main effect of housing: t(46) = 2.211, p = 0.032; Control CV: 0.73 ± 0.07/EE CV: 0.60 ± 0.05, based on 21 articles; Figure 4B, D). There were no main effects of housing, nor significant housing by trait interactions on the two-way ANOVAs (p > 0.05, see also Supplementary Table 3). However, there were significant main effects of trait, indicating that “behavior” was more variable than “anatomical” traits for both rats (main effect of trait: F(5, 542) = 4.015, p = 0.001, η² = 0.036; Tukey HSD: p = 0.004) and mice (main effect of trait: F(6, 460) = 4.953, p = 0.0001, η² = 0.057; Tukey HSD: p = 0.001; see Supplementary Table 3).
When comparing CV ratios, the data did not support the premise that environmental complexity increases variability in neuroscience research (p > 0.05; Figure 4E).
4. Discussion
Our findings should resonate well with neuroscientists who would like to increase complexity in laboratory caging systems, promoting more naturalistic species-typical behaviors and brain functioning, but who have been concerned about compromising data integrity and their control over environmental conditions. This should be especially salient given that lack of enrichment in laboratory cages leads to suppression of behavioral repertoires, increased stereotypies, and a reduction of general activity level, even during an animal’s active phase (Hurst et al., 1997).
Over the years there have been complaints about the feasibility and costs associated with EE, in addition to arguments of increased phenotypic variability, as a result of modeling more naturalistic settings in the laboratory environment (Grimm, 2018). However, EE may not need to be extravagant or require larger caging systems or space; it may be as simple as regularly changing enrichment devices (Kentner et al., 2018). Notably, investigators often group house their animals to reduce stress (Hurst et al., 1997) and consequently save on laboratory caging costs. One potential contributor to the shaping of phenotype could be the shared experiences in EE, resulting in within-group differences. Individual animals influence their environment, and each other, affecting phenotypes and preventing full control of the environment. Therefore, EE could be considered not just as P = G × E, but as P = G × (E_shared + E_nonshared) (see Kempermann, 2019 for an excellent review). This equation is also relevant to pair and group cage housing, which do not increase phenotypic trait variability (Becker et al., 2016), similar to what we show here with more naturalistic settings.
From a purely scientific perspective, EE can mitigate the effects of several experimental treatments and animal models of disease (Nithianantharajah & Hannan, 2006). This calls into question the external validity of these apparent context-specific effects. Indeed, incorporating more environmental heterogeneity into neuroscience research, and testing our findings against such complexity, should increase the robustness of our experimental designs and the fidelity of biomedical treatments (Kentner et al., 2018; Voelkl et al., 2020), without compromising the underlying stability of data. Our study supports this idea given that traditional EE caging systems are dynamic environments where devices are being replaced or are changing location as animals interact with and move them. Moreover, social experiences are varied for each animal. Specifically, experiences both between and within EE cages are unique, yet complex housing does not make animals any more variable compared to standard housed rats or mice.
Going forward, it will be necessary to identify appropriate enrichment types for the species, sex and age of the subject of interest, in addition to the animal model/paradigm being used, and to accurately report their use (Kentner et al., 2018; Simpson & Kelly, 2011; Toth, 2015). Importantly, there are proposed methodologies for how to implement and account for such environmental variation (Voelkl et al., 2020). Overall, human life is complicated, and by incorporating such nuanced complexity into our laboratories we may paradoxically improve upon the rigor and reproducibility of our research.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
A.C.K. designed and supervised the study and wrote the manuscript. The study was carried out and analyzed by A.V.P., R.C.R., J.D., and A.C.K.
Acknowledgments
This project was funded by NIMH under Award Number R15MH114035 (to A.C.K.) and an MCPHS Summer Undergraduate Fellowship (SURF) awarded to R.C.R. The authors are grateful to Dominic Rainone, Jie Yi Tan, Victoria Perez, Alexandra Best, Madeline Puracchio, and Yvonne Zheng for their help with data collection. The authors would also like to thank the MCPHS University School of Pharmacy and School of Arts & Sciences for their continual support. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the financial supporters.