Abstract
Improving the reproducibility of biomedical research is a major challenge. Transparent and accurate reporting are vital to this process; it allows readers to assess the reliability of the findings, and repeat or build upon the work of other researchers. The NC3Rs developed the ARRIVE guidelines in 2010 to help authors and journals identify the minimum information necessary to report in publications describing in vivo experiments.
Despite widespread endorsement by the scientific community, the impact of the ARRIVE guidelines on the transparency of reporting in animal research publications has been limited. We have revised the ARRIVE guidelines to update them and facilitate their use in practice. The revised guidelines are published alongside this paper. This Explanation and Elaboration document was developed as part of the revision. It provides further information about each of the 21 items in ARRIVE 2019, including the rationale and supporting evidence for their inclusion in the guidelines, elaboration of details to report, and examples of good reporting from the published literature.
Introduction
Transparent and accurate reporting is essential to improve the reproducibility of scientific research; it enables others to scrutinise the methodological rigour of the studies, assess how reliable the findings are, and repeat or build upon the work.
However, evidence shows that the majority of publications fail to include key information and there is significant scope to improve the reporting of studies involving animal research [1–4]. To that end, the NC3Rs published the ARRIVE guidelines in 2010. The guidelines are a checklist of information to include in a manuscript to ensure that publications contain enough information to add to the knowledge base [5]. The guidelines have received widespread endorsement from the scientific community and are currently recommended by more than a thousand journals, with further endorsement from research funders, universities and learned societies worldwide.
Studies measuring the impact of ARRIVE on the quality of reporting have produced mixed results [6–11] and there is evidence that in vivo scientists are not sufficiently aware of the importance of reporting the information covered in the guidelines, and fail to appreciate the relevance to their work or their research field [12].
As a new international working group – the authors of this publication, we have revised the guidelines to update them and facilitate their uptake; the ARRIVE guidelines 2019 are published alongside this paper [13]. We have updated the recommendations in line with current best practice, reorganised the information and classified the items into two sets. The ARRIVE Essential 10 constitute the minimum reporting requirement and the Recommended Set provides further context to the study described. The two sets help authors, journal staff, editors and reviewers use the guidelines in practice, and allow a pragmatic implementation with an initial focus on the most critical issues. Once the Essential 10 are consistently reported in manuscripts, items from the Recommended Set can be added to journal requirements over time until all 21 items are routinely reported in all manuscripts. Full methodology for the revision and the allocation of items into sets is described in the accompanying publication [13].
A key aspect of the revision was to develop this Explanation and Elaboration document to provide background and rationale for each of the 21 items of ARRIVE 2019. Here we present additional guidance for each item and subitem, explain the importance of reporting this information in manuscripts that describe animal research, elaborate on what to report, and provide supporting evidence. Each subitem is also illustrated with examples of good reporting from the published literature.
Glossary
Bias: Introduction of a systematic error in the estimated effect of an intervention, caused by inadequacies in the design, conduct, or analysis of an experiment.
Effect size: Quantitative measure that estimates the magnitude of differences between groups, or relationships between variables.
Experimental unit: Biological entity subjected to an intervention independently of all other units, such that it is possible to assign any two experimental units to different treatment groups.
External validity: Extent to which the results of an animal experiment provide a correct basis for generalisations to other populations of animals (including humans) and/or other environmental conditions.
False positive: Statistically significant result obtained by chance when the effect being investigated does not exist.
False negative: Non-statistically significant result obtained when the effect being investigated genuinely exists.
Independent variable of interest: Factor that a researcher manipulates within a controlled environment in order to test its impact on the outcome measured. Also known as: predictor variable, factor of interest.
Internal validity: Refers to the rigour of the study design and statistical analysis to isolate cause and effect, and attribute the effect observed to manipulation of the independent variable of interest. In an experiment with high internal validity, sources of bias and chance observations are minimised. In an experiment with low internal validity, the effect may be caused by bias, chance and other nuisance variables rather than the independent variable(s) of interest.
Null and alternative hypotheses: The null hypothesis (H0) refers to the postulate that the response being measured is unaffected by the experimental manipulation being tested. The alternative hypothesis (H1) refers to the postulate that manipulating the independent variable of interest has an effect on the response measured.
Nuisance variable: Sources of variability or conditions that could potentially bias results. Also known as: confounding factor, confounding variable
Outcome measure: Any variable recorded during a study to assess the effects of a treatment or experimental intervention. Also known as: dependent variable, response variable
Power: Probability that a test of significance will detect an effect (i.e. a deviation from the null hypothesis), if an effect exists (i.e. true positive result).
Sample size: Number of experimental units per group, also referred to as N number.
Definitions adapted from [14, 15] and placed in the context of animal research.
1. ARRIVE Essential 10
The ARRIVE Essential 10 (Box 2) constitute the minimum reporting requirement, to ensure that reviewers and readers can assess the reliability of the findings presented. There is no ranking within the set, items are presented in a logical order.
ARRIVE Essential 10
Study design
Sample size
Inclusion and exclusion criteria
Randomisation
Blinding
Outcome measures
Statistical methods
Experimental animals
Experimental procedures
Results
Item 1. Study design
For each experiment, provide brief details of study design including:
1a. The groups being compared, including control groups. If no control group has been used, the rationale should be stated.
Explanation
The choice of control or comparator group is dependent on the experimental objective. Negative controls are used to determine if a difference between groups is caused by the intervention (e.g. wild-type animals vs genetically modified animals, placebo vs active treatment, sham surgery vs surgical intervention). Positive controls can be used to support the interpretation of negative results or determine if an expected effect is detectable.
It may not be necessary to include a separate control with no active treatment if, for example, the experiment aims to compare a treatment administered by different methods (e.g. intraperitoneal administration vs. oral gavage), or animals that are used as their own control in a longitudinal study. A pilot study, such as one designed to test the feasibility of a procedure might also not require a control group.
For complex study designs, a visual representation is more easily interpreted than a text description, so a timeline diagram or flow chart is recommended. Diagrams facilitate the identification of which treatments and procedures were applied to specific animals or groups of animals, and at what point in the study these were performed. They also help to communicate complex design features such as clustering or nesting (hierarchical designs), blocking (to reduce unwanted variation, see item 4 – Randomisation), or repeated measurements over time on the same experimental unit (repeated measures designs) [16, 17]. The Experimental Design Assistant (EDA) is a platform to support researchers in the design of in vivo experiments, it can be used to generate diagrams to represent any type of experimental design [18].
Report the groups clearly so that test groups, comparators and controls (negative or positive) can be identified easily. State clearly if the same control group was used for multiple experiments.
Examples
“The DAV1 study is a one-way, two-period crossover trial with 16 piglets receiving amoxicillin and placebo at period 1 and only amoxicillin at period 2. Amoxicillin was administered orally with a single dose of 30 mg.kg−1. Plasma amoxicillin concentrations were collected at same sampling times at each period: 0.5, 1, 1.5, 2, 4, 6, 8, 10 and 12 h.” [19]
“Figure A: Example of a study plan created using the Experimental Design Assistant showing a simple comparative study for the effect of two drugs on the metastatic spread of two different cancer cell lines. Block randomisation has been used to create 3 groups containing an equal number of zebrafish embryos injected with either cell line, and each group will be treated with a different drug treatment (including vehicle control). Each measurement outcome will be analysed by 2-way ANOVA to determine the effect of drug treatment on growth, survival and invasion of each cancer cell line.” [20]
1b. The experimental unit (e.g. a single animal, litter, or cage of animals).
Explanation
The experimental unit is the biological entity subjected to an intervention independently of all other units, such that it is possible to assign any two experimental units to different treatment groups. The sample size is the number of experimental units per group.
Clearly indicate the experimental unit for each experiment so that the sample sizes and statistical analyses can be properly evaluated. There is a risk that if the experimental unit is not correctly identified, the sample size used in the data analysis will be incorrect. Inflation of the sample size by conflating experimental units with subsamples or repeated measurements is known as ‘pseudoreplication’. This may invalidate the analysis and resulting conclusions [21, 22] (see also item 7 – Statistical methods).
Commonly, the experimental unit is the individual animal, each independently allocated to a treatment group (e.g. a drug administered by injection). However, the experimental unit may be the cage or the litter (e.g. a diet administered to a whole cage, or a treatment administered to a dam and investigated in her pups), or it could be part of the animal (e.g. different drug treatments applied topically to distinct body regions of the same animal). Animals may also serve as their own controls receiving different treatments separated by washout periods; here the experimental unit is an animal for a period of time. There may also be multiple experimental units in a single experiment, such as when a treatment such as diet is given to a pregnant dam and then the weaned pups are allocated to different diets [23]. See [24–26] for further guidance on identifying experimental units.
Examples
“The present study used the tissues collected at E15.5 from dams fed the 1X choline and 4X choline diets (n = 3 dams per group, per fetal sex; total n = 12 dams). To ensure statistical independence, only one placenta (either male or female) from each dam was used for each experiment. Each placenta, therefore, was considered to be an experimental unit.” [27]
“We have used data collected from high-throughput phenotyping, which is based on a pipeline concept where a mouse is characterized by a series of standardized and validated tests underpinned by standard operating procedures (SOPs)…. The individual mouse was considered the experimental unit within the studies.” [28]
“Fish were divided in two groups according to weight (0.7-1.2 g and 1.3-1.7 g) and randomly stocked (at a density of 15 fish per experimental unit) in 24 plastic tanks holding 60 L of water.” [29]
“In the study, n refers to number of animals, with five acquisitions from each [corticostriatal] slice, with a maximum of three slices obtained from each experimental animal used for each protocol (six animals each group).” [30]
Item 2. Sample size
2a. Specify the exact number of experimental units allocated to each group, and the total number in each experiment. Also indicate the total number of animals used.
Explanation
Sample size relates to the number of experimental units in each group at the start of the study, and is usually represented by N (See item 1 - Study design for further guidance on identifying and reporting experimental units). This information is crucial to assess the validity of the statistical model and the robustness of the experimental results.
Report the exact value of N per group and the total number in each experiment. If the experimental unit is not the animal, also report the total number of animals to help readers understand the study design. For example, in a study investigating diet using cages of animals housed in pairs, the number of animals is double the number of experimental units. Reporting the total number of animals is also useful to identify if any were re-used between experiments.
Example
“Dams (for n see Table …) were assigned to treatments in a manner that provided similar means and variances in body weight before dosing was initiated…. For statistical purposes, the numbers/group are the number of litters, not the number of pups.” [31]
2b. Explain how the sample size was decided. Provide details of any a priori sample size calculation, if done.
Explanation
For any type of experiment, it is crucial to explain how the sample size was determined. For hypothesis-testing experiments, where inferential statistics are used to estimate the size of the effect and to determine the weight of evidence against the null hypothesis, the sample size needs to be justified to ensure experiments are of an optimal size to test the research question [32, 33] (see item 13 – Objectives). Power is the probability that a test of significance will detect an effect (i.e. a deviation from the null hypothesis), when the effect being investigated genuinely exists (i.e. true positive result). Sample sizes that are too small (i.e. underpowered studies) produce inconclusive results, whereas sample sizes that are too large (i.e. overpowered studies) raise ethical issues over unnecessary use of animals and may produce trivial findings that are statistically significant but not biologically relevant [34]. Low power has three effects: first, within the experiment, real effects are more likely to be missed; second, where an effect is detected, this will often be an over-estimation of the true effect size [25]; and finally, when low power is combined with publication bias, there is an increase in the false positive rate in the published literature [35]. Consequently, low powered studies contribute to the poor internal validity of research and risk wasting animals used in inconclusive research [36].
Study design can influence the statistical power of an experiment. Split-plot designs [37], factorial designs [38], or group-sequential designs [39] can increase the power of a study for a given number of animals. Statistical programs to help perform a priori sample size calculations exist for a variety of experimental designs and statistical analyses, for example G*power [40]. Choosing the appropriate calculator or algorithm to use depends on the type of outcome measures and independent variables, and the number of groups. Consultation with a statistician is recommended, especially when the experimental design is complex or unusual.
Where the experiment tests the effect of an intervention on the mean of a continuous outcome measure, the sample size can be calculated a priori, based on a mathematical relationship between the desired effect size, variability estimated from prior data, chosen significance level, power and sample size (See Box 3, and [24, 41] for practical advice). For an a priori sample size determination, report the analysis method (e.g. two-tailed student’s t-test with a 0.05 significance threshold), the effect size of interest and a justification explaining why this effect size is relevant, the estimate of variability used (e.g. standard deviation) and how it was estimated, and the power selected.
Information used in a power calculation
Sample size calculation is based on a mathematical relationship between the following parameters: effect size, variability, significance level, power and sample size. Questions to consider are:
The primary objective of the experiment – what is the main outcome measure?
The primary outcome measure should be identified in the planning stage of the experiment; it is the outcome of greatest importance, which will answer the main experimental question.
What is a biologically or clinically relevant effect size?
The effect size is the minimum change in the primary outcome measure between the groups under study, which would be of interest biologically and would be worth taking forward into further work.
What is the estimate of variability?
Estimates of variability can be obtained:
From data collected from a preliminary experiment conducted under identical conditions to the planned experiment, e.g. a previous experiment in the same lab, testing the same treatment, under similar conditions, on animals with the same characteristics.
From the control group in a previous experiment testing a different treatment.
From a similar experiment reported in the literature.
What risk of a false positive is acceptable? (significance threshold)
The significance level or threshold (α) is the probability of obtaining a significant result by chance (a false positive) when the null hypothesis is true (i.e. there is no real, biologically relevant difference between the groups). If it is set at 0.05 then the risk of obtaining a false positive is 1 in 20 for a single statistical test. However, the threshold or the p values will need to be adjusted in scenarios of multiple testing (e.g. by using a Bonferroni correction).
What risk of a false negative is acceptable? (power)
The power (1-β) is the probability that the experiment will correctly lead to the rejection of the null hypothesis if the effect being investigated genuinely exists (i.e. detect that there is a biologically meaningful difference when there is one). A target power between 80-95% is normally deemed acceptable.
Will you use a one or two-sided test? (directionality)
The directionality of a test depends on the distribution of the test statistics for a given analysis. For tests based on t or z distributions (such as t-tests), whether the data will be analysed using a one or two-sided test relates to whether the alternative hypothesis (H1) is directional or not. An experiment with a directional (one-sided) H1 can be powered and analysed with a one-sided test. This assumes that direction of the effect is known (this is very rare in biology) and the goal is to maximise the chances of detecting this effect. However, the investigator cannot then test for the possibility of missing an effect in the untested direction. Choosing a one-tailed test for the sole purpose of attaining statistical significance is not appropriate.
Two-sided tests with a non-directional H1 are much more common and allow researchers to detect the effect of a treatment regardless of its direction.
Note that analyses such as ANOVA and chi-square are based on asymmetrical distributions (F-distribution and chi-square distribution) with only one tail. Therefore, these tests do not have a directionality option.
There are several types of studies where a priori sample size calculations are not appropriate. For example, the number of animals needed for antibody or tissue production is determined by the amount required and the production ability of an individual animal. For studies where the outcome is a successful generation of a sample or a condition (e.g. the production of transgenic animals), the number of animals is determined by the probability of success of the experimental procedure.
In early feasibility or pilot studies, the number of animals required depends on the purpose of the study. Where the objective of the preliminary study is to improve procedures and equipment, the number of animals needed is generally small. In such cases power calculations are not appropriate and sample sizes can be estimated based on operational capacity and constraints [42]. Pilot studies alone are unlikely to provide adequate data on variability for a power calculation for future experiments. Systematic reviews and previous studies are more appropriate sources of information on variability [43].
Regardless of whether a power calculation was used or not, when explaining how the sample size was determined take into consideration any anticipated loss of animals or data, for example due to exclusion criteria established upfront or expected attrition (see item 3 – inclusion and exclusion criteria).
Examples
“The sample size calculation was based on postoperative pain numerical rating scale (NRS) scores after administration of buprenorphine (NRS AUC mean = 2.70; noninferiority limit = 0.54; standard deviation = 0.66) as the reference treatment and also Glasgow Composite Pain Scale (GCPS) scores using online software (Experimental design assistant; https://eda.nc3rs.org.uk/eda/login/auth). The power of the experiment was set to 80%. A total of 20 dogs per group were considered necessary.” [44]
“We selected a small sample size because the bioglass prototype was evaluated in vivo for the first time in the present study, and therefore, the initial intention was to gather basic evidence regarding the use of this biomaterial in more complex experimental designs.” [45]
Item 3. Inclusion and exclusion criteria
3a. Describe any criteria established a priori for including or excluding animals (or experimental units) during the experiment, and data points during the analysis.
Explanation
Inclusion and exclusion criteria define the eligibility or disqualification of animals and data once the study has commenced. To ensure scientific rigour, the criteria should be defined before the experiment starts and data are collected [46]. Inclusion criteria should not be confused with animal characteristics (see item 8 – Experimental animals) but can be related to these (e.g. body weights must be within a certain range for a particular procedure) or related to other study parameters (e.g. task performance has to exceed a given threshold). Exclusion criteria may result from technical or welfare issues such as complications anticipated during surgery, or circumstances where test procedures might be compromised (e.g. development of motor impairments that could affect behavioural measurements). Criteria for excluding samples or data include failure to meet quality control standards, such as insufficient sample volumes, unacceptable levels of contaminants, poor histological quality, etc. Similarly, how the researcher will define and handle data outliers during the analysis should be also decided before the experiment starts (see subitem 3b for guidance on responsible data cleaning).
Exclusion criteria may also reflect the ethical principles of a study in line with its humane endpoints (see item 16 – Animal care and monitoring). For example, in cancer studies an animal might be dropped from the study and euthanised before the predetermined time point if the size of a subcutaneous tumour exceeds a specific volume [47]. If losses are anticipated, these should be considered when determining the number of animals to include in the study (see item 2 – Sample size).
Best practice is to include all a priori inclusion and exclusion/outlier criteria in a pre-registered protocol (see item 19 – Protocol registration). At the very least these criteria should be documented in a lab notebook and reported in manuscripts, explicitly stating that the criteria were defined before any data was collected.
Example
“The animals were included in the study if they underwent successful MCA occlusion (MCAo), defined by a 60% or greater drop in cerebral blood flow seen with laser Doppler flowmetry. The animals were excluded if insertion of the thread resulted in perforation of the vessel wall (determined by the presence of sub-arachnoid blood at the time of sacrifice), if the silicon tip of the thread became dislodged during withdrawal, or if the animal died prematurely, preventing the collection of behavioral and histological data.” [48]
3b. For each experimental group, report any animals, experimental units or data points not included in the analysis and explain why.
Explanation
Animals, experimental units, or data points that are unaccounted for can lead to instances where conclusions cannot be supported by the raw data [49]. Reporting exclusions and attritions provides valuable information to other investigators evaluating the results, or who intend to repeat the experiment or test the intervention in other species. It may also provide important safety information for human trials (e.g. exclusions related to adverse effects).
There are many legitimate reasons for experimental attrition, some of which are anticipated and controlled for in advance (see subitem 3a on defining exclusion and inclusion criteria) but some data loss might not be anticipated. For example, data points may be excluded from analyses due to an animal receiving the wrong treatment, unexpected drug toxicity, infections or diseases unrelated to the experiment, sampling errors (e.g. a malfunctioning assay that produced a spurious result, inadequate calibration of equipment), or other human error (e.g. forgetting to switch on equipment for a recording).
In some instances, it may be scientifically justifiable to remove outlier data points from an analysis, such as readings that are outside a plausible range. Providing the reasoning for removing data points enables the distinction to be made between responsible data cleaning and data manipulation. When reasons are not disclosed the reliability of the conclusions is in question, as inappropriate data cleaning has the potential to bias study outcomes [50].
There is a movement towards greater data sharing (see item 20 – Data access), along with an increase in strategies such as code sharing to enable analysis replication. These practices, however transparent, still need to be accompanied by a disclosure on the reasoning for data cleaning, and whether it was pre-defined.
Report all animal exclusions and loss of data points, along with the rationale for their exclusion. Accompanying these criteria should be an explicit description of whether researchers were blinded to the group allocations when data or animals were excluded (see item 5 – Blinding and [51]), and whether these criteria were decided prior to the experiment [8, 32, 52]. Explicitly state where built-in models in statistics packages have been used to remove outliers (e.g. GraphPad Prism’s outlier test).
Examples
“Pen was the experimental unit for all data. One entire pen (ZnAA90) was removed as an outlier from both Pre-RAC and RAC periods for poor performance caused by illness unrelated to treatment…. Outliers were determined using Cook’s D statistic and removed if Cook’s D > 0.5. One steer was determined to be an outlier for day 48 liver biopsy TM and data were removed.” [53]
“Seventy-two SHRs were randomized into the study, of which 13 did not meet our inclusion and exclusion criteria because the drop in cerebral blood flow at occlusion did not reach 60% (seven animals), postoperative death (one animal: autopsy unable to identify the cause of death), haemorrhage during thread insertion (one animal), and disconnection of the silicon tip of the thread during withdrawal, making the permanence of reperfusion uncertain (four animals). A total of 59 animals were therefore included in the analysis of infarct volume in this study. In error, three animals were sacrificed before their final assessment of neurobehavioral score: one from the normothermia/water group and two from the hypothermia/pethidine group. These errors occurred blinded to treatment group allocation. A total of 56 animals were therefore included in the analysis of neurobehavioral score.” [48]
“Fig 1. Flow chart showing the experimental protocol with the number of animals used, died and included in the study…. After baseline CMR and echocardiography, MI was induced by left anterior descending (LAD) coronary artery ligation (n = 48), as previously described. As control of surgery procedure, sham operated mice underwent thoracotomy and pericardiotomy without coronary artery ligation (n = 12).”[54]
3c. For each analysis, report the exact value of N in each experimental group.
Explanation
The exact number of experimental units analysed in each group (i.e. the N number) is essential information for the reader to interpret the analysis, it should be reported unambiguously. All animals and data used in the experiment should be accounted for in the data presented. Sometimes, for good reasons, animals may need to be excluded from a study (e.g. illness or mortality), and data points excluded from analyses (e.g. biologically implausible values). Reporting losses will help the reader to understand the experimental design process, replicate methods, and provide adequate tracking of animal numbers in a study, especially when sample size numbers in the analyses do not match the original group numbers.
Indicate numbers clearly within the text or on figures, and provide absolute numbers (e.g. 10/20, not 50%). For studies where animals are measured at different time points, explicitly report the full description of which animals undergo measurement, and when [32].
Examples
“Group F contained 29 adult males and 58 adult females in 2010 (n = 87), and 32 adult males and 66 adult females in 2011 (n = 98). The increase in female numbers was due to maturation of juveniles to adults. Females belonged to three matrilines, and there were no major shifts in rank in the male hierarchy. Six mid to low ranking individuals died and were excluded from analyses, as were five mid-ranking males who emigrated from the group at the beginning of 2011.” [55]
“The proportion of test time that animals spent interacting with the handler (sniffed the gloved hand or tunnel, made paw contact, climbed on, or entered the handling tunnel) was measured from DVD recordings. This was then averaged across the two mice in each cage as they were tested together and their behaviour was not independent…. Mice handled with the home cage tunnel spent a much greater proportion of the test interacting with the handler (mean ± s.e.m., 39.8 ± 5.2 percent time of 60 s test, n = 8 cages) than those handled by tail (6.4 ± 2.0 percent time, n = 8 cages), while those handled by cupping showed intermediate levels of voluntary interaction (27.6 ± 7.1 percent time, n = 8 cages).” [56]
Item 4. Randomisation
Describe the methods used:
4a. To allocate experimental units to control and treatment groups. If randomisation was used, provide the method of randomisation.
Explanation
Using randomisation during the allocation to groups ensures that each experimental unit has an equal probability of receiving a particular treatment. It helps minimise selection bias and reduce systematic differences in the characteristics of animals allocated to different groups [57–59]. However, investigators frequently confuse “random” with “haphazard” or “arbitrary” allocation. Non-random allocation can introduce bias that influences the results, as a researcher may (consciously or subconsciously) make judgements in allocating an animal to a particular group, or because of unknown and uncontrolled differences in the experimental conditions or animals in different groups. Systematic reviews have shown that animal experiments that do not report randomisation or other bias-reducing measures such as blinding, are more likely to report exaggerated effects that meet conventional measures of statistical significance [60, 61]. It is especially important to use methods of randomisation in situations where it is not possible to blind all or parts of the experiment but even with randomisation, researcher bias can pervert the allocation. This can be avoided by using allocation concealment (see item 5 – Blinding). In studies where sample sizes are small, simple randomisation may result in unbalanced groups; here randomisation strategies to balance groups such as randomising in matched pairs [62–64] and blocking are encouraged [24].
Report the type of randomisation used (simple, stratified, randomised complete blocks, etc.; see Box 4), the method of randomisation (e.g. computer-generated randomisation sequence, with details of the algorithm or programme used), and what was randomised (e.g. treatment to experimental unit, order of treatment for each animal). The EDA has a dedicated feature for randomisation and allocation concealment [18].
Examples
“Fifty 12-week-old male Sprague-Dawley rats, weighing 320–360g, were obtained from Guangdong Medical Laboratory Animal Center (Guangzhou, China) and randomly divided into two groups (25 rats/group): the intact group and the castration group. Random numbers were generated using the standard = RAND() function in Microsoft Excel.” [65]
“Animals were randomized after surviving the initial I/R, using a computer based random order generator.” [66]
“At each institute, phenotyping data from both sexes is collected at regular intervals on age-matched wildtype mice of equivalent genetic backgrounds. Cohorts of at least seven homozygote mice of each sex per pipeline were generated…. The random allocation of mice to experimental group (wildtype versus knockout) was driven by Mendelian Inheritance.” [28]
4b. To minimise potential confounding factors such as the order of treatments and measurements, or animal/cage location.
Explanation
Ensuring there is no systematic difference between animals in different groups apart from the experimental exposure is an important principle throughout the conduct of the experiment. Identifying nuisance variables (sources of variability or conditions that could potentially bias results), and managing them in the design of the experiment increases the sensitivity of the experiment. For example, rodents in cages at the top of the rack may be exposed to higher light levels, which can affect stress [67]. Mitigation strategies for nuisance variables include randomising or counterbalancing the position of animal cages on the rack, and taking measurements or processing samples in a random order (preferably with the investigator blinded to the treatment received; see item 5 – Blinding). Such practices help avoid introducing unintentional systematic differences between comparison groups, also known as order effects. Strategies to avoid order effects include counterbalancing, randomising order of treatment, and blocking (see Box 4).
Considerations for randomisation
Simple randomisation
All animals/samples are simultaneously randomised to the treatment groups without considering any other variable. This strategy is rarely appropriate as it cannot ensure that comparison groups are balanced for factors or covariates that might influence the result of an experiment.
Randomisation within blocks
Blocking is a method of controlling natural variation among experimental units. This splits up the experiment into smaller sub-experiments (blocks), and treatments are randomised to experimental units within each block [24, 68, 69]. This takes into account nuisance variables that could potentially bias the results (e.g. cage location, day or week of procedure).
Stratified randomisation uses the same principle as randomisation within blocks, only the strata tend to be traits of the animal that are likely to be associated with the response (e.g. weight class or tumour size class). This can lead to differences in the practical implementation of stratified randomisation as compared to block randomisation (e.g. there may not be equal numbers of experimental units in each weight class).
Other randomisation techniques
Minimisation is an alternative strategy to allocate animals/samples to treatment group to balance confounding variables that might influence the result of an experiment. With minimisation the treatment allocated to the next animal/sample depends on the characteristics of those animals/samples already assigned. The aim is that each allocation should minimise the imbalance across multiple factors [70]. This approach works well for a continuous nuisance variable such as body weight or starting tumour volume.
Examples of nuisance variables that can be accounted for in the randomisation strategy
Time or day of the experiment
Litter, cage or fish tank
Investigator or surgeon – different level of experience in the people administering the treatments, performing the surgeries, or assessing the results may result in varying stress levels in the animals or duration of anaesthesia
Equipment (e.g. PCR machine, spectrophotometer) – calibration may vary
Measurement of a study parameter (e.g. initial tumour volume)
Animal characteristics – sex, age class, weight class
Location – exposure to light, ventilation and disturbances may vary in cages located at different height or on different racks, which may affect important physiological processes
Implication for the analysis
If blocking factors are used in the randomisation, they should also be included in the analysis. Nuisance variables increase variability in the sample, which reduces statistical power. Including a nuisance variable as a blocking factor in the analysis accounts for that variability and can increase the power, thus increasing the ability to detect a real effect with fewer experimental units. However, blocking uses up degrees of freedom and thus reduces the power if the nuisance variable does not have a substantial impact on variability.
Report the methods used to minimise confounding factors alongside the methods used to allocate animals to groups.
Examples
“Randomisation was carried out as follows. On arrival from El-Nile Company, animals were assigned a group designation and weighed. A total number of 32 animals were divided into four different weight groups (eight animals per group). Each animal was assigned a temporary random number within the weight range group. On the basis of their position on the rack, cages were given a numerical designation. For each group, a cage was selected randomly from the pool of all cages. Two animals were removed from each weight range group and given their permanent numerical designation in the cages. Then, the cages were randomized within the exposure group.” [71]
“…test time was between 08.30am to 12.30pm and testing order was randomized daily, with each animal tested at a different time each test day.” [72]
“Bulls were blocked by BW into four blocks of 905 animals with similar BW and then within each block, bulls were randomly assigned to one of four experimental treatments in a completely randomized block design resulting in 905 animals per treatment. Animals were allocated to 20 pens (181 animals per pen and five pens per treatment).” [73]
Item 5. Blinding
Describe who was aware of the group allocation at the different stages of the experiment (during the allocation, the conduct of the experiment, the outcome assessment, and the data analysis).
Explanation
Researchers often expect a particular outcome, and can unintentionally influence the experiment or interpret the data in such a way as to support their preferred hypothesis [74]. Blinding is a strategy used to minimise these subjective biases.
Whilst there is primary evidence of the impact of blinding in the clinical literature that directly compares blinded vs unblinded assessment of outcomes [75], there is limited empirical evidence in animal research [76, 77]. There are, however, compelling data from systematic reviews showing that non-blinded outcome assessment leads to the treatment effects being overestimated, and the lack of bias-reducing measures such as randomisation and blinding can contribute to as much as 30-45% inflation of effect sizes [60, 78, 79].
Ideally, investigators should be unaware of the treatment(s) animals have received or will be receiving, from the start of the experiment until the data have been analysed. If this is not possible for all the stages of an experiment (see Box 5), it should always be possible to conduct at least some of the stages blind. This has implications for the organisation of the experiment and may require help from additional personnel, for example a surgeon to perform interventions, a technician to code the treatment syringes for each animal, or a colleague to code the treatment groups for the analysis. Online resources are available to facilitate allocation concealment and blinding [18].
Blinding during different stages of an experiment
During allocation
Allocation concealment refers to concealing the treatment to be allocated to each individual animal from those assigning the animals to groups, until the time of assignment. Together with randomisation, allocation concealment helps minimise selection bias, which can introduce systematic differences between treatment groups.
During the conduct of the experiment
Where possible, animal care staff and those who administer treatments should be unaware of allocation groups to ensure that all animals in the experiment are handled, monitored and treated in the same way. Treating different groups differently based on the treatment they have received could alter animal behaviour and physiology, and produce confounds.
Welfare or safety reasons may prevent blinding of animal care staff but in most cases, blinding is possible. For example, if hazardous microorganisms are used, control animals can be considered as dangerous as infected animals. If a welfare issue would only be tolerated for a short time in treated but not control animals, a harm-benefit analysis is needed to decide whether blinding should be used.
During the outcome assessment
The person collecting experimental measurements or conducting assessments should not know which treatment each sample/animal received, and which samples/animals are grouped together. Blinding is especially important during outcome assessment, particularly if there is a subjective element (e.g. when assessing behavioural changes or reading histological slides) [76]. Randomising the order of examination can also reduce bias.
If the person assessing the outcome cannot be blinded to the group allocation (e.g. obvious phenotypic or behavioural differences between groups) some, but not all, of the sources of bias could be mitigated by sending data for analysis to a third party, who has no vested interest in the experiment and does not know whether a treatment is expected to improve or worsen the outcome.
During the data analysis
The person analysing the data should know which data are grouped together to enable group comparisons, but should not be aware of which treatment each group received. This type of blinding is often neglected, but is important as the analyst makes many semi-subjective decisions such as applying data transformation to outcome measures, choosing methods for handling missing data and handling outliers. How these decisions will be made should also be decided a priori.
Data can be coded prior to analysis so that the treatment group cannot be identified before analysis is completed.
Specify whether blinding was used or not for each step of the experimental process (see Box 5). If blinding was not possible during a specific stage of the experiment, provide the reason why.
Examples
“For each animal, four different investigators were involved as follows: a first investigator (RB) administered the treatment based on the randomization table. This investigator was the only person aware of the treatment group allocation. A second investigator (SC) was responsible for the anaesthetic procedure, whereas a third investigator (MS, PG, IT) performed the surgical procedure. Finally, a fourth investigator (MAD) (also unaware of treatment) assessed GCPS and NRS, MNT, and sedation NRS scores.” [80]
“…due to overt behavioral seizure activity the experimenter could not be blinded to whether the animal was injected with pilocarpine or with saline.” [81]
“Investigators could not be blinded to the mouse strain due to the difference in coat colors, but the three-chamber sociability test was performed with ANY-maze video tracking software (Stoelting, Wood Dale, IL, USA) using an overhead video camera system to automate behavioral testing and provide unbiased data analyses. The one-chamber social interaction test requires manual scoring and was analyzed by an individual with no knowledge of the questions.” [82]
Item 6. Outcome measures
6a. Clearly define all outcome measures assessed (e.g. cell death, molecular markers, or behavioural changes).
Explanation
An outcome measure (also known as a dependent variable or a response variable) is any variable recorded during a study (e.g. volume of damaged tissue, number of dead cells, specific molecular marker) to assess the effects of a treatment or experimental intervention. Outcome measures may be important for characterising a sample (e.g. baseline data) or for describing complex responses (e.g. ‘haemodynamic’ outcome measures including heart rate, blood pressure, central venous pressure, and cardiac output).
Explicitly describe what was measured, especially when measures can be operationalised in different ways. For example, activity could be recorded as time spent moving or distance travelled. Where possible, the recording of outcome measures should be made in an unbiased manner (e.g. blinded to the treatment allocation of each experimental group; see item 5 – Blinding). Specify how the outcome measure(s) assessed are relevant to the objectives of the study.
Example
“The following parameters were assessed: threshold pressure (TP; intravesical pressure immediately before micturition); post-void pressure (PVP; intravesical pressure immediately after micturition); peak pressure (PP; highest intravesical pressure during micturition); capacity (CP; volume of saline needed to induce the first micturition); compliance (CO; CP to TP ratio); frequency of voiding contractions (VC) and frequency of non-voiding contractions (NVCs).” [83]
6b. For hypothesis-testing studies, specify the primary outcome measure, i.e. the outcome measure that was used to determine the sample size.
Explanation
In a hypothesis-testing experiment, the primary outcome measure answers the main biological question. It is the outcome of greatest importance, identified in the planning stages of the experiment and used as the basis for the sample size calculation (see Box 3). For exploratory studies it is not necessary to identify a single primary outcome and often multiple outcomes are assessed (see item 13 – Objectives).
In a hypothesis-testing study powered to detect an effect on the primary outcome measure, data on secondary outcomes are used to evaluate additional effects of the intervention but subsequent statistical analysis of secondary outcome measures may be underpowered, making results and interpretation less reliable [84, 85]. Studies that claim to test a hypothesis but do not specify a pre-defined primary outcome measure, or those that change the primary outcome measure after data were collected (also known as primary outcome switching) are liable to selectively report only statistically significant results, favouring more positive findings [86].
Registering a protocol in advance protects the researcher against concerns about selective outcome reporting (also known as data dredging or p-hacking) and provides evidence that the primary outcome reported in the manuscript accurately reflects what was planned [87] (see item 19 – Protocol registration).
If the study was designed to test a hypothesis and more than one outcome was assessed, explicitly identify the primary outcome measure and state if it was defined as such prior to data collection.
Examples
“The primary outcome of this study will be forelimb function assessed with the staircase test. Secondary outcomes constitute Rotarod performance, stroke volume (quantified on MR imaging or brain sections, respectively), diffusion tensor imaging (DTI) connectome mapping, and histological analyses to measure neuronal and microglial densities, and phagocytic activity… The study is designed with 80% power to detect a relative 25% difference in pellet-reaching performance in the Staircase test.” [88]
“The primary endpoint of this study was defined as left ventricular ejection fraction (EF) at the end of follow-up, measured by magnetic resonance imaging (MRI). Secondary endpoints were left ventricular end diastolic volume and left ventricular end systolic volume (EDV and ESV) measured by MRI, infarct size measured by ex vivo gross macroscopy after incubation with triphenyltetrazolium chloride (TTC) and late gadolinium enhancement (LGE) MRI, functional parameters serially measured by pressure volume (PV-)loop and echocardiography, coronary microvascular function by intracoronary pressure- and flow measurements and vascular density and fibrosis on histology. Based on a power calculation (estimated effect 7.5% [6], standard deviation of 5%, a power of 0.9 and alpha of 0.05) 8 pigs per group were needed.” [66]
Item 7. Statistical methods
7a. Provide details of the statistical methods used for each analysis.
Explanation
In hypothesis-testing studies comparing two or more groups, inferential statistics are used to estimate the size of the effect and to determine the weight of evidence against the null hypothesis. The effect size is the magnitude of the difference between two groups. The description of the statistical analysis should provide enough detail so that another researcher could re-analyse the raw data using the same method and obtain the same results. Relevant information includes what the outcome measures and independent variables were, what statistical analyses were performed, what tests were used to check assumptions, and any data transformations [89]. Give details of any confounders, blocking factors or covariates taken into account for each statistical test, include how the effects of each were mitigated. This allows readers to assess if analysis methods were appropriate.
In exploratory studies where no specific hypothesis was tested, descriptive statistics can be used to summarise the data (see item 10 – Results). They do not allow conclusions beyond the data but are important for generating new hypotheses that may be tested in subsequent experiments.
For any study reporting descriptive statistics, explicitly state which measure of central tendency is reported (e.g. mean or median) and which measure of variability is reported (e.g. standard deviation, range, quartiles or interquartile range).
Examples
“Analysis of variance was performed using the GLM procedure of SAS (SAS Inst., Cary, NC). Average pen values were used as the experimental unit for the performance parameters. The model considered the effects of block and dietary treatment (5 diets). Data were adjusted by the covariant of initial body weight. Orthogonal contrasts were used to test the effects of SDPP processing (UV vs no UV) and dietary SDPP level (3% vs 6%). Results are presented as least squares means. The level of significance was set at P < 0.05.” [90]
“All risk factors of interest were investigated in a single model. Logistic regression allows blocking factors and explicitly investigates the effect of each independent variable controlling for the effects of all others…. As we were interested in husbandry and environmental effects, we blocked the analysis by important biological variables (age; backstrain; inbreeding; sex; breeding status) to control for their effect. (The role of these biological variables in barbering behavior, particularly with reference to barbering as a model for the human disorder trichotillomania, is described elsewhere: Garner et al., 2004). We also blocked by room to control for the effect of unknown environmental variables associated with this design variable. We tested for the effect of the following husbandry and environmental risk factors: cage mate relationships (i.e. siblings, non-siblings, or mixed); cage type (i.e. plastic or steel); cage height from floor; cage horizontal position (whether the cage was on the side or the middle of a rack); stocking density; and the number of adults in the cage. Cage material by cage height from floor; and cage material by cage horizontal position interactions were examined, and then removed from the model as they were nonsignificant. N = 1959 mice were included in this analysis” [91]
7b. Specify the experimental unit that was used for each statistical test.
Explanation
Incorrect identification of the experimental unit can lead to pseudoreplication and underpowered studies (see item 1 – Study design). For example, measurements from 50 individual cells from a single mouse represent N = 1 when the experimental unit is the mouse. The 50 measurements are subsamples and provide an estimate of measurement error so should be averaged or used in a nested analysis. Reporting N = 50 in this case is an example of pseudoreplication [22]. It underestimates the true variability in a study, which can lead to false positives. If, however, each cell taken from the mouse is then randomly allocated to different treatments and assessed individually, the cell might be regarded as the experimental unit.
Explicitly report the experimental unit used in each statistical analysis.
Examples
“For each test, the experimental unit was an individual animal.”[92]
“Maternal data regarding body weight, food intake, water consumption, urinary cotinine level and hormonal analysis were analyzed using the individual animal as the experimental unit. The data for offspring regarding body weight, food intake, organ weight at necropsy, urinary cotinine level, immunohistochemical cellular distribution, TUNEL+ cells, RT-PCR and hormonal analysis were analyzed using the litter as the experimental unit.” [93]
7c. Describe any methods used to assess whether the data met the assumptions of the statistical approach.
Explanation
Hypothesis tests are based on assumptions about the underlying data. Describing how assumptions were assessed, and whether these assumptions are met by the data, enables readers to assess the suitability of the statistical approach used. If the assumptions are incorrect, the conclusions may not be valid. For example, the assumptions for data used in parametric tests (such as a t-test, Z-test, ANOVA, Pearson’s r coefficient, etc.) are that the data are continuous, the residuals from the analysis are normally distributed, the responses are independent, and that different groups should have similar variances.
There are various tests for normality, for example the Shapiro-Wilk and Kolmogorov-Smirnov tests. However, these tests have to be used cautiously. If the sample size is small, they will struggle to detect non-normality, if the sample size is large, the tests will detect minor deviations. An alternative approach is to evaluate data with visual plots e.g. normal probability plots, box plots, scatterplots. If the residuals of the analysis are not normally distributed, the assumption may be satisfied using a data transformation where the same mathematical function is applied to all data points to produce normally distributed data (e.g. loge, log10, square root, arcsine).
Other types of outcome measures (binary, categorical, or ordinal) will require different methods of analysis, and each will have different sets of assumptions. For example, categorical data are summarised by counts and percentages or proportions, and are analysed by tests of proportions; these analysis methods assume that data are binary, ordinal or nominal, and independent [94].
Report the type of outcome measure and the methods used to test the assumptions of the statistical approach. If data were transformed, identify precisely the transformation used and which outcome measures it was applied to.
Examples
“Model assumptions were checked using the Shapiro-Wilk normality test and Levene’s Test for homogeneity of variance and by visual inspection of residual and fitted value plots. Some of the response variables had to be transformed by applying the natural logarithm or the second or third root, but were back-transformed for visualization of significant effects.” [95]
The effects of housing (treatment) and day of euthanasia on cortisol levels were assessed by using fixed-effects 2-way ANOVA. An initial exploratory analysis indicated that groups with higher average cortisol levels also had greater variation in this response variable. To make the variation more uniform, we used a logarithmic transform of each fish’s cortisol per unit weight as the dependent variable in our analyses. This action made the assumptions of normality and homoscedasticity (standard deviations were equal) of our analyses reasonable. [96]
Item 8. Experimental animals
8a. Provide details of the animals used, including species, strain and substrain, sex, age or developmental stage, and weight.
Explanation
The species, strain, substrain, sex, weight, and age of animals are critical factors that can influence most experimental results [97–101]. Reporting the characteristics of all animals used is equivalent to standardised human patient demographic data; these data support both the internal and external validity of the study results. It enables other researchers to repeat the experiment and generalise the findings. It also enables readers to assess whether the animal characteristics chosen for the experiment are relevant to the research objectives.
Report age and weight for each group, include summary statistics (e.g. mean and standard deviation) and, if possible, baseline values for individual animals (e.g. as supplementary information or a link to a publicly accessible data repository). For most species, precise reporting of age is more informative than a description of the developmental status (e.g. a mouse referred to as an adult can vary in age from six to 20 weeks [102]). In some cases, however, reporting the developmental stage is more informative than chronological age, for example in juvenile Xenopus, where rate of development can be manipulated by incubation temperature [103].
Example
“One hundred and nineteen male mice were used: C57BL/6OlaHsd mice (n = 59), and BALB/c OlaHsd mice (n = 60) (both from Harlan, Horst, The Netherlands). At the time of the EPM test the mice were 13 weeks old and had body weights of 27.4 ± 0.4 g and 27.8 ± 0.3 g, respectively (mean ± SEM).” [104]
“Histone Methylation Profiles and the Transcriptome of X. tropicalis Gastrula Embryos. To generate epigenetic profiles, ChIP was performed using specific antibodies against trimethylated H3K4 and H3K27 in Xenopus gastrula-stage embryos (Nieuwkoop-Faber stage 11–12), followed by deep sequencing (ChIP-seq). In addition, polyA-selected RNA (stages 10–13) was reverse transcribed and sequenced (RNA-seq).” [105]
8b. Provide further relevant information on the provenance of animals, health/immune status, genetic modification status, genotype, and any previous procedures.
Explanation
The animals’ provenance, their health or immune status and their history of previous testing or procedures, can influence their physiology and behaviour as well as their response to treatments, and thus impact on study outcomes. For example, animals of the same strain, but from different sources, or animals obtained from the same source but at different times, may be genetically different [17]. The immune or microbiological status of the animals can also influence welfare, experimental variability and scientific outcomes [106–108].
Report the health status of animals in the study, and any previous procedures the animals have undergone. For genetically modified animals, describe the genetic modification status (e.g. knockout, overexpression), genotype (e.g. homozygous, heterozygous), manipulated gene(s), genetic methods and technologies used to generate the animals, how the genetic modification was confirmed, and details of animals used as controls (e.g. littermate controls [109]).
Reporting the correct nomenclature is crucial to understanding the data and ensuring that the research is discoverable and replicable [110–112]. Useful resources for reporting nomenclature for different species include:
Mice - International Committee on Standardized Genetic Nomenclature (https://www.jax.org/jax-mice-and-services/customer-support/technical-support/genetics-and-nomenclature)
Rats - Rat Genome and Nomenclature Committee (https://rgd.mcw.edu/)
Zebrafish - Zebrafish Information Network (http://zfin.org/)
Xenopus - Xenbase (http://www.xenbase.org/entry/)
Drosophila – FlyBase (http://flybase.org/)
Examples
“A construct was engineered for knockin of the miR-128 (miR-128-3p) gene into the Rosa26 locus. Rosa26 genomic DNA fragments (~1.1 kb and ~4.3kb 5’ and 3’ homology arms, respectively) were amplified from C57BL/6 BAC DNA, cloned into the pBasicLNeoL vector sequentially by in-fusion cloning, and confirmed by sequencing. The miR-128 gene, under the control of tetO-minimum promoter, was also cloned into the vector between the two homology arms. In addition, the targeting construct also contained a loxP sites flanking the neomycin resistance gene cassette for positive selection and a diphtheria toxin A (DTA) cassette for negative selection. The construct was linearized with ClaI and electroporated into C57BL/6N ES cells. After G418 selection, seven-positive clones were identified from 121 G418-resistant clones by PCR screening. Six-positive clones were expanded and further analyzed by Southern blot analysis, among which four clones were confirmed with correct targeting with single-copy integration. Correctly targeted ES cell clones were injected into blastocysts, and the blastocysts were implanted into pseudo-pregnant mice to generate chimeras by Cyagen Biosciences Inc. Chimeric males were bred with Cre deleted mice from Jackson Laboratories to generate neomycin-free knockin mice. The correct insertion of the miR-128 cassette and successful removal of the neomycin cassette were confirmed by PCR analysis with the primers listed in Supplementary Table 1.” [113]
“The C57BL/6J (Jackson) mice were supplied by Charles River Laboratories. The C57BL/6JOlaHsd (Harlan) mice were supplied by Harlan. The α-synuclein knockout mice were kindly supplied by Prof. [X] (Cardiff University, Cardiff, United Kingdom.) and were congenic C57BL/6JCrl (backcrossed for 12 generations). TNFα−/− mice were kindly supplied by Dr. [Y] (Queens University, Belfast, Northern Ireland) and were inbred on a homozygous C57BL/6J strain originally sourced from Bantin & Kingman and generated by targeting C57BL/6 ES cells. T286A mice were obtained from Prof. [Z] (University of California, Los Angeles, CA). These mice were originally congenic C57BL/6J (backcrossed for five generations) and were then inbred (cousin matings) over 14 y, during which time they were outbred with C57BL/6JOlaHsd mice on three separate occasions.” [114]
Item 9. Experimental procedures
For each experimental group, including controls, describe the procedures in enough detail to allow others to replicate them, including:
9a. What was done, how it was done and what was used.
Explanation
Essential information to describe in the manuscript includes the procedures used to develop the model (e.g. induction of the pathology), the procedures used to measure the outcomes, and pre- and post-experimental procedures, including animal handling and welfare monitoring. Animal handling can be a source of stress and the specific method used (e.g. mice picked up by tail or in cupped hands) can affect research outcomes [56, 115, 116]. Details about animal care and monitoring intrinsic to the procedure are discussed in further detail in item 16 – Animal care and monitoring. Provide enough detail to enable others to replicate the methods and highlight any quality assurance and quality control used [117, 118]. A schematic of the experimental procedures with a timeline can give a clear overview of how the study was conducted. Information relevant to distinct types of interventions and resources are described in Box 6.
Examples of information to include when reporting specific types experimental procedures and resources
Where available, cite the Research Resource Identifier (RRID) for reagents and tools used [120, 121]. RRIDs are unique and stable, allowing unambiguous identification of reagents or tools used in a study, aiding other researchers to replicate the methods.
Detailed step-by-step procedures can also be saved and shared online, for example using Protocols.io [122], which assigns a DOI to the protocol and allows cross-referencing between protocols and publications.
Examples
“Fig… shows the timeline for instrumentation, stabilization, shock/injury, and resuscitation…. Animals were food-deprived for 18 hours before surgery, but allowed free access to water. On the morning of surgery, swine were sedated with tiletamine-zolazepam (Telazol®; 5-8 mg/kg IM; Zoetis Inc., Kalamazoo MI) in the holding pen, weighed, and masked with isoflurane (3%, balance 100% O2) for transport to the lab. The marginal ear vein was catheterized for administration of atropine (0.02 mg/kg IV; Sparhawk Laboratories, Lenexa KS), and buprenorphine for pre-emptive analgesia (3 mg/ml IV; ZooPharm, Laramie WY). Ophthalmic ointment (Puralube®; Fera Pharmaceuticals) was applied to prevent corneal drying. Animals were intubated in dorsal recumbency with a cuffed 6 or 7 Fr endotracheal tube. Anesthetic plane was maintained by isoflurane (1-1.5%; 21-23 % O2, balance N2). Oxygen saturation (sPO2, %) and heart rate (HR) were monitored with a veterinary pulse oximeter placed in the buccal cavity (Masimo SET Radical-7; Irvine CA). Core temperature was monitored with a rectal probe and maintained at 36.5-38°C with a microprocessor-controlled feedback water blanket (Blanketrol® II, Cincinnati Sub-Zero (CSZ) Cincinnati, OH) placed under the animal. Anesthetic depth was assessed every 5 min for the duration of the experiment by reflexes (corneal touch, pedal flexion, coronary band pinch) and vital signs (sPO2, HR, core temperature).
Fig…. Experimental time line for instrumentation, shock/injury, resuscitation, post-resuscitation monitoring, and blood sampling in a swine model of polytrauma and coagulopathy.” [123]
“For the diet-induced obesity (DIO) model, eight-week-old male mice had ad libitum access to drinking water and were kept on standard chow (SFD, 10.9 kJ/g) or on western high-fat diet (HFD; 22 kJ/g; kcal from 42% fat, 43% from carbohydrates and 15% from protein; E15721-34, Ssniff, Soest, Germany) for 15 weeks (http://dx.doi.org/10.17504/protocols.io.kbacsie).” [124]
9b. When and how often.
Explanation
Clearly report the frequency and timing of experimental procedures and measurements, including the light and dark cycle (e.g. 12:12), circadian time cues (e.g. lights on at 08:00), and experimental time sequence (e.g. interval between baseline and comparator measurements or interval between procedures and measurements). Along with innate circadian rhythms, these can affect research outcomes such as behavioural, physiological, and immunological parameters [125, 126]. Also report the timing and frequency of welfare assessments, taking into consideration the normal activity patterns (see item 16 – Animal care and monitoring). For example, nocturnal animals may not show behavioural signs of discomfort during the day [127].
Examples
“Blood pressure, heart rate, oxygen saturation and amount of blood extracted were recorded every 5 minutes. Blood samples were drawn at baseline (pre injury), 0 minutes (immediately after injury), and after 30 and 60 minutes.” [128]
“After a 5-h fast (7:30–12:30am), awake and freely moving mice were randomized and subjected to three consecutive clamps performed in the same mice as described above, with a 2 days recovery after each hyperinsulinemic/hypoglycemic (mHypo, n = 6) or hyperinsulinemic/euglycemic (mEugly, n = 4) clamps.” [129]
9c. Where (including detail of any acclimation periods).
Explanation
Physiological acclimation after a stressful event, such as transport (e.g. between supplier, animal facility and laboratory), but before the experiment begins allows stabilisation of physiological responses of the animal [130, 131]. Protocols vary depending on species, strain, and outcome; for example physiological acclimation following transportation of different animals can take anywhere from 24 hours to more than one week [132]. Procedural acclimation, immediately before a procedure, allows stabilisation of the animals’ responses after unaccustomed handling, novel environments, and previous procedures, which otherwise can induce behavioural and physiological changes [133, 134].
Indicate where studies were performed (e.g. dedicated laboratory space or animal facility, home cage, open field arena, water maze) and whether periods of physiological or procedural acclimation were included in the study protocol, including type and duration. If the study involved multiple sites explicitly state where each experiment and sample analysis was performed. Include any accreditation of laboratories if appropriate (e.g. if samples are sent to a commercial laboratory for analysis).
Example
“Fish were singly housed for 1 week before being habituated to the conditioning tank over 2 consecutive days. The conditioning tank consisted of an opaque tank measuring 20 cm (w) 15 cm (h) 30 cm (l) containing 2.5 l of aquarium water with distinct visual cues (spots or stripes) on walls at each end of the tank… During habituation, each individual fish was placed in the conditioning apparatus for 20 minutes with free access to both compartments and then returned to its home tank.” [135]
9d. Why (provide rationale for procedures).
Explanation
There may be numerous approaches to investigate any given research problem, therefore it is important to explain why a particular procedure or technique was chosen. This is especially relevant when procedures are novel or specific to a research laboratory, or constrained by the animal model or experimental equipment (e.g. route of administration determined by animal size [136]).
Examples
“Because of the very small caliber of the murine tail veins, partial paravenous injection is common if 18F-FDG is administered by tail vein injection (intravenous). This could have significantly biased our comparison of the biodistribution of 18F-FDG under various conditions. Therefore, we used intraperitoneal injection of 18F-FDG for our experiments evaluating the influence of animal handling on 18F-FDG biodistribution.” [137]
“Since Xenopus oocytes have a higher potential for homologous recombination than fertilized embryos (Hagmann et al., 1996), we next tested whether the host transfer method could be used for efficient HDR-mediated knock-in. We targeted the C-terminus of X. laevis Ctnnb1 (β-catenin), a key cytoskeletal protein and effector of the canonical Wnt pathway, because previous studies have shown that addition of epitope tags to the C-terminus do not affect the function of the resulting fusion protein (Fig. 2A) (Evans et al., 2010; Miller and Moon, 1997). CRISPR components were injected into X. laevis oocytes followed by host transfer or into embryos.” [138]
Item 10. Results
For each experiment conducted, including independent replications, report:
10a. Summary/descriptive statistics for each experimental group, with a measure of variability where applicable.
Explanation
Summary/descriptive statistics provide a quick and simple description of the data, they communicate quantitative results easily and facilitate visual presentation. For continuous data, these descriptors include a measure of central tendency (e.g. mean, median) and a measure of variability (e.g. quartiles, range, standard deviation) to help readers assess the precision of the data collected. Categorical data can be expressed as counts, frequencies, or proportions.
Report data for all experiments conducted. If a complete experiment is repeated on a different day, or under different conditions, report the results of all repeats, rather than selecting data from representative experiments. Report the exact number of experimental units per group so readers can gauge reliability of results (see item 2 – Sample size, and item 3 – Inclusion and exclusion criteria). Present data clearly as text, in tables, or in graphs, to enable information to be evaluated, or extracted for future meta-analyses. Report descriptive statistics with a clearly identified measure of variability for each group. Example 1 shows data summarised as medians and, in brackets, lower and upper quartiles. Boxplots are a convenient way to summarise continuous data, plotted as median, interquartile range, minimum, maximum and outliers, as shown in Example 2.
Examples
“Features of autism compared between the control group and the autistic model groups” [139]
“Boxplots of median heart rate (beats per minute, bpm) during rs-fMRI scans for Y (n = 12), AU (n = 12), and AI (n = 12) animals.”[140]
10b. If applicable, the effect size with a confidence interval.
Explanation
An effect size is a quantitative measure that estimates the magnitude of differences between groups, or relationships between variables. It can be used to assess the patterns in the data collected and make inferences about the wider population from which the sample came. The confidence interval for the effect indicates how precisely the effect has been estimated, and tells the reader about the strength of the effect [141]. For example, if zero is included in the 95% confidence interval, the presence of an effect cannot be assumed. In studies where statistical power is low, and/or hypothesis-testing is inappropriate, providing the effect size and confidence interval indicates how small or large an effect might really be, so a reader can judge the biological significance of the data [142, 143]. Reporting effect sizes with confidence intervals also facilitates extraction of useful data for systematic review and meta-analysis. Where multiple independent studies included in a meta-analysis show quantitatively similar effects, even if each is statistically non-significant, this provides powerful evidence that a relationship is ‘real’, although small.
Report all analyses performed, even those providing non-statistically significant results. Report the effect size, to indicate the size of the difference between groups in the study, with a confidence interval, to indicate the precision of the effect size estimate.
Example
“For all traits identified as having a significant genotype effect for the Usp47tm1b(EUCOMM)Wtsi line (MGI:5605792), a comparison is presented of the standardized genotype effect with 95% confidence interval for each sex with no multiple comparisons correction. Standardization, to allow comparison across variables, was achieved by dividing the genotype estimate by the signal seen in the wildtype population. Shown in red are statistically significant estimates. RBC: red blood cells; BMC: bone mineral content; BMD: bone mineral density; WBC: white blood cells.” [28]
2. Recommended Set
The Recommended Set (Box 7) adds context to the study described, including further detail about the methodology and advice on what to include in the more narrative parts of a manuscript. Items are presented in a logical order, there is no ranking within the set.
ARRIVE Recommended Set
11. Abstract
12. Background
13. Objectives
14. Ethical statement
15. Housing and husbandry
16. Animal care and monitoring
17. Interpretation /scientific implications
18. Generalisability /translation
19. Protocol registration
20. Data access
21. Declaration of interests
Item 11. Abstract
Provide an accurate summary of the research objectives, animal species, strain and sex, key methods, principal findings, and study conclusions.
Explanation
A transparent and accurate abstract increases the utility and impact of the manuscript, and allows readers to assess the reliability of the study [144]. The abstract is often used as a screening tool by readers to decide whether to read the full article or to select an article for inclusion in a systematic review. However, abstracts often either do not contain enough information for this purpose [145, 146], or contain information that is inconsistent with the results in the rest of the manuscript [147, 148].
To maximise utility, include details of the species, sex and strain of animals used, and accurately report the methods, results and conclusions of the study. Also describe the objectives of the study, including whether it was designed to either test a specific hypothesis or to generate a new hypothesis (see Item 13 – Objectives). Incorporating this information will enable readers to interpret the strength of evidence, and judge how the study fits within the wider knowledge base.
Examples
“BACKGROUND AND PURPOSE: Asthma is an inflammatory disease that involves airway hyperresponsiveness and remodelling. Flavonoids have been associated to anti-inflammatory and antioxidant activities and may represent a potential therapeutic treatment of asthma. Our aim was to evaluate the effects of the sakuranetin treatment in several aspects of experimental asthma model in mice.
EXPERIMENTAL APPROACH: Male BALB/c mice received ovalbumin (i.p.) on days 0 and 14, and were challenged with aerolized ovalbumin 1% on days 24, 26 and 28. Ovalbumin-sensitized animals received vehicle (saline and dimethyl sulfoxide, DMSO), sakuranetin (20 mg kg–1per mice) or dexamethasone (5 mg kg–1 per mice) daily beginning from 24th to29th day. Control group received saline inhalation and nasal drop vehicle. On day 29, we determined the airway hyperresponsiveness, inflammation and remodelling as well as specific IgE antibody. RANTES, IL-5, IL-4, Eotaxin, IL-10, TNF-a, IFN-g and GMC-SF content in lung homogenate was performed by Bioplex assay, and 8-isoprostane and NF-kB activations were visualized in inflammatory cells by immunohistochemistry.
KEY RESULTS: We have demonstrated that sakuranetin treatment attenuated airway hyperresponsiveness, inflammation and remodelling; and these effects could be attributed to Th2 pro-inflammatory cytokines and oxidative stress reduction as well as control of NF-kB activation.
CONCLUSIONS AND IMPLICATIONS: These results highlighted the importance of counteracting oxidative stress by flavonoids in this asthma model and suggest sakuranetin as a potential candidate for studies of treatment of asthma.” [149]
“In some parts of the world, the laboratory pig (Sus scrofa) is often housed in individual, sterile housing which may impose stress. Our objectives were to determine the effects of isolation and enrichment on pigs housed within the PigTurn® — a novel penning system with automated blood sampling — and to investigate tear staining as a novel welfare indicator. Twenty Yorkshire × Landrace weaner pigs were randomly assigned to one of four treatments in a 2 × 2 factorial combination of enrichment (non-enriched [NE] or enriched [E]) and isolation (visually isolated [I] or able to see another pig [NI]). Pigs were catheterised and placed into the PigTurns® 48 h post recovery. Blood was collected automatically twice daily to determine white blood cell (WBC) differential counts and assayed for cortisol. Photographs of the eyes were taken daily and tear staining was quantified using a 0–5 scoring scale and Image-J software to measure stain area and perimeter. Behaviour was video recorded and scan sampled to determine time budgets. Data were analysed as an REML using the MIXED procedure of SAS. Enrichment tended to increase proportion of time standing and lying laterally and decrease plasma cortisol, tear-stain area and perimeter. There was a significant isolation by enrichment interaction. Enrichment given to pigs housed in isolation had no effect on plasma cortisol, but greatly reduced it in non-isolated pigs. Tear-staining area and perimeter were highest in the NE-I treatment compared to the other three treatments. Eosinophil count was highest in the E-NI treatment and lowest in the NE-I treatment. The results suggest that in the absence of enrichment, being able to see another animal but not interact may be frustrating. The combination of no enrichment and isolation maximally impacted tear staining and eosinophil numbers. However, appropriate enrichment coupled with proximity of another pig would appear to improve welfare.” [150]
Item 12. Background
12a. Include sufficient scientific background to understand the rationale and context for the study, and explain the experimental approach.
Explanation
Scientific background information for an animal study should demonstrate a clear evidence gap and explain why an in vivo approach was warranted. Systematic reviews of the animal literature provide the most convincing evidence that a research question has not been conclusively addressed, by showing the extent of current evidence within a field of research. They can also inform the choice of animal model by providing a comprehensive overview of the models used along with their benefits and limitations [151, 152].
Describe the rationale and context of the study and how it relates to other research, including relevant references to previous work. Outline evidence underpinning the hypothesis or objectives and explain why the experimental approach is best suited to answer the research question.
Example
“For decades, cardiovascular disease has remained the leading cause of mortality worldwide…[and] cardiovascular research has been performed using healthy and young, non-diseased animal models. Recent failures of cardioprotective therapies in obese insulin-resistant, diabetic, metabolic syndrome-affected and aged animals that were otherwise successful in healthy animal models has highlighted the need for the development of animal models of disease that are representative of human clinical conditions… In the clinical setting, elderly male patients often present with both testosterone deficiency (TD) and the metabolic syndrome (MetS). A strong and compounding association exists between MetS and TD which may have significant impact on cardiovascular disease and its outcomes which is not addressed by current models…. their mutual presentation in the clinical setting warrants the development of appropriate animal models of the MetS with hypogonadism, especially in the context of cardiovascular disease research.” [153]
12b. Explain how the animal species and model used address the scientific objectives and, where appropriate, the relevance to human biology.
Explanation
Provide enough detail for the reader to assess the suitability of the animal model used to address the research question. Include information on the rationale for choosing a particular species, explain how the outcome measures assessed are relevant to the condition under study, and how the model was validated. Stating that an animal model is commonly used in the field is not appropriate, and a well-considered, detailed rationale should be provided.
When the study models a human disease, indicate how the model is appropriate for addressing the specific objectives of the study [154]. This can include a description of how the induction of the disease, disorder, or injury is sufficiently analogous to the human condition, how the model responds to known clinically-effective treatments, how similar symptoms are to the clinical disease and how animal characteristics were selected to represent the age, sex, and health status of the clinical population [14].
Examples
“… we selected a pilocarpine model of epilepsy that is characterized by robust, frequent spontaneous seizures acquired after a brain insult, well-described behavioral abnormalities, and poor responses to antiepileptic drugs. These animals recapitulate several key features of human temporal lobe epilepsy, the most common type of epilepsy in adults.” [155]
“Transplantation of healthy haematopoietic stem cells (HSCs) is a critical therapy for a wide range of malignant haematological and non-malignant disorders and immune dysfunction (Snowden et al., 2012; Sykes & Nikolic, 2005; Thomas et al., 1957)…. Zebrafish are already established as a successful model to study the haematopoietic system, with significant homology with mammals (de Jong & Zon, 2005; Gering & Patient, 2005; Kissa & Herbomel, 2010; Renshaw & Trede, 2012; Traver et al., 2003; White et al., 2008). Imaging of zebrafish transparent embryos remains a powerful tool and has been critical to confirm that the zebrafish Caudal Haematopoietic Tissue (CHT) is comparable to the mammalian foetal haematopoietic niche (Gering & Patient, 2005; Kissa & Herbomel, 2010; Tamplin et al., 2015). Xenotransplantation in zebrafish embryos has revealed highly conserved mechanisms between zebrafish and mammals. Recently, murine bone marrow cells were successfully transplanted into zebrafish embryos, revealing highly conserved mechanism of haematopoiesis between zebrafish and mammals (Parada-Kusz et al., 2017). Additionally, CD34 enriched human cells transplanted into zebrafish were shown to home to the CHT and respond to zebrafish stromal-cell derived factors (Staal et al., 2016).” [156]
Item 13. Objectives
Clearly describe the research question, research objectives and, where appropriate, specific hypotheses being tested.
Explanation
Explaining the purpose of the study by describing the question(s) that the research addresses, allows readers to determine if the study is relevant to them. Readers can also assess the relevance of the model organism, procedures, outcomes measured, and analysis used.
Knowing if a study is exploratory or hypothesis-testing is critical to its interpretation. A typical exploratory study may measure multiple outcomes and look for patterns in the data, or relationships that can be used to generate hypotheses. It may also be a pilot study which aims to inform the design or feasibility of larger subsequent experiments. Exploratory research helps researchers to design hypothesis-testing experiments, by choosing what variables or outcome measures to focus on in subsequent studies.
Testing a specific hypothesis has implications for both the study design and the data analysis [17, 157]. For example, an experiment designed to detect a hypothesised effect will likely need to be analysed with inferential statistics, and a statistical estimation of the sample size will need to be performed a priori (see item 2 – Sample size). Hypothesis-testing studies also have a pre-defined primary outcome measure, which is used to assess the evidence in support of the specific research question (see item 6 – Outcome measures).
In contrast, exploratory research investigates many possible effects, and is likely to yield more false positive results because some will be positive by chance; thus results from well-designed hypothesis-testing studies provide stronger evidence. Independent replication and meta-analysis can further increase the confidence in conclusions.
Clearly outline the objective(s) of the study, including whether it is hypothesis-testing or exploratory, or if it includes research of both types. Hypothesis-testing studies may collect additional information for exploratory purposes, it is important to distinguish which hypotheses were prespecified and which originated after data inspection, especially when reporting unanticipated effects or outcomes that were not part of the original study design.
Examples
“The primary objective of this study was to investigate the cellular immune response to MSC injected into the striatum of allogeneic recipients (6-hydroxydopamine [6-OHDA]-hemilesioned rats, an animal model of Parkinson’s disease [PD]), and the secondary objective was to determine the ability of these cells to prevent nigrostriatal dopamine depletion and associated motor deficits in these animals.” [158]
“In this exploratory study, we aimed to investigate whether calcium electroporation could initiate an anticancer immune response similar to electrochemotherapy. To this end, we treated immunocompetent balb/c mice with CT26 colon tumors with calcium electroporation, electrochemotherapy, or ultrasound-based delivery of calcium or bleomycin.” [159]
Item 14. Ethical statement
Provide the name of the ethical review committee or equivalent that has approved the use of animals in this study and any relevant licence or protocol numbers (if applicable). If ethical approval was not sought or granted, provide a justification.
Explanation
Authors are responsible for complying with regulations and guidelines relating to the use of animals for scientific purposes. This includes ensuring that they have the relevant approval for their study from an appropriate ethics committee and/or regulatory body before the work starts. The ethical statement provides editors, reviewers and readers with assurance that studies have received this ethical oversight [12]. This also promotes transparency and understanding about the use of animals in research and fosters public trust.
Provide a clear statement explaining how the study conforms to appropriate regulations and guidelines, and indicate relevant licence/protocol numbers so that the study can be identified. Add also any relevant accreditation e.g. AAALAC (American Association for Accreditation of Laboratory Animal Care) [160] or GLP (Good Laboratory Practice).
If the research is not covered by any regulation and formal ethical approval is not required (e.g. a study using animal species not protected by regulations or law), demonstrate that international standards were complied with and cite the appropriate reference. In such cases, provide a clear statement explaining why the research is exempt from regulatory approval.
Examples
“All procedures were conducted in accordance with the United Kingdom Animal (Scientific Procedures) Act 1986, approved by institutional ethical review committees (Alderley Park Animal Welfare and Ethical Review Board and Babraham Institute Animal Welfare and Ethical Review Board) and conducted under the authority of the Project Licence (40/3729 and 70/8307, respectively).” [161]
“All protocols in this study were approved by the Committee on the Ethics of Animal Experiments of Fuwai Hospital, Peking Union Medical College and the Beijing Council on Animal Care, Beijing, China (IACUC permit number: FW2010-101523), in compliance with the Guide for the Care and Use of Laboratory Animals published by the US National Institutes of Health (NIH publication no.85-23, revised 1996).” [162]
“Samples and data were collected according to Institut de Sélection d’Animale (ISA) protocols, under the supervision of ISA employees. Samples and data were collected as part of routine animal data collection in a commercial breeding program for layer chickens in The Netherlands. Samples and data were collected on a breeding nucleus of ISA for breeding purposes only, and is a non-experimental, agricultural practice, regulated by the Act Animals, and the Royal Decree on Procedures. The Dutch Experiments on Animals Act does not apply to non-experimental, agricultural practices. An ethical review by the Statement Animal Experiment Committee was therefore not required. No extra animal discomfort was caused for sample collection for the purpose of this study.” [163]
Item 15. Housing and husbandry
Provide details of housing and husbandry conditions, including any environmental enrichment.
Explanation
The environment determines the health and wellbeing of the animals and every aspect of it can potentially affect their behavioural and physiological responses, thereby affecting research outcomes. Different studies may be sensitive to different environmental factors, and particular aspects of the environment necessary to report may depend on the type of study. Examples of housing and husbandry conditions known to affect animal welfare and research outcomes are listed in Table 1; consider reporting these elements and any other housing and husbandry conditions likely to influence the study outcomes.
Environment, either deprived or enriched, can affect a wide range of physiological and behavioural responses [189]. Specific details to report include, but are not limited to, structural enrichment (e.g. elevated surfaces, dividers), resources for species-typical activities (e.g. nesting material, shelters, gnawing sticks), toys and or other tools used to stimulate exploration, exercise (e.g. running wheel), and novelty. If no environmental enrichment was provided, this should be clearly stated with justification. Similarly, scientific justification needs to be reported for withholding food and water [190], and for singly housing animals [191, 192].
Examples
“Breeding colonies were kept in individually ventilated cages (IVCs; Tecniplast, Italy) at a temperature of 20°C to 24°C, humidity of 50% to 60%, 60 air exchanges per hour in the cages, and a 12/12-hour light/dark cycle with the lights on at 5:30 AM. The maximum caging density was five mice from the same litter and sex starting from weaning. As bedding, spruce wood shavings (Lignocel FS-14; J. Rettenmaier und Soehne GmbH, Rosenberg, Germany) were provided. Mice were fed a standardized mouse diet (1314, Altromin, Germany) and provided drinking water ad libitum. All materials, including IVCs, lids, feeders, bottles, bedding, and water were autoclaved before use. Sentinel mice were negative for at least all Federation of laboratory animal science associations (FELASA)-relevant murine infectious agents [30] as diagnosed by our health monitoring laboratory, mfd Diagnostics GmbH, Wendelsheim, Germany.” [193]
“Same sex litter mates were housed together in individually ventilated cages with two or four mice per cage. All mice were maintained on a regular diurnal lighting cycle (12:12 light:dark) with ad libitum access to food (7012 Harlan Teklad LM-485 Mouse/Rat Sterilizable Diet) and water. Chopped corn cob was used as bedding. Environmental enrichment included nesting material (Nestlets, Ancare, Bellmore, NY, USA), PVC pipe, and shelter (Refuge XKA-2450-087, Ketchum Manufacturing Inc., Brockville, Ontario, Canada). Mice were housed under broken barrier-specific pathogen-free conditions in the Transgenic Mouse Core Facility of Cornell University, accredited by AAALAC (The Association for Assessment and Accreditation of Laboratory Animal Care International).” [194]
Item 16. Animal care and monitoring
16a. Describe any interventions or steps taken in the experimental protocols to reduce pain, suffering and distress.
Explanation
A safe and effective analgesic plan is critical to relieve pain, suffering and distress. Untreated pain can affect the animals’ biology and add variability to the experiment; however specific pain management procedures can also introduce variability, affecting experimental data [195, 196]. Under-reporting of welfare management procedures can also contribute to the perpetuation of non-compliant methodologies and insufficient or inappropriate use of analgesia [196] or other welfare measures. A thorough description of the procedures used to alleviate pain, suffering and distress provides practical information for researchers to replicate the method.
Clearly describe pain management strategies, including:
specific analgesic
administration method (formulation, route, dose, concentration, volume, frequency, timing, and equipment used)
rationale for the choice (e.g. animal model, disease/pathology, procedure, mechanism of action, pharmacokinetics, personnel safety)
protocol modifications to reduce pain, suffering and distress (e.g. changes to the anaesthetic protocol, increased frequency of monitoring, procedural modifications, habituation, etc.)
If analgesics or other welfare measures, reasonably expected for the procedure performed, are not performed for experimental reasons, report the scientific justification [197].
Examples
“If piglets developed diarrhea, they were placed on an electrolyte solution and provided supplemental water, and if the diarrhea did not resolve within 48 h, piglets received a single dose of ceftiofur (5.0 mg ceftiofur equivalent/kg of body weight i.m [Excede, Zoetis, Florham Park, NJ]). If fluid loss continued after treatment, piglets then received a single dose of sulfamethoxazole and trimethoprim oral suspension (50 mg/8 mg per mL, Hi-Tech Pharmacal, Amityville, NY) for 3 consecutive days.” [198]
“One hour before surgery, we administered analgesia to the mice by offering them nut paste (Nutella; Ferrero, Pino Torinese, Italy) containing 1 mg per kg body weight buprenorphine (Temgesic; Schering-Plough Europe, Brussels, Belgium) for voluntary ingestion, as described previously. The mice had been habituated to pure nut paste for 2 d prior to surgery.” [199]
“If a GCPS score equal or greater than 6 (out of 24) was assigned postoperatively, additional analgesia was provided with methadone 0.1 mg kg-1 IM (or IV if required), and pain reassessed 30 minutes later. The number of methadone doses was recorded.” [44]
16b. Report any expected or unexpected adverse events.
Explanation
Reporting adverse events allows other researchers to minimise the risk of these events occurring in their own studies and to plan appropriate welfare assessments. If the experiment is testing the efficacy of a treatment, the occurrence of adverse events may alter the balance between treatment benefit and risk [33].
Report any adverse events, expected or unexpected, that had a negative impact on the welfare of the animals in the study (e.g. cardiovascular and respiratory depression, CNS disturbance, hypothermia, reduction of food intake).
Examples
“Murine lymph node tumors arose in 11 of 12 mice that received N2-transduced human cells. The neo gene could be detected in murine cells as well as in human cells. Significant lymphoproliferation could be seen only in the murine pre-T cells. It took 5 months for murine leukemia to arise; the affected mice displayed symptoms of extreme sickness rapidly, with 5 of the 12 mice becoming moribund on exactly the same day (Figure 3), and 6 others becoming moribund within a 1-month period…Of the 12 mice that had received N2-transduced human cells, 11 had to be killed because they developed visibly enlarged lymph nodes and spleen, hunching, and decrease in body weight, as shown in Figure 3…. The 12th mouse was observed carefully for 14 months; it did not show any signs of leukemia or other adverse events, and had no abnormal tissues when it was autopsied.…The mice were observed at least once daily for signs of illness, which were defined as any one or more of the following: weight loss, hunching, lethargy, rapid breathing, skin discoloration or irregularities, bloating, hemi-paresis, visibly enlarged lymph nodes, and visible solid tumors under the skin. Any signs of illness were logged as “adverse events” in the experiment, the mouse was immediately killed, and an autopsy was performed to establish the cause of illness.” [200]
“Although procedures were based on those reported in the literature, dogs under Protocol 1 displayed high levels of stress and many experienced vomiting. This led us to significantly alter procedures in order to optimize the protocol for the purposes of our own fasting and postprandial metabolic studies.” [201]
16c. Describe the humane endpoints established for the study and the frequency of monitoring.
Explanation
Humane endpoints are predetermined morphological, physiological and/or behavioural signs that define the circumstances under which an animal will be removed from an experimental study. The use of humane endpoints can help minimise harm while allowing the scientific objectives to be achieved [202]. Report the humane endpoints that were established for the specific study, species and strain. Include clear criteria of the clinical signs monitored [127], and clinical signs that led to euthanasia or other defined actions. Include details such as general welfare indicators (e.g. weight loss, reduced food intake, abnormal posture) and procedure-specific welfare indicators (e.g. tumour size in cancer studies [47], sensory motor deficits in stroke studies [203]).
Report the timing and frequency of monitoring, taking into consideration the normal circadian rhythm of the animal and timing of scientific procedures, as well as any increase in the frequency of monitoring (e.g. post-surgery recovery, critical times during disease studies, or following the observation of an adverse event). Publishing score sheets of the clinical signs that were monitored [204] can help guide other researchers to develop clinically relevant welfare assessments, particularly for studies reporting novel procedures.
Example
“Both the research team and the veterinary staff monitored animals twice daily. Health was monitored by weight (twice weekly), food and water intake, and general assessment of animal activity, panting, and fur condition…. The maximum size the tumors allowed to grow in the mice before euthanasia was 2000 mm3.” [205]
Item 17. Interpretation/scientific implications
17a. Interpret the results, taking into account the study objectives and hypotheses, current theory and other relevant studies in the literature.
Explanation
It is important to interpret the results of the study in the context of the study objectives (see item 13 – Objectives). For hypothesis-testing studies, interpretations should be restricted to the primary outcome (see item 6 – Outcome measures). Exploratory results derived from additional outcomes should not be described as conclusive, as they may be underpowered and less reliable.
Discuss the findings in the context of current theory, ideally with reference to a relevant systematic review, as individual studies do not provide a complete picture. If a systematic review is not available, take care to avoid selectively citing studies that corroborate the results, or only those that report statistically significant findings [206].
Where appropriate, describe any implications of the experimental methods or research findings for improving welfare standards or reducing the number of animals used in future studies (e.g. the use of a novel approach reduced the results’ variability, thus enabling the use of smaller group sizes without losing statistical power). This may not be the primary focus of the research but reporting this information enables wider dissemination and uptake of refined techniques within the scientific community.
Examples
“This is in contrast to data provided by an ‘intra-renal IL-18 overexpression’ model [43], and may reflect an IL-18 concentration exceeding the physiologic range in the latter study.” [207]
“The new apparatus shows potential for considerably reducing the number of animals used in memory tasks designed to detect potential amnesic properties of new drugs… approximately 43,000 animals have been used in these tasks in the past 5 years but with the application of the continual trials apparatus we estimate that this could have been reduced to 26,000…. with the new paradigm the number of animals needed to obtain reliable results and maintain the statistical power of the tasks is greatly reduced.” [208]
“In summary, our results show that IL-1Ra protects against brain injury and reduces neuroinflammation when administered peripherally to aged and comorbid animals at reperfusion or 3 hours later. These findings address concerns raised in a recent systematic review on IL-1Ra in stroke [209], and provide further supporting evidence for IL-1Ra as a lead candidate for the treatment of ischemic stroke.” [210]
17b. Comment on the study limitations including potential sources of bias, limitations of the animal model, and imprecision associated with the results.
Explanation
Discussing the limitations of the work is important to place the findings in context, interpret the validity of the results, and ascribe a credibility level to its conclusions [211]. Limitations are unavoidable in scientific research, and describing them is essential to share experience, guide best practice, and aid the design of future experiments.
Discuss the quality of evidence presented in the study, and consider how appropriate the animal model is to the specific research question. A discussion on the rigour of the study design to isolate cause and effect (also known as internal validity [212]) should include whether potential risks of bias have been addressed [9].
Examples
“Although in this study we did not sample the source herds, the likelihood of these herds to be Influenza A virus (IAV) positive is high given the commonality of IAV infections in the Midwest. However, we cannot fully rule out the possibility that new gilts became infected with resident viruses after arrival to the herd. Although new gilts were placed into isolated designated areas and procedures were in place to minimize disease transmission (eg. isolation, vaccination), these areas or procedures might not have been able to fully contain infections within the designated areas.” [213]
“Even though our data demonstrates that sustained systemic TLR9 stimulation aggravates diastolic HF in our model of gene-targeted diastolic HF, there are several limitations as to mechanistic explanations of causality, as well as extrapolations to clinical inflammatory disease states and other HF conditions. First, our pharmacological inflammatory model does not allow discrimination between effects caused by direct cardiac TLR9 stimulation to that of indirect effects mediated by systemic inflammation. Second, although several systemic inflammatory conditions have disturbances in the innate immune system as important features, and some of these again specifically encompassing distorted TLR9 signalling [34], sustained TLR9 stimulation does not necessarily represent a clinically relevant inflammatory condition. Finally, the cardiac myocyte SERCA2a KO model does not adequately represent the molecular basis for, or the clinical features of, diastolic HF.” [214]
Item 18. Generalisability/translation
Comment on whether, and how, the findings of this study are likely to generalise to other species or experimental conditions, including any relevance to human biology (where appropriate).
Explanation
An important purpose of publishing research findings is to inform future research. In the context of animal studies, this might take the form of further in vivo research or another research domain (e.g. human clinical trial). Thoughtful consideration is warranted, as additional unnecessary animal studies are wasteful and unethical. Similarly, human clinical trials initiated based on insufficient or misleading animal research evidence increase research waste and negatively influence the risk-benefit balance for research participants [212, 215].
Consider whether the findings may be used to inform future research in a broadly similar context, or whether enough evidence has been accumulated in the literature to justify further research in another species or in humans. Discuss what (if any) further research may be required to allow generalisation or translation. Discuss and interpret the results in relation to current evidence, and in particular whether similar [216] or otherwise supportive [217] findings have been reported by other groups. Discuss the range of circumstances in which the effect is observed, and factors which may moderate that effect. Such factors could include for example the population (e.g. age, sex, strain, species), the intervention (e.g. different drugs of the same class), and the outcome measured (e.g. different approaches to assessing memory).
Examples
“Our results demonstrate that hDBS robustly modulates the mesolimbic network. This finding may hold clinical relevance for hippocampal DBS therapy in epilepsy cases, as connectivity in this network has previously been shown to be suppressed in mTLE. Further research is necessary to investigate potential DBS-induced restoration of MTLE-induced loss of functional connectivity in mesolimbic brain structures.” [218]
“The tumor suppressor effects of GAS1 had been previously reported in cell cultures or in xenograft models, this is the first work in which the suppressor activity of murine Gas1 is reported for primary tumors in vivo. Recent advances in the design of safe vectors for transgene delivery may result in extrapolating our results to humans and so a promising field of research emerges in the area of hepatic, neoplastic diseases.” [219]
Item 19. Protocol registration
Provide a statement indicating whether a protocol (including the research question, key design features, and analysis plan) was prepared before the study, and if and where this protocol was registered.
Explanation
Akin to the approach taken for clinical trials, protocol registration has emerged as a mechanism that is likely to improve the transparency of animal research [215, 220, 221]. Registering a protocol before the start of the experiment enables researchers to demonstrate that the hypothesis, approach and analysis were planned in advance and not shaped by data as they emerged; it enhances scientific rigour and protects the researcher against concerns about selective reporting of results [222, 223]. A protocol should consist of a) the question being addressed and the key features of the research that is proposed, such as the hypothesis being tested and the primary outcome measure (if applicable), the statistical analysis plan; and b) the laboratory procedures to be used to perform the planned experiment.
Protocols may be registered with different levels of completeness. For example, in the Registered Report format offered by an increasing number of journals, protocols undergo peer review and if accepted, the journal commits to publishing the completed research regardless of the results obtained [220].
Other online resources include the Open Science Framework [224], which is suitable to deposit PHISPS (Population; Hypothesis; Intervention; Statistical Analysis Plan; Primary; Outcome Measure; Sample Size Calculation) protocols [225] and provide researchers with the flexibility to embargo the preregistration and keep it from public view until the research is published, and selectively share it with reviewers and editors. The EDA can also be used to generate a time-stamped PDF, which sets out key elements of the experimental design [18]. This can be used to demonstrate that the study conduct, analysis and reporting were not unduly driven by emerging data. As a minimum we recommend registering protocols containing all PHISPS components as outlined above.
Provide a statement indicating whether or not any protocol was prepared before the study, and if applicable, the location of its registration. Where there have been deviations from the protocol, describe the rationale for these changes in the publication so that readers can take this into account when assessing the findings.
Examples
“A detailed description of all protocols can be found in the Registered Report (Kandela et al., 2015). Additional detailed experimental notes, data, and analysis are available on the Open Science Framework (OSF) (RRID: SCR_003238) (https://osf.io/xu1g2/; Mantis et al., 2016).”[226]
“To maximise the objectivity of the presented analyses, we preregistered this study with its two hypotheses, its planned methods, and its complete plan of data analysis before the start of data collection (https://osf.io/fh8eq/), and we closely adhered to our plan… All statistical analyses closely followed our preregistered analysis plan (https://osf.io/fh8eq/).” [227]
“We preregistered our analyses with the Open Science Framework which facilitates reproducibility and open collaboration in science research (Foster & Deardorff, 2017). Our preregistration: Sheldon and Griffith (2017), was carried out to limit the number of analyses conducted and to validate our commitment to testing a limited number of a priori hypotheses. Our methods are consistent with this preregistration (Sheldon & Griffith, 2017).” [228]
Item 20. Data access
Provide a statement describing if and where study data are available.
Explanation
A data sharing statement describes how others can access the data on which the paper is based. Sharing adequately annotated data allows others to replicate data analyses, so that results can be independently tested and verified. Data sharing allows the data to be repurposed and new datasets to be created by combining data from multiple studies (e.g. to be used in secondary analyses). This allows others to explore new topics and increases the impact of the study, potentially preventing unnecessary use of animals and providing more value for money. Access to raw data also facilitates text and automated data mining [229].
An increasing number of publishers and funding bodies require authors or grant holders to make their data publicly available [230]. Journal articles with accompanying data may be cited more frequently [231]. Datasets can also be independently cited in their own right, which provides additional credit for authors. This practice is gaining increasing recognition and acceptance [232].
Where possible, make available all data that contribute to summary estimates or claims presented in the paper. Data should follow the FAIR guiding principles [233], that is data are findable, accessible (i.e. don’t use outdated file types), interoperable (can be used on multiple platforms and with multiple software packages) and re-usable (i.e. have adequate data descriptors).
Data can be made publicly available via a structured, specialised (domain-specific), open access repository such as those maintained by NCBI (https://www.ncbi.nlm.nih.gov/) or EBI (https://www.ebi.ac.uk/). If such a repository is not available, data can be deposited in unstructured but publicly available repositories (e.g. Figshare (https://figshare.com/), Dryad (https://datadryad.org/), Zenodo (https://zenodo.org/) or Open Science Framework (https://osf.io/)). There are also search platforms to identify relevant repositories with rigorous standards (e.g. FairSharing (https://fairsharing.org/) and re3data (https://www.re3data.org/).
Examples
“Data Availability Statement: All data are available from Figshare at http://dx.doi.org/10.6084/m9.figshare.1288935.”[234]
“A fundamental goal in generating this dataset is to facilitate access to spiny mouse transcript sequence information for external collaborators and researchers. The sequence reads and metadata are available from the NCBI (PRJNA342864) and assembled transcriptomes (Trinity_v2.3.2 and tr2aacds_v2) are available from the Zenodo repository (https://doi.org/10.5281/zenodo.808870), however accessing and utilizing this data can be challenging for researchers lacking bioinformatics expertise. To address this problem we are hosting a SequenceServer32 BLAST-search website (http://spinymouse.erc.monash.edu/sequenceserver/http://spinymouse.erc.monash.edu/sequenceserver/). This resource provides a user-friendly interface to access sequence information from the tr2aacds_v2 assembly (to explore annotated protein-coding transcripts) and/or the Trinity_v2.3.2 assembly (to explore non-coding transcripts).” [235]
Item 21. Declaration of interests
21a. Declare any potential conflicts of interest, including financial and non-financial. If none exist, this should be stated.
Explanation
A competing or conflict of interest is anything that interferes with (or could be perceived as interfering with) the full and objective presentation, analysis, and interpretation of the research. Competing or conflicts of interest can be financial or non-financial, professional or personal. They can exist in institutions, in teams, or with individuals. Potential competing interests are considered in peer review, editorial and publication decisions; the aim is to ensure transparency, and in most cases, a declaration of a conflict of interest does not obstruct the publication or review process.
Examples are provided in Box 8. If unsure, declare all potential conflicts, including both perceived and real conflicts of interest [236].
Examples of competing or conflicts of interest
Financial:
Funding and other payments received or expected by the authors directly arising from the publication of the study, or funding or other payments from an organisation with an interest in the outcome of the work.
Non-financial:
Research that may benefit the individual or institution in terms of goods in kind. This includes unpaid advisory position in a government, non-government organisation or commercial organisations.
Affiliations:
Employed by, on the advisory board or a member of an organisation with an interest in the outcome of the work.
Intellectual property:
Patents or trademarks owned by someone or their organisation. This also includes the potential exploitation of the scientific advance being reported for the institution, the authors, or the research funders.
Personal:
Friends, family, relationships, and other close personal connections to people who may potentially benefit financially or in other ways from the research.
Ideology:
Beliefs or activism (e.g. political or religious) relevant to the work. Membership of a relevant advocacy or lobbying organisation.
Examples
“The study was funded by Gubra ApS. LSD; PJP; GH; KF and HBH are employed by Gubra ApS. JJ and NV are the owners of Gubra ApS. Gubra ApS provided support in the form of materials and salaries for authors LSD; PJP; GH; KF; HBH; JJ and NV.” [237]
“The authors have declared that no competing interests exist.” [238].
21b. List all funding sources (including grant identifier) and the role of the funder(s) in the design, analysis and reporting of the study.
Explanation
The identification of funding sources allows the reader to assess any competing interests, and any potential sources of bias. For example, bias, as indicated by a prevalence of more favourable outcomes, has been demonstrated for clinical research funded by industry compared to studies funded by other sources [239–241]. Evidence for preclinical research also indicates that funding sources may influence the interpretation of study outcomes [236, 242].
Report the funding information including the financial supporting body(s) and any grant identifier(s). Include the role of the funder in the design, analysis, reporting and/or or decision to publish. If the research did not receive specific funding, but was performed as part of the employment of the authors, name the employer.
Examples
“Support was provided by the Italian Ministry of Health: Current research funds PRC 2010/001 [http://www.salute.gov.it/] to MG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” [243]
“This study was financially supported by the Tuberculosis and Lung Research Center of Tabriz University of Medical Sciences and the Research Council of University of Tabriz. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” [244]
“This work was supported by the salary paid to AEW. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” [245]
Competing interests
AA: editor in chief of the British Journal of Pharmacology. WJB, ICC and ME: authors of the original ARRIVE guidelines. WJB: serves on the Independent Statistical Standing Committee of the funder CHDI foundation. AC, CJM, MMcL and ESS: involved in the IICARus trial. ME, MMcL and ESS: have received funding from NC3Rs. ME: sits on the MRC ERPIC panel. STH: chair of the NC3Rs board, trusteeship of the BLF, Kennedy Trust, DSRU and CRUK, member of Governing Board, Nuffield Council of Bioethics, member Science Panel for Health (EU H2020), founder and NEB Director Synairgen, consultant Novartis, Teva and AZ, chair MRC/GSK EMINENT Collaboration. VH, KL, EJP and NPdS: NC3Rs staff, role includes promoting the ARRIVE guidelines. CJMcC: shareholdings in Hindawi, on the publishing board of the Royal Society, on the EU Open Science policy platform. MMcL, NPdS, CJMcC, ESS, TS and HW: members of EQIPD. MMcL: member of the Animals in Science Committee. NPdS and TS: associate editors of BMJ Open Science. OP: vice president of Academia Europaea, senior executive editor of the Journal of Physiology, member of the Board of the European Commission’s SAPEA (Science Advice for Policy by European Academies). FR: NC3Rs board member, shareholdings in AstraZeneca and GSK. PR: member of the University of Florida Institutional Animal Care and Use Committee, Editorial board member of Shock. ESS: editor in chief of BMJ Open Science. SDS: role is to provide expertise and does not represent the opinion of the NIH. TS: shareholdings in Johnson & Johnson. SA, MTA, MB, UD, PG, DWH, NAK, and KR declared no conflict of interest.
Acknowledgements
We would like to acknowledge the late Doug Altman’s contribution to this project, Doug was a dedicated member of the working group and his input into the guidelines’ revision has been invaluable.
References
- 1.↵
- 2.
- 3.
- 4.↵
- 5.↵
- 6.↵
- 7.
- 8.↵
- 9.↵
- 10.
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.
- 99.
- 100.
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.
- 108.↵
- 109.↵
- 110.↵
- 111.
- 112.↵
- 113.↵
- 114.↵
- 115.↵
- 116.↵
- 117.↵
- 118.↵
- 119.
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.↵
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵
- 131.↵
- 132.↵
- 133.↵
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.↵
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.↵
- 148.↵
- 149.↵
- 150.↵
- 151.↵
- 152.↵
- 153.↵
- 154.↵
- 155.↵
- 156.↵
- 157.↵
- 158.↵
- 159.↵
- 160.↵
- 161.↵
- 162.↵
- 163.↵
- 164.
- 165.
- 166.
- 167.
- 168.
- 169.
- 170.
- 171.
- 172.
- 173.
- 174.
- 175.
- 176.
- 177.
- 178.
- 179.
- 180.
- 181.
- 182.
- 183.
- 184.
- 185.
- 186.
- 187.
- 188.
- 189.↵
- 190.↵
- 191.↵
- 192.↵
- 193.↵
- 194.↵
- 195.↵
- 196.↵
- 197.↵
- 198.↵
- 199.↵
- 200.↵
- 201.↵
- 202.↵
- 203.↵
- 204.↵
- 205.↵
- 206.↵
- 207.↵
- 208.↵
- 209.↵
- 210.↵
- 211.↵
- 212.↵
- 213.↵
- 214.↵
- 215.↵
- 216.↵
- 217.↵
- 218.↵
- 219.↵
- 220.↵
- 221.↵
- 222.↵
- 223.↵
- 224.↵
- 225.↵
- 226.↵
- 227.↵
- 228.↵
- 229.↵
- 230.↵
- 231.↵
- 232.↵
- 233.↵
- 234.↵
- 235.↵
- 236.↵
- 237.↵
- 238.↵
- 239.↵
- 240.
- 241.↵
- 242.↵
- 243.↵
- 244.↵
- 245.↵