Abstract
Objective To determine whether a change in editorial policy, including the implementation of a checklist, has been associated with improved reporting of measures which might reduce the risk of bias.
Methods The study protocol has been published at DOI: 10.1007/s11192-016-1964-8.
Design Observational cohort study.
Population Articles describing research in the life sciences published in Nature journals, submitted after May 1st 2013.
Intervention Mandatory completion of a checklist at the point of manuscript revision.
Comparators (1) Articles describing research in the life sciences published in Nature journals, submitted before May 2013; (2) Similar articles in other journals matched for date and topic.
Primary Outcome Change in proportion of Nature publications describing in vivo research published before and after May 2013 reporting the Landis 4 items (randomisation, blinding, sample size calculation, exclusions).
We included 448 NPG papers (223 published before May 2013, 225 after), identified by an individual hired by NPG for this specific task and working to a standard procedure; an independent investigator used PubMed Related Citations to identify 448 non-NPG papers with a similar topic and date of publication in other journals. All publications were then redacted to remove time-sensitive information and the journal name. Redacted manuscripts were assessed by two trained reviewers against a 74-item checklist, with discrepancies resolved by a third.
Results 394 NPG and 353 matching non-NPG publications described in vivo research. The number of NPG publications meeting all relevant Landis 4 criteria increased from 0/203 prior to May 2013 to 31/181 (16.4%) after (2-sample test for equality of proportions without continuity correction, χ² = 36.2, df = 1, p = 1.8 × 10⁻⁹). There was no change in the proportion of non-NPG publications meeting all relevant Landis 4 criteria (1/164 before, 1/189 after). There were more substantial improvements in the individual prevalences of reporting of randomisation, blinding, exclusions and sample size calculations for in vivo experiments, and less substantial improvements for in vitro experiments.
Conclusions There was a substantial improvement in the reporting of risks of bias in in vivo research in NPG journals following a change in editorial policy, to a level that to our knowledge has not been previously observed. However, there remain opportunities for further improvement.
Background
Few publications describing in vivo research report taking specific actions designed to reduce the risk that their findings are confounded by bias, and those that do not report such actions give inflated estimates of biological effects. Strategies and guidelines which might improve the quality of reports of in vivo research have been proposed [1,2], and while these have been endorsed by a large number of journals there is evidence that this endorsement has not been matched by a substantial increase in the quality of published reports [3].
Poor replication of in vitro molecular and cellular biology studies has also been reported [4,5] and this has been attributed in part to poor descriptions of the experimental and analytical details.
In May 2013 Nature journals announced a change in editorial policy which required authors of submissions in the life sciences to complete a checklist at the time of manuscript acceptance, indicating whether or not they had taken certain measures which might reduce the risk of bias, to report key experimental and analytical details, and to detail in their submission where in the manuscript these issues were addressed [6].
The aim of this study was to determine whether the implementation of this checklist for submissions has been associated with improved reporting of measures that might reduce the risk of bias. To establish whether any observed change in quality was simply a secular trend occurring across all journals, we matched each included publication with a publication in a similar subject area published at around the same time by a different publisher.
Methods
The methods are described in detail in the published study protocol [7], and the data analysis plan and analysis code were articulated prior to database lock and registered on the Open Science Framework (https://osf.io/mqet6/#). The complete study dataset including PMIDs (but not, for copyright reasons, the source pdfs) of included articles is available on Figshare (10.6084/m9.figshare.5375275).
In this observational cohort study we aimed to determine whether the implementation of a checklist for submissions has been associated with improved reporting of measures which might reduce the risk of bias. The study populations comprised (1) Published articles accepted for publication in Nature journals which described research in the life sciences and which were submitted after May 1st 2013 (at which time mandatory completion of a checklist at the stage of manuscript revision was introduced, requiring authors to indicate where details relating to study design could be found in the manuscript) and before November 1st 2014; (2) Published articles accepted for publication in Nature journals in the months preceding May 2013 which described research in the life sciences; and (3) manuscripts from other journals matched for subject area and time of publication. We measured the change in the reporting of items included in the checklist.
Identification of relevant manuscripts
NPG publications
One individual was specifically employed by Nature to select studies which (a) described in vivo or in vitro research; and (b) were published in Nature, Nature Neuroscience, Nature Immunology, Nature Cell Biology, Nature Chemical Biology, Nature Biotechnology, Nature Methods, Nature Medicine or Nature Structural and Molecular Biology. First, they identified papers accepted for publication with an initial submission date later than May 1st, 2013. Beginning with the then-current issues (volume corresponding to year 2015), they worked backwards in time, ensuring the submission date was after 1st May 2013, collecting papers until they had 40 Nature papers and 20 each from the other titles (“post-intervention” group). They then used a similar process to identify papers submitted for publication before 1st May 2013, matched for journal and for country of origin, starting with the May 2013 issue and working backwards, ensuring that the date of submission was after 1st May 2011 (“pre-intervention” group). Where no match could be found with a submission date after 1st May 2011 (i.e. in a two-year period), the non-matched post-intervention publication was excluded from analysis, a replacement post-intervention publication was selected as above, and a matching pre-intervention publication was then identified as described above. Publications describing research involving only human subjects were not included. A Nature editorial administrator independent of publishing decisions reviewed manuscript selection against the inclusion criteria and found that some (less than 10%) had been included incorrectly; they replaced these with manuscript pairs selected according to the inclusion algorithm. The published files corresponding to the publication pdfs (including the extended methods section, extended data and other supplementary materials) were used to generate pdfs for analysis. These were provided to a member of our research team (RM) at a different institution who used Adobe Acrobat to redact information relating to author names or affiliations, dates, volumes or page numbers, and the reference list, to minimise the awareness of outcome assessors of whether the manuscript was pre- or post-intervention.
Non‐ NPG publications
The same individual was responsible for identifying matching publications in other journals. They identified the NPG publication in PubMed and searched for “related citations” with the same calendar month of publication, selecting the first that was not published in an NPG journal and that also matched for whether it reported in vivo research, in vitro research, or both. If no matching related citation was found they extended the window of publication by 2 months, continuing until a matching publication was found. Because of a limited number of potential matching publications it was not possible to match non-NPG manuscripts by country. The individual making this selection played no further part in the study.
Outcome assessment
The Nature checklist focussed on transparency in reporting and availability of materials and code, reflected in 10 items. We designed a series of questions (Appendix 1) to establish whether a given publication met or did not meet the requirements of the checklist. Where a manuscript described both in vivo and in vitro research, the series of questions was completed for each. Where there was more than one in vitro experiment or more than one in vivo experiment the questions were considered in aggregate; that is, all experiments had to meet the requirements of a checklist item for it to be considered compliant.
Five researchers experienced in systematic review and risk of bias annotation scored the same 10 publications using our series of questions. Disagreements were resolved by group discussion, to arrive at a set of “Gold standard” answers for these 10 publications. We also used this experience to write a training guide for those seeking to use the checklist. We then used social media platforms and mailing lists to recruit outcome assessors. We had no prior requirements for the skills required of these individuals, but most had a background in medicine or biomedicine at graduate or undergraduate level; two were senior school students on Nuffield Research Placements in our group. After reviewing the training materials outcome assessors were invited sequentially to score publications from the “Gold standard” pool until their concordance with the Gold standard responses was 80% overall, and was 100% for the components of the primary outcome measure, for three successive publications. At this point we considered them to be trained. The training platform remains available for continuing professional development, at https://ecrf1.clinicaltrials.ed.ac.uk/npqip/Review/TrainingCover.
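As an illustration of this training threshold, the decision rule for a single gold-standard publication might be expressed as follows. This is a minimal sketch in R, not the code used by the training platform, and the object names (answers, gold, is_primary) are hypothetical.

# Sketch (R): was an assessor's scoring of one gold-standard publication good enough?
# 'answers' and 'gold' are vectors of item responses; 'is_primary' flags the Landis 4
# items contributing to the primary outcome. Names and structure are illustrative only.
meets_training_threshold <- function(answers, gold, is_primary) {
  overall <- mean(answers == gold)                            # concordance across all items
  primary <- mean(answers[is_primary] == gold[is_primary])    # concordance on primary-outcome items
  overall >= 0.80 && primary == 1                             # 80% overall and 100% on primary items
}
# An assessor was considered trained once this held for three successive publications.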
Pdf files of included manuscripts were uploaded to a bespoke website. Trained assessors were presented with manuscripts for scoring in random order. Each manuscript was scored by 2 individuals, one with experience in systematic review and risk of bias annotation and one recruited from outside this community. Disagreements between assessors were reconciled by a third, experienced individual who was not one of the original reviewers and who could see the responses previously given but not who the initial reviewers were.
Statistical analysis plan
Given our focus on the reporting of measures to reduce the risks of bias we took as our primary outcome measure a composite measure of the proportion of publications meeting the relevant measures identified by Landis et al as being most important for transparency in reporting in vivo research. These are covered by items 2, 3, 4 and 5 of the checklist and relate to the reporting of randomisation; of the blinded assessment of outcome; of sample size calculations; and of whether the manuscript described whether samples or animals were excluded from analysis. Importantly, checklist compliance did not require, for example, that the study was randomised, but rather that the authors stated whether or not it was randomised. The evaluation principle was to determine whether someone with reasonable domain knowledge could understand the parameters of experimental design sufficiently to inform interpretation. It has been argued that these measures might not be as relevant for exploratory studies, and for these we recorded the item as “not relevant”. We defined exploratory studies as those where hypothesis-testing inferential statistical analyses were not reported. Where an item was not relevant for a publication (for instance with studies using transgenic animals where group allocation had been achieved by Mendelian randomisation) we considered compliance as meeting the remaining relevant criteria. Where a publication described both in vivo and in vitro experiments we analysed each type of experiment separately.
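The composite compliance rule can be illustrated with a minimal sketch in R; this is not the study's analysis code, and the item labels are hypothetical.

# Sketch (R): composite Landis 4 compliance for one set of experiments within a publication.
# Each item is "met", "not met" or "not relevant" (e.g. Mendelian randomisation);
# the publication is compliant if every relevant item is met.
landis_compliant <- function(items) {
  relevant <- items[items != "not relevant"]
  length(relevant) == 0 || all(relevant == "met")
}
landis_compliant(c(randomisation = "met", blinding = "met",
                   sample_size = "not relevant", exclusions = "met"))   # TRUE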
Our primary outcome was the change in the proportion of publications describing in vivo experiments published by NPG before and after May 2013 that met all of the relevant Landis 4 criteria. We used the two-sample proportion test (prop.test) in R without the Yates continuity correction, with two-sided hypothesis testing to be sensitive to the possibility that performance might have declined rather than improved. Secondary outcomes were whether the proportion of publications describing in vivo experiments published by NPG after May 2013 which met all four of the Landis 4 criteria was 80% or higher (Wald test; wald.ptheor.test, RVAideMemoire in R); the change in the proportion of publications describing in vitro experiments published by NPG before and after May 2013 which met all four of the Landis 4 criteria (two-sample proportion test as above); and the change in proportions of adequate reporting of statistical analysis details, individual Landis criteria, and descriptions of animals; reagents and their availability; sequence, structure or computer code deposition; and items relating to the involvement of human subjects or materials in included studies. For the matching publications from non-NPG journals the secondary outcomes were the change in the proportion of publications describing in vivo experiments published before and after May 2013 which met all of the Landis 4 criteria (two-sample proportion test); whether the proportion of publications describing in vivo experiments published after May 2013 which met all four of the Landis 4 criteria was 80% or higher (Wald test); the change in the proportion of publications describing in vitro experiments published before and after May 2013 which met all four of the Landis 4 criteria (two-sample proportion test); and the change in proportions of adequate reporting of statistical analysis details, individual Landis criteria, and descriptions of animals; reagents and their availability; sequence, structure or computer code deposition; and items relating to the involvement of human subjects or materials in included studies. For each of these outcomes we compared the changes observed in NPG publications with those observed in non-NPG publications. For each secondary analysis we used the Holm-Bonferroni correction (p.adjust in R) to account for the number of comparisons drawn, as described in Appendix B of the Data Analysis Plan. We also used interrupted time series analysis for each checklist item in an attempt to distinguish a discrete “shift” in performance from an upward “drift”, as described in the data analysis plan. A number of tertiary outcomes are described in the study protocol and statistical analysis plan and are reported in the supplementary material.
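For orientation, the primary test and the multiplicity correction can be sketched as below. This uses the counts reported in the abstract and illustrative p-values, and is not the registered analysis code; the exact interface of wald.ptheor.test is not reproduced here.

# Sketch (R): primary outcome, two-sample proportion test without Yates continuity correction
# (counts as reported in the abstract: 0/203 compliant before vs 31/181 after).
prop.test(x = c(0, 31), n = c(203, 181), alternative = "two.sided", correct = FALSE)

# Secondary outcomes: Holm-Bonferroni adjustment of a set of unadjusted p-values.
p_raw <- c(0.003, 0.04, 0.0001, 0.20)   # illustrative values only
p.adjust(p_raw, method = "holm")

# The 80% target was tested with wald.ptheor.test() from the RVAideMemoire package;
# see that package's documentation for its exact arguments.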
Power Calculations
In planning the study we performed power calculations in STATA. The power to detect changes in reporting depended on the baseline performance: with a baseline prevalence of compliance of 10% we had 80% power to detect an absolute increase of 13% (from 10% to 23%) at a significance level of p<0.01; with baseline compliance of 50% we had 80% power to detect an absolute increase of 16% (from 50% to 66%) at a significance level of p<0.01. For secondary outcomes we had lower statistical power, but after correction for the number of comparisons made we had at worst 67% power to detect a 15% improvement in the reporting of any individual item.
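The planned calculations were run in STATA; an approximate equivalent using R's normal-approximation power.prop.test, shown only to indicate the sample sizes involved, is:

# Sketch (R): approximate re-expression of the planned power calculations.
power.prop.test(p1 = 0.10, p2 = 0.23, sig.level = 0.01, power = 0.80)  # n per group, 10% -> 23%
power.prop.test(p1 = 0.50, p2 = 0.66, sig.level = 0.01, power = 0.80)  # n per group, 50% -> 66%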
Results
896 publications were identified and uploaded for outcome ascertainment, 448 in each cohort. 2 non-NPG manuscripts were excluded because they did not meet the inclusion criteria, and we identified 4 NPG and 9 non-NPG manuscripts which had been included more than once. 444 NPG publications and 437 non-NPG publications underwent outcome assessment. One NPG publication and one non-NPG publication were adjudged at the time of outcome assessment to report neither in vivo nor in vitro research and so were excluded. The analysis is therefore based on 443 NPG publications (219 before and 224 after 1st May 2013) and 436 non-NPG publications (194 before and 242 after 1st May 2013) (Figure 1). The difference in numbers for NPG and non-NPG before and after 1st May 2013 arises because some of the NPG “before” papers matched best with publications in other journals published in the few months following May 2013. Overall, 43% of matched pairs had dates of publication within 1 month, 54% within 2 months, 64% within 3 months and 81% within 6 months of each other (range −11 to +22 months). 239 publications described only in vivo research, 132 described only in vitro research, and 508 described both. The source journals are given in Table 1; in total 198 different titles contributed matching publications (median manuscripts per title 1, range 1 - 47). The PMIDs of included publications are listed in the data supplement.
205 individuals registered with the project, of whom 38 completed their training and 35 assessed at least one manuscript. 12 also served as reconcilers, and the web interface was programmed to ensure that they were not offered for reconciliation manuscripts that they had previously adjudicated. Including reconciliation, the median number of manuscripts scored was 13 (range 1 to 441). The agreement between the initial pair of outcome assessors ranged from being no better than chance at 50% (in vivo studies, implementation of statistical methods and measures: “Is the variance similar (difference less than two-fold) between the groups that are being statistically compared?”) to 98% (in vivo studies, “Does the study report the species?”). Median agreement was 82% (IQR 68 - 89%).
Reporting of the Landis 4 items
The proportion of NPG in vivo studies reaching full compliance with the Landis 4 criteria increased from 0% (0/204) to 16.3% (31/190) (χ² = 36.1, df = 1, p = 1.8 × 10⁻⁹), but remained significantly lower than the target of 80% (95% CI 11.7% to 22.3%, Wald test v 80%, t = -15.4, p = 2.2 × 10⁻¹⁶).
For randomisation to experimental group, the preferred standard is that the manuscript describes which method of randomisation was used to determine how samples or animals were allocated to experimental groups, although manuscripts were also compliant if they included a statement about randomisation even if no randomisation was used. The proportion of NPG in vivo studies reporting randomisation was 1.8% (3/170, 95% CI 0.6 to 5.3%) before and 11.2% (19/170, 95% CI 7.2 to 16.9%) after (χ² = 12.4, df = 1, adj p = 0.054). The proportion of studies mentioning randomisation even where it was not reported increased from 8.3% (14/169, 95% CI 5.0 to 13.5%) to 64.2% (97/151, 95% CI 56.3 to 71.5%) (χ² = 110.2, df = 1, adj p = 3.2 × 10⁻¹⁴). Figure 2(a) shows the change in the proportion of studies meeting these criteria before and after the change in editorial policy.
For blinding, the preferred standard is that the manuscript describes whether the investigator was blinded to the group allocation during the experiment and/or when assessing the outcome, although manuscripts were also compliant if they included a statement about blinding even if no blinding was done. The proportion of NPG in vivo studies reporting blinding during group allocation or outcome assessment or both increased from 4% (8/198, 95% CI 2.0 to 7.9%) to 22.8% (42/184, 95% CI 17.3 to 29.4%) (χ² = 29.6, df = 1, adj p = 7.6 × 10⁻⁶). The proportion of studies mentioning blinding even where it was not reported increased from 1.6% (3/182, 95% CI 0.5 to 5.0%) to 55.3% (73/132, 95% CI 46.8 to 65.6%) (χ² = 120.1, df = 1, adj p < 3.2 × 10⁻¹⁴). Figure 2(b) shows the change in the proportion of studies meeting these criteria before and after the change in editorial policy.
The proportion of studies reporting animals excluded from analysis increased from 13.9% (28/202, 95% CI 9.7 to 19.3%) to 30.7% (58/189, 95% CI 24.5 to 36.7%) (χ² = 16.1, df = 1, adj p = 0.008). Figure 2(c) shows the change in the proportion of studies meeting these criteria before and after the change in editorial policy.
For sample size calculations, the preferred standard is that the manuscript describes how the sample size was chosen to ensure adequate power to detect a pre-specified effect size, although manuscripts were also compliant if they included a statement about the sample size estimate even if no statistical methods were used. The proportion of studies reporting an a priori sample size calculation increased from 2.0% (4/196, 95% CI 0.8 to 5.3%) to 14.8% (27/182, 95% CI 10.4 to 20.8%) (χ² = 20.5, df = 1, adj p = 0.0008). The proportion of studies mentioning sample size even where a sample size calculation was not reported increased from 1.6% (3/192, 95% CI 0.5 to 4.7%) to 58.4% (90/154, 95% CI 50.5 to 66.0%) (χ² = 140.7, df = 1, adj p < 3.2 × 10⁻¹⁴). Figure 2(d) shows the change in the proportion of studies meeting these criteria before and after the change in editorial policy.
For NPG in vitro studies, the proportion reaching full compliance with the Landis 4 criteria was 0% (0/159) before and 3.3% (6/176) after (χ² = 6.8, df = 1, Holm-Bonferroni adjusted p = 1.00). The proportion of studies reporting randomisation was 0% (0/149) before and 2.9% (5/173, 95% CI 1.2 to 6.8%) after (χ² = 4.4, df = 1, adj p = 1.00). The proportion of studies mentioning randomisation even where it was not reported increased from 0% (0/149) to 15.6% (97/151, 95% CI 10.8 to 21.9%) (χ² = 25.3, df = 1, p = 6.9 × 10⁻⁵). The proportion of studies reporting blinding during group allocation or outcome assessment or both was 3.9% (6/155, 95% CI 1.8 to 8.4%) before and 8.9% (16/179, 95% CI 5.6 to 14.1%) after (χ² = 3.47, df = 1, p = 1.00). The proportion of studies mentioning blinding even where it was not reported increased from 0.7% (1/150, 95% CI 0.1 to 4.6%) to 15.9% (25/157, 95% CI 11.0 to 22.5%) (χ² = 23.0, df = 1, p = 0.0002). The proportion of studies reporting exclusions from analysis was 8.2% before (13/159, 95% CI 4.8 to 13.6%) and 15.9% (29/182, 95% CI 11.3 to 22.0%) after (χ² = 4.73, df = 1, p = 1.00). The proportion of studies reporting an a priori sample size calculation was 1.3% (2/155, 95% CI 0.3 to 5.0%) before and 7.9% (14/177, 95% CI 5.1 to 13.5%) after (χ² = 8.71, df = 1, p = 1.00). The proportion of studies mentioning sample size even where a sample size calculation was not reported increased from 3.3% (5/153, 95% CI 1.4 to 7.6%) to 28.5% (47/165, 95% CI 22.1 to 35.8%) (χ² = 36.9, df = 1, p = 1.8 × 10⁻⁷).
The proportion of matching (non-NPG) in vivo studies reaching full compliance with the Landis 4 criteria was 1% (1/164) before and 1% (1/189) after (χ² = 0.01, df = 1, adj p = 1.00), and for in vitro studies the proportion of non-NPG studies reaching full compliance with the Landis 4 criteria was 0% (0/134) before and 1% (1/165) after (χ² = 0.8, df = 1, adj p = 1.00). The prevalence of reporting of the different items before and after is shown in Table 2; there was no significant change in the reporting of any of the individual Landis 4 criteria for either in vivo or in vitro research.
Statistical reporting
For in vivo studies reported in NPG manuscripts there were significant improvements in the reporting of exact numbers (from 46% to 69%), of whether t-tests were defined as one- or two-sided (from 46% to 71%), and of whether the assumptions of the test had been checked (from 9% to 27%). For in vitro experiments described in NPG manuscripts there were significant improvements in the reporting of exact numbers (from 32% to 70%); of whether data represented technical or biological replicates (from 57% to 75%); and of whether t-tests were defined as one- or two-sided (from 47% to 72%). For in vivo and in vitro studies described in non-NPG publications there was no significant change in any of the items relating to statistical reporting.
Other checklist items
For reporting of details of the animals used, reporting of animal species and strain was high even before the change in editorial policy. There was no significant change in the reporting of any of these items in NPG and non-NPG manuscripts, or in the reporting of details of antibodies used. For in vitro research, there was an increase in the proportion of studies in NPG manuscripts reporting recent mycoplasma testing of the cell lines used (from 1% to 26%) but not for non-NPG manuscripts (1% before, 1% after). For reporting and availability of accession data (e.g. DNA or protein sequence deposition) and computer code there were no significant changes for either NPG or non-NPG publications. Finally, there were no significant changes in the reporting of items relating to human subjects or the use of human materials, but for most items the number of publications for which these were relevant was very low indeed.
We were also interested in whether changes in reporting had occurred as a step change at the time of the change in editorial policy; whether there was an initial improvement followed by a return to previous performance; or whether there was an ongoing improvement in reporting. To address this we conducted an interrupted time series analysis, estimating the rate of change before the intervention, any step change at the time of the intervention, and the rate of change after the intervention. We grouped publications in 3-month periods starting November 2011, and for each quarter calculated the proportional compliance with the criteria in question. Because publications were not evenly distributed across time the analysis has substantially reduced power, but the fitted lines for overall compliance and for each component of the Landis checklist for in vivo research are shown in Figure 3. It appears that, with the exception of sample size calculation, there is a continuing improvement over time in both NPG and non-NPG publications; for sample size calculations the improvement is only seen in NPG publications. Figure 4 shows radar charts of compliance for each checklist item in NPG and non-NPG manuscripts before and after May 2013.
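One way to fit the interrupted time series model described above is a segmented logistic regression on quarterly counts. The sketch below, in R, uses invented data and hypothetical variable names, and is not the registered analysis code.

# Sketch (R): interrupted time series as segmented logistic regression on quarterly compliance.
# 'quarter' indexes 3-month bins from November 2011; 'post' is 1 after the May 2013 policy change;
# 'time_post' counts quarters since the intervention. Counts below are invented for illustration.
its <- data.frame(
  quarter   = 1:12,
  post      = rep(c(0, 1), each = 6),
  compliant = c(1, 0, 2, 1, 2, 3, 6, 8, 9, 11, 12, 14),
  total     = rep(30, 12)
)
its$time_post <- pmax(0, its$quarter - 6)
fit <- glm(cbind(compliant, total - compliant) ~ quarter + post + time_post,
           family = binomial, data = its)
summary(fit)  # 'quarter' = pre-intervention drift, 'post' = step change, 'time_post' = change in slope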
Discussion
The change in editorial policy at NPG was associated with major improvements in reporting of randomisation, blinding, exclusions from analysis and sample size calculations. For the highly challenging primary outcome measure, full compliance increased from zero to 16%. This falls short of the target compliance of 80%, but should be seen in the context, firstly, that only 1 of 1073 publications from 2009-10 from leading UK institutions achieved this standard [8]; and secondly that overall compliance of 80% would require compliance with individual items of around 95% (assuming items are reported roughly independently, 0.95⁴ ≈ 0.81).
The checklist relates to transparency in reporting, and manuscripts were judged to be compliant if they either reported measures to address that risk of bias, or reported that such measures were not taken. For reports of in vivo research, compliance for randomisation, blinding, reporting of exclusions and sample size calculations in NPG publications reached 68%, 62%, 31% and 64% respectively. For non-NPG publications the performance was 12%, 5%, 12% and 3%. The figures for NPG publications are similar to those recently reported for in vivo research published in the journal “Stroke” [9], which began requiring reporting of such details following the publication of good practice guidelines in 2009 [10], and where performance was found to be substantially higher than for in vivo research published in other American Heart Association journals.
For reports of in vitro research, compliance was substantially lower. There have been few systematic attempts to measure the quality of reporting of measures to reduce the risks of bias in in vitro research, and our findings suggest that, in both NPG and non-NPG journals, this remains low. There were improvements in the reporting of randomisation, blinding and sample size calculations in NPG descriptions of in vitro research, but only to 18%, 23% and 34% respectively. For non-NPG publications the equivalent figures were 3%, 1% and 1%. There were no significant changes in the reporting of exclusion of in vitro data, with post-intervention compliance of 16% in NPG publications and 6% in non-NPG publications.
For other checklist items, changes in performance were less dramatic, but there appeared to be incremental improvements across most of the items measured, although few of these remained significant after our rather conservative adjustment for multiple testing. In spite of the substantial attention given to the importance of reporting the sex of experimental animals, this was only done in 52% of post-intervention NPG studies and in 36% of non-NPG studies.
Ours is an observational study, and it is possible that other (related or unrelated) changes were responsible for much if not all of the differences seen. These changes were not observed at other journals (at least not when taken in aggregate), and so it is likely that alternative causal factors would relate to NPG editorial policy and practice. While we are not aware of any other relevant changes in editorial policy occurring at a relevant time, it is likely that this change in editorial policy was accompanied by increased attention given to the importance of the quality of reporting by both in house editorial staff and external peer reviewers. It is not possible to determine whether these might have caused the changes seen. However, a randomised controlled study of the effect of ARRIVE checklist completion on the quality of reporting of in vivo research at PLoS One will report shortly.
During the course of the study we encountered some difficulties that we had not expected. We had thought that it would be straightforward to distinguish between an in vivo experiment and an in vitro experiment, but we had to develop an operational approach which classified the experiment on the basis of the subject at the time that the experimental intervention occurred; so a tissue slice experiment involving tissues from animals exposed to treatment or control was considered in vivo, while a similar experiment applying drugs directly to the slice was considered to be an in vitro experiment.
Further, there were some checklist items where agreement between outcome assessors was very low; for instance, for the question of whether, for in vivo research, the difference in variance between groups being compared was less than two-fold, agreement was no better than would be expected by chance alone. We recommend that the development of publication checklists should include an assessment of inter-observer variation by potential users of the checklist for each checklist item; low agreement might indicate that the item should be rephrased or reframed, or that more explanatory text is required.
Finally, our work shows the challenge of assessing even a relatively limited number of publications against a relatively straightforward checklist. We are delighted that so many collaborators (from 6 continents) agreed to participate, and are very grateful to them. However, even with their help the outcome assessment and reconciliation took 17 months. This is too slow to be useful, for instance, for quality improvement activity, where more rapid feedback would allow more rapid adjustments in response to performance. We have tested the use of text analytics using regular expressions to automatically ascertain the reporting of measures to reduce the risk of bias, and for some such risks of bias the approach achieves sensitivities and specificities above 80%. However, for more complex items it may be that machine learning approaches using, for instance, convolutional neural networks may be more successful, and this is a current focus of our research. We hope that, by making the dataset for this study available, it might be used, for instance, for distantly supervised learning in such systems.
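As a sketch of the regular-expression approach (the study's actual search dictionaries are not reproduced here, and the patterns below are illustrative only):

# Sketch (R): regular-expression screening of methods text for risk-of-bias reporting.
patterns <- c(
  randomisation = "random(ly|is|iz)",
  blinding      = "blind(ed|ing)?|masked",
  sample_size   = "sample size|power (calculation|analysis)",
  exclusions    = "exclud(ed|ing)|exclusion"
)
text <- "Animals were randomly allocated to groups and outcomes were assessed by a blinded observer."
sapply(patterns, function(p) grepl(p, text, ignore.case = TRUE))
# returns a named logical vector: randomisation TRUE, blinding TRUE, sample_size FALSE, exclusions FALSE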
Conclusions
Introduction of a checklist led to substantial improvements in the quality of reporting in NPG publications which were not seen in matched manuscripts from other publishers, and this improvement appears to be ongoing. However, there is still substantial room for improvement, and this suggests that measures such as mandatory author checklists need to be supplemented by other approaches.
Authorship: the NPQIP consortium
Study steering committee: Malcolm Macleod (University of Edinburgh, Chief Investigator and Chair), Emily Sena (University of Edinburgh), David Howells (University of Tasmania).
Study management committee: Malcolm Macleod (University of Edinburgh, Chief Investigator and Chair), Emily Sena (University of Edinburgh), David Howells (University of Tasmania), Veronique Kiermer (Nature, until mid 2015), Sowmya Swaminathan (Nature, from mid 2015).
Redaction and identification of publications: Hugh Ash, Rosie Moreland (Imperial College, London)
Authoring and testing of training materials: Cadi Irvine, Paula Grill, Monica Dingwall, Emily Sena, Gillian Currie, Malcolm Macleod (University of Edinburgh)
Programming and data management: Jing Liao, Chris Sena (University of Edinburgh)
Outcome assessors: Paula Grill (272), Monica Dingwall (258), Malcolm Macleod (229), Cadi Irvine (179), Cilene Lino de Oliveira (170), Daniel-Cosmin Marcu (113), Fala Cramond (96), Sulail Rajani (93), Andrew Ying (81), Hanna Vesterinen (31), Roncon Paolo (28), Kaitlyn Hair (26), Marie Soukupova (23), Devon C. Crawford (17), Kimberley Wever (16), Mahajabeen Khatib (16), Ana Antonic (13), Thomas Ottavi (13), Xenios Milidonis (12), Klara Zsofia Gerlei (10), Thomas Barrett (10), Ye Liu (10), Chris Choi (9), Evandro Araújo De-Souza (8), Alexandra Bannach-Brown (8), Peter-Paul Zwetsloot (5), Kasper Jacobsen Kyng (5), Sarah McCann (4), Emily Wheater (4), Aaron Lawson McLean (1), Marco Casscella (1), Alice Carter (1), Privjyot Jheeta (1), Emma Eaton (1).
Reconciliation: Alexandra Bannach-Brown (199), Malcolm Macleod (197), Monica Dingwall (167), Paula Grill (161), Kaitlyn Hair (97), Cilene Lino de Oliveira (40), Sulail Rajani (9), Daniel-Cosmin Marcu (8), Cadi Irvine (3), Fala Cramond (1).
Data analysis: Paula Grill, Jing Liao, Malcolm Macleod
Writing Committee: Malcolm Macleod, David Howells, Jing Liao, Paula Grill, Emily Sena
Disclaimer: the opinions expressed in this article are the authors' own and do not reflect the view of any employing agency including the U.S. National Institutes of Health, the U.S. Department of Health and Human Services, or the United States Government.
Funding
The study was funded by a grant from the Laura and John Arnold Foundation, who played no role in the design, conduct or analysis of the study or in decisions regarding publication or dissemination.
Role of Nature in data analysis and data ownership
The study dataset belongs to the investigators, and all decisions relating to data analysis and publication were taken by the steering committee and were independent of Nature. NPG were invited to correct any errors of fact in a draft version of the manuscript.
Acknowledgements
Footnotes
Conflicts of interest: None declared.
Funding Laura and John Arnold Foundation.