Abstract
In the last decade Open Science principles, such as Open Access, study preregistration, use of preprints, making available data and code, and open peer review, have been successfully advocated for and are being slowly adopted in many different research communities. In response to the COVID-19 pandemic many publishers and researchers have sped up their adoption of some of these Open Science practices, sometimes embracing them fully and sometimes partially or in a sub-optimal manner. In this article, we express concerns about the violation of some of the Open Science principles and its potential impact on the quality of research output. We provide evidence of the misuses of these principles at different stages of the scientific process. We call for a wider adoption of Open Science practices in the hope that this work will encourage a broader endorsement of Open Science principles and serve as a reminder that science should always be a rigorous process, reliable and transparent, especially in the context of a pandemic where research findings are being translated into practice even more rapidly. We provide all data and scripts at https://osf.io/renxy/.
Introduction
The COVID-19 outbreak represents an urgent threat to global health. On August 10, 2020, the number of COVID-19 cases had exceeded 19 million and the death toll had exceeded 730,000 worldwide. Many important issues remain unresolved, including some crucial questions around both the diagnosis of patients with COVID-19 and optimal therapeutic strategies. Rapid scientific progress on these issues is needed to improve patient management, reduce mortality, and prevent new infections. The scientific community has responded accordingly, with the publication of over 80,000 preprints and peer-reviewed articles on COVID-19 or SARS-CoV-2 since the World Health Organization (WHO) became aware of the emergence of a new virus on 31st December 2019 [1]. Many of these publications have contributed to the development of a body of knowledge that has since informed practice but a considerable number of these studies suffer methodological weaknesses, limiting the interpretability of their findings [2] or leading to false claims with a potentially dramatic impact on public health. While some of these studies have already been retracted [3, 4], others are still available to the research community and to the public. In addition to the direct threat these publications pose to patient management and public health policies, these low-quality studies also exacerbate the waste of scientific resources [2] that is well-known to plague the scientific system [5]. Furthermore, many news outlets have recently amplified public exposure to low-quality research, sowing confusion among the public. In this paper we argue that many of the sub-optimal and non-transparent scientific practices witnessed during the pandemic, in conjunction with poor coordination across the global research community, have contributed to a dysfunctional scientific process for COVID-19 research. 
We support this view by providing results from an analysis of COVID-19 publishing data in recent months, including an analysis of reviewing times, conflicts of interests and misuse of non peer-reviewed material. We further argue that the widespread implementation of Open Science principles – known to increase the rigour, reliability and reproducibility of scientific results [6, 7, 8, 9, 10] – could help optimize research efficiency moving forward, and thus improve health outcomes and economic costs related to COVID-19.
Broadly speaking, Open Science aims to optimize scientific conduct and communication by exposing the scientific process, and results thereof, to the scientific community and broader public. This idea is implemented concretely through a number of core Open Science practices [8, 11]: Open Access, Open Source, Open Data and Open Peer-Review. The best-known and endorsed of those, Open Access, consists of making all scholarly communications freely available with full re-use rights. Open Access also encompasses early dissemination of research manuscripts in the form of preprints (articles not yet published in scientific journals). Even though preprints are not yet peer-reviewed and, as such, their scientific validity is not guaranteed, they contribute to a more transparent and open scholarly publication system, accelerating reviewing and communication within the scientific community [12]. The Open Access principle should be adopted together with the Open Source and Open Data principles. Open Source and Open Data practices aim at ensuring that materials such as questionnaires, forms, procedures, collected data, metadata, and source code are shared to foster replication studies, increase data re-use, and facilitate the peer-reviewing process [13]. Indeed, reviewers then have the material at hand to verify the findings, to detect issues that could not otherwise be identified from the manuscript itself, and to provide comprehensive peer-review reports. Then, following the Open Peer-Review principle, these peer-review reports should be publicly and transparently shared, along with the authors’ response. The scientific discussion between authors and reviewers is inherent to the process of knowledge creation [14]. In addition, Open Peer-Review helps maintain high reviewing quality [15, 16, 17] and reduces the risk of concealed conflicts of interest.
Therefore, the adoption of Open Science principles in the last decade has been particularly helpful in increasing the rigour, reliability and reproducibility of scientific results across research fields [6, 8, 9, 10].
There is evidence to suggest that the COVID-19 pandemic has served as a catalyst in the adoption of certain Open Science principles. For instance, major publishers such as Elsevier (The Lancet, Cell, …) [18] and Springer Nature (Scientific Reports, Nature, …) [19] have made newly written COVID-19 related articles freely accessible to the public (Open Access). Furthermore, authors and particularly medical researchers have shared their preprints more systematically than in previous pandemics [20] and reviews have been posted on external platforms (e.g., Pubpeer [21]). Specific initiatives, such as OpenSAFELY [22], have emerged to make data available to researchers while complying with the legislation regulating the use of medical data. Nevertheless, there have been many instances where these principles were ignored. One notorious example is the lack of transparency and sharing of the data provided by Surgisphere, which led to the retraction of the publication in The Lancet [23]. In other instances, some of the Open Science principles were adopted but misused. For example, news agencies have reported unreliable results based on unreviewed preprints [2] and some open reviews took place on separate platforms (for example Pubpeer), and were thus not directly available to readers.
While we recognize that the faster embracing of Open Science practices during the pandemic is a step towards more accessible and transparent research, we also express concerns about the adoption of these practices for early and non-validated findings. Furthermore, embracing only some of these principles (e.g. preprints), while excluding others (e.g. data sharing) can be more detrimental than not adopting open practices. The aim of the present paper is twofold. First, we identify the issues the scientific community has faced with regard to the publication process since the beginning of the pandemic. To do so we analyzed data collected on preprints and published COVID-19 research articles, as well as on retracted COVID-19 publications, in order to quantify issues related to reviewing time, conflicts of interest, and inappropriate coverage in the media. Second, we discuss how a wider adoption of Open Science principles could have potentially minimized these issues and mitigated their impact on the scientific community and broader public.
The structure of this article follows the stages of the publication process shown in Figure 1. We first discuss issues arising at the data collection and interpretation stage (before the dissemination of the results). Then, we review the dysfunctions observed during the publication process (between the submission and the publication of research articles), before investigating the misuses of research outputs during science communication (after publication). We provide recommendations based on Open Science principles for each stage of the publication process, which we hope will contribute to better research practices in the future.
1 Stage 1: Data collection and interpretation
While previously deplored [5], waste of scientific effort has been particularly prominent during the COVID-19 pandemic [2]. In this section, we show that this waste has its roots in the early stages of the research process – at the data collection and interpretation stage – and discuss how study preregistration, registered reports, adherence to reporting guidelines and Open Source principles could help to minimize waste in research.
1.1 Identified flaws
1.1.1 Methodological and statistical issues
Scientists have raised concerns about methodological flaws in the design and analysis of various COVID-19 pharmacological studies [2]. A famous example, widely criticised by scientists around the world, is the initial preprint from Gautret and colleagues [24] on the use of hydroxychloroquine for the early treatment of COVID-19 patients, which may have slowed down recruiting for methodologically-valid initiatives such as Discovery [25]. Not only are the conclusions of the preprint not supported by the data [26, 27], but the principal investigator – in response to the criticism of the absence of a control arm – decided to stand up against the (sic) ‘dictatorship of the methodologists’ [28], which may lead to a neglect of good methodological principles.
To better understand whether inappropriate study designs or statistical analyses contributed to the retraction of articles, we looked at the 29 COVID-19-related papers that had been retracted (or were subject to expressions of concern) since January 2020 [3]. The list of articles and the results of our analyses are available on the Open Science Framework (OSF) repository of this project: https://osf.io/renxy/. Of the 29 identified publications, 8 (27.6%) were retracted (or had an expression of concern from the editorial board) based on their data analysis or study design. More specifically, among these 8 publications, 2 (25.0%) papers were retracted, at the authors’ request, in order to conduct further data analyses and 6 (75.0%) were retracted because the methodology or the data analysis was wrong.
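The percentages reported above follow directly from the counts. As a minimal sketch (the helper name `pct` is ours, chosen for illustration, and is not taken from the scripts in the OSF repository), they can be recomputed as follows:

```python
def pct(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)

# Counts as reported in the text above.
n_retracted = 29   # retracted papers or papers with an expression of concern
n_method = 8       # of which: data analysis or study design at issue
n_author_req = 2   # of the 8: retracted at the authors' request
n_wrong = 6        # of the 8: methodology or data analysis was wrong

share_method = pct(n_method, n_retracted)
share_author_req = pct(n_author_req, n_method)
share_wrong = pct(n_wrong, n_method)
```

Keeping the denominators explicit in this way makes it clear that the 25.0% and 75.0% figures refer to the 8 methodology-related retractions, not to the full set of 29.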
1.1.2 Duplication of research
Another concern is the increased risk of research waste due to duplication. Many studies that aimed to assess the efficacy of hydroxychloroquine were conducted in parallel: 218 registered trials were ongoing or already completed as of 26th April 2020 [29]. Many comparative effectiveness studies – randomised or not – were conducted without preregistration (e.g., Geleris et al. [30]), however, meaning that the broader research community only became aware of these studies at the time of the release of the results. This illustrates the general lack of cooperation between research teams, putting more patients at risk by exposing them to potentially harmful treatments in multiple underpowered studies, and also leading to a waste of research time and research funding [2]. Other studies have been pre-registered but conducted and reported with major deviations from the preregistration record: for example outcome measures and their timing of assessment reported in the aforementioned study by Gautret et al. [31] were not those listed on the EU Clinical Trials Register.
1.1.3 Ethical concerns
Ethical concerns have also arisen during the pandemic. While the research community needs to find ways to provide timely solutions to the COVID-19 crisis, it should not be to the detriment of good research and clinical practice. Among possible ethical risks, Xafis et al. [32] identified over-recruitment in trials, the conduct of human vaccine studies before the completion of animal studies, and the neglect of adverse effects in drug studies. An example of the last is the little consideration given to the known cardiotoxicity of the combination of hydroxychloroquine and azithromycin early on in the pandemic. Issues surrounding patients’ participation in clinical studies have also been observed: in Gautret et al. [31], consent was not obtained from patients before they participated in the study or their data were analysed. In addition to the ethical problems this poses, it could also weaken the trust that patients and the broader community afford researchers, with detrimental consequences for public health in the long term.
1.2 Open Science solutions
Here, we argue that the adoption of certain Open Science principles could have helped to detect or avoid the issues in data collection and interpretation described above (subsection 1.1). Two methods seem to be particularly relevant:
1.2.1 Study preregistration
First, study preregistration on dedicated platforms (e.g., clinicaltrials.gov, osf.io, or aspredicted.org), with a thorough description of the study design, ethical approval, methods for data collection and data analysis, can help prevent some of the issues identified above (subsection 1.1). Indeed, study preregistration may reduce the amount of unnecessary duplication of research as researchers will be able to check whether specific studies are ongoing and design theirs to address complementary questions. It also helps to ensure that study protocols have received approval from ethics committees or other relevant bodies, and are conducted in alignment with applicable legislation [33].
Another goal of preregistration is for readers and reviewers to make sure that a published study has been conducted and analysed as planned, thus limiting the risks of changes to the design, methods or outcomes in response to the data obtained. Researchers should register studies prior to data collection. On the platform clinicaltrials.gov, retrospective registrations or updates to the study protocol are flagged. Depending on the level of methodological details in the record, study preregistration may help in limiting questionable research practices such as HARKing [34], p-hacking and p-fishing [7].
However, such preregistrations have two major limitations. First, they do not fully prevent duplication. While replication (defined as a deliberate effort to reproduce a study to validate the findings) is an important step of the research process, duplication (an inadvertent repetition of the research) contributes to research waste. This waste has been noted among COVID-19 research [2], with a strikingly high amount of duplication despite study preregistration. Second, whereas preregistrations allow the detection of questionable research practices, they do not help prevent methodological issues before data collection since the preregistration is not itself peer-reviewed and the statistical analysis section of these records is often very brief. Therefore, study preregistrations are necessary, since they encourage researchers to outline the study design and analysis strategy, but not sufficient to avoid the excessive waste of scientific resources.
1.2.2 Registered report
Peer-reviewed study protocols, also called registered reports [35], can also have a major impact on the reduction of wasted resources. Since protocols are peer-reviewed before the enrollment of participants and data-collection, potential omissions or mistakes in the proposed methodology can be corrected before any substantial resources are used, thereby limiting scientific waste [5, 36]. Registered reports can therefore contribute to higher quality research, with a reduced risk of bias and increased generalizability. These improvements in quality and robustness of scientific evidence [7] ultimately facilitate its communication and use. One disadvantage of registered reports is that their reviewing takes time, while preregistrations are immediately available. However, they both contribute to a better visibility of ongoing research, and should be used at institution level to coordinate research projects at an international level in a more efficient way, in order to optimize resources.
1.3 Section conclusion
In clinical research, the adoption of study preregistration for clinical trials (called in this field clinical trial registrations) and observational studies, as well as registered reports, has contributed to better transparency and reliability of the findings. For the researchers themselves, the availability of the report facilitates complete and transparent reporting in the final publication. Finally, study preregistration and registered reports enable the conduct of more exhaustive meta-research and systematic reviews, by reducing the risk of publication bias. We therefore argue that both study preregistration on dedicated platforms, already mandatory for some types of clinical trials, and registered reports should be used more systematically across research fields.
2 Stage 2: Publication process
The process of publishing scientific evidence comprises several essential steps, summarized in the second box in Figure 1. In response to the COVID-19 global pandemic, an enormous number of research publications have been produced, both in the form of preprints and peer-reviewed articles. Unfortunately, shortcuts have been taken in the publication process of some of these papers, jeopardizing the integrity of the editorial process and placing the rigour of scientific publications at risk. In this section we discuss three issues in the peer-reviewed publication process which have arisen during the COVID-19 pandemic: fast-track publication, conflicts of interest and lack of data sharing. We highlight the increasing retraction rate of COVID-19 preprints and peer-reviewed articles, and we propose practical solutions to minimize these issues in the future.
2.1 Identified flaws
2.1.1 Expedited reviewing and conflicts of interest
The publication pipeline has been directly affected by COVID-19, in particular with respect to reviewing times. A recent preprint [37] analyzing 14 different publishing venues highlights an important decrease in the reviewing time for publications related to COVID-19. Although fast-tracking particular articles is not uncommon in the scientific publishing system [37], a number of journals have recently implemented specific policies to fast-track COVID-19-related research (e.g., PLOS, some Wiley journals [38], some Elsevier Journals [39], some SAGE journals [40], and PeerJ journals [41]). In addition a new overlay journal for fast and independent reviews of COVID-19 preprints has recently been launched [42]. While faster peer-reviewing does not necessarily equate with poorer review quality, it remains unclear how thorough the peer-reviewing is and how potential conflicts of interest are handled. Palayew et al. [37] recently highlighted that COVID-19-related manuscripts between 1 January 2020 and 23 April 2020 had a median reviewing time of 6 days and that more than a thousand manuscripts were reviewed in less than 7 days. In the same study the authors also identified manuscripts for which it was unclear whether they had been reviewed at all. It should be noted that the FASEB Journal allows editors to directly accept COVID-19-related submissions for Review, Perspectives, and Hypotheses, without peer-review, as per the journal’s fast-track policy [43]. In light of this, we have sought further information on the fast-tracking of peer-reviews with up-to-date information.
We searched for “COVID-19” (and related terms, see the full list in Appendix) on PubMed Central and found 12,682 published articles. Of these, we could extract the reviewing time, defined as the difference between the date of submission and the date of acceptance, for 8,455 (66.7%) articles. Of these 8,455 publications, 700 (8.3%) from 341 different peer-reviewed journals were reviewed and accepted for publication either on the day of submission (n = 311) or the day after (n = 389). Furthermore, we manually inspected these 700 manuscripts to identify potential conflicts of interest. We focused only on editorial conflicts of interest, i.e., manuscripts for which at least one of the authors is an editor in chief, associate editor or a member of the editorial board of the journal in which the article was published. We did not assess institutional or financial conflicts of interest. An editorial conflict of interest was observed in 298 (42.6%) articles. To further investigate the distribution of these conflicts, we divided the articles into three groups: research articles (n = 503), short communications (n = 74), and editorials (n = 123). The proportion and type of conflicts of interest per type of article are presented in Figure 2.
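To make the definition of reviewing time concrete, the following is a minimal sketch of how it could be derived from submission and acceptance dates. The records, identifiers and function name are hypothetical and purely illustrative; this is not the script available in the OSF repository.

```python
from datetime import date

# Hypothetical records (not the study data): (article id, submitted, accepted).
records = [
    ("a1", date(2020, 3, 1), date(2020, 3, 1)),   # accepted on the day of submission
    ("a2", date(2020, 3, 2), date(2020, 3, 3)),   # accepted the day after
    ("a3", date(2020, 3, 5), date(2020, 3, 20)),  # accepted after 15 days
    ("a4", date(2020, 4, 1), None),               # acceptance date not extractable
]

def reviewing_days(submitted, accepted):
    """Reviewing time in days, or None when either date is missing."""
    if submitted is None or accepted is None:
        return None
    return (accepted - submitted).days

times = {aid: reviewing_days(sub, acc) for aid, sub, acc in records}

# Keep only articles for which a reviewing time could be extracted.
with_time = {aid: days for aid, days in times.items() if days is not None}

# Articles accepted on the day of submission (0) or the day after (1).
fast = sorted(aid for aid, days in with_time.items() if 0 <= days <= 1)
share_fast = len(fast) / len(with_time)
```

Articles with a missing date are excluded from the denominator, which mirrors the distinction made above between the 12,682 articles found and the 8,455 for which a reviewing time could be extracted.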
As expected, conflicts of interest were most common for editorials, but were also surprisingly frequent for research articles (n = 207, 41.2%). The prevalence of these conflicts was substantially heterogeneous across journals: the estimated intraclass correlation coefficient for the proportion of publications with any conflict of interest was 0.41 (95% confidence interval: [0.32; 0.50]), which means that 41% of the variability observed in the occurrence of conflict of interest can be explained by the journal in which the articles were published. Furthermore, among the 87 journals that published at least 2 COVID-19 related research articles with an acceptance time of at most 1 day, 31 (35.6%) showed evidence of potential conflicts of interest for more than half of the articles. These findings raise concerns about the fairness and transparency of the peer-review process with such short acceptance times.
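The journal-level summary above (journals with at least 2 fast-tracked research articles, and the subset where more than half of those articles involve a conflict of interest) can be illustrated with a short sketch. The journal names, the data and the function name are hypothetical; the sketch does not estimate the intraclass correlation coefficient and does not reproduce the scripts in the OSF repository.

```python
from collections import defaultdict

# Hypothetical (journal, has_editorial_conflict) pairs, one per fast-tracked article.
articles = [
    ("Journal A", True), ("Journal A", True), ("Journal A", False),
    ("Journal B", False), ("Journal B", True),
    ("Journal C", True),
    ("Journal D", False), ("Journal D", False),
]

def coi_share_by_journal(rows, min_articles=2):
    """Share of articles with a conflict of interest, per journal.

    Journals with fewer than `min_articles` articles are left out.
    """
    counts = defaultdict(lambda: [0, 0])  # journal -> [n_conflict, n_total]
    for journal, has_conflict in rows:
        counts[journal][0] += int(has_conflict)
        counts[journal][1] += 1
    return {
        journal: n_conflict / n_total
        for journal, (n_conflict, n_total) in counts.items()
        if n_total >= min_articles
    }

shares = coi_share_by_journal(articles)

# Journals where strictly more than half of the articles show a conflict.
majority_coi = sorted(j for j, share in shares.items() if share > 0.5)
```

In this toy example, Journal C is excluded because it has a single article, and Journal B is not counted because exactly half of its articles show a conflict, which is not "more than half".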
While the need for faster scientific dissemination during a pandemic is understandable, the possibility to publish without a rigorous and critical peer-review process is detrimental to the scientific community and the public at large. The work of Gautret et al., which first appeared on MedRxiv as a preprint [24], was subsequently accepted for publication a day after submission by the International Journal of Antimicrobial Agents [31]. Following concerns about the scientific validity of these findings, the International Society of Antimicrobial Chemotherapy (ISAC) – the learned society that produces the journal in collaboration with Elsevier – commissioned post-publication reviews which were made available 4 months after the initial publication [44, 45]. One of these reviews concluded: “this study suffers from major methodological shortcomings which make it nearly if not completely uninformative. Hence, the tone of the report, in presenting this as evidence of an effect of hydroxychloroquine and even recommending its use, is not only unfounded, but, given the desperate demand for a treatment for COVID-19, coupled with the potentially serious side-effects of hydroxychloroquine, fully irresponsible.” Despite this statement, the paper was not retracted, on the grounds that it gives room for scientific debate [46]. This is problematic for several reasons. First, a paper with dangerous conclusions is still available to the researchers, with no mention within the article of the existence of the post-publication reviews as a warning. Secondly, this study is likely to be included in any subsequent systematic review done on the topic, and, even though it may be flagged as a study with a high risk of bias, it might influence the results of the meta-analysis. 
Finally, the choice of the editorial board not to withdraw this publication, but instead encourage the submission of letters or comments, will help increase the impact of the journal, despite the poor quality of the original publication and associated review process.
2.1.2 Distrust of published results
In 2016, a survey administered to more than 1500 researchers identified a phenomenon that has since been called the ‘reproducibility crisis’ [47]. In this survey, more than half of the respondents reported that they had experienced trouble reproducing published results (including their own) at least once, raising doubts as to the quality of these published results. This issue has two main causes: the lack of data sharing and the lack of code sharing.
At the heart of biomedical research lies the need for high-quality data. From these data, using a suitable methodology and statistical tools, researchers can answer relevant clinical questions. While privacy issues raised by collecting medical data should be addressed, there is a need for data sharing between researchers to reproduce the results, enhance collaboration and obtain more timely results. Achieving both of these goals, however, remains a challenging task for the scientific community, especially during an emergency.
To date, four peer-reviewed articles related to COVID-19 have been retracted shortly after publication due to concerns about potential data fabrication or falsification. Two retracted articles, in particular, attracted a great deal of attention: one in the New England Journal of Medicine [48] and one in The Lancet [23]. These articles reported the findings of an international study, based on data owned by the company Surgisphere. The data were not publicly shared at the time of submission or after publication but, even more disturbing, the data were not shared with all the co-authors. The initial publication of the study in The Lancet [23] indicated that hydroxychloroquine increased mortality for COVID-19 patients. These serious safety concerns led the World Health Organization and INSERM to interrupt the inclusions in the hydroxychloroquine arm of the Solidarity and DisCoVeRy trials while the clinical data of patients treated with hydroxychloroquine were reviewed. Our review of the retracted COVID-19 papers highlights that two additional papers using Surgisphere data have also been retracted. Surgisphere’s refusal to share data with the scientific community, including authors involved in the study, or even with a trusted third party, eventually led to the retraction of both articles. The most striking consequence of this affair, however, is that it may have made scientists, editors, readers, organizations and reviewers waste precious research time, when a rapid response is needed.
2.2 Open Science solutions
2.2.1 Open Review
As previously stated, we understand and acknowledge the pressure and the need to accelerate reviewing of submitted manuscripts, but journals and editors should carefully consider the trade-off between reviewing quality and reviewing time. Our findings relating to the fast-tracking of peer-reviews for COVID-19-related articles in subsubsection 2.1.1 can, however, be perceived as particularly worrying. In a time of a pandemic, medical management of patients and public health policies rely heavily on scientific findings. Fast-tracking of peer-review should therefore only be done when scientific rigour can be maintained, as its loss might lead to disastrous consequences for patients and for public health as a whole. Greater transparency in the peer-review process is thus urgently needed. Sharing reviewers’ reports along with the authors’ response more systematically could contribute to this transparency. These scientific discussions are extremely valuable as they may help balance the views expressed in the published article. In addition, reviews are usually found to be of higher quality when they are made publicly available [15, 16, 17]. The availability of the reviews may also help the scientific community and the different stakeholders to verify that the peer-review of a manuscript has been thorough and so help to increase public trust in scientific research.
Furthermore, the peer review process should evaluate all aspects of the publications, in particular the methodology, in order to identify inappropriate study designs or incorrect use of statistical methods (see subsection 1.1). In order to do so, some journals have decided to resort to specialist statistical reviewers. The British Medical Journal’s editorial board, for instance, includes a Chief Statistical Advisor and Statistics Editors [49]. The reviewing process is conducted by domain experts and statisticians in order to make sure that the claims or findings hold with respect to the conducted statistical analysis but also to ensure that the statistical analysis conducted by the authors is appropriate and correct. Statistical reviews help in the detection of mistakes in the data analysis and in detecting exaggerated claims in a manuscript before publication. A recent review of retracted research articles showed that among the 429 papers that were retracted from journals in which statistics are assessed, 81 (18.9%) had a specific statistical review, whereas the assessment of statistics was part of the reviewer or editor’s task in the remaining 348 (81.1%) [50]. Although the retraction rate was lower among articles involving a specialist statistical reviewer (5 per 10,000 vs 7 per 10,000), the low rate of retraction prevents us from drawing any strong conclusions.
To better anticipate, detect, and act upon the potential issues raised in subsubsection 2.1.1 about fast-tracking or conflicts of interest, but also to detect studies that do not reach the expected standards for research, we believe that complete transparency in and around the peer-reviewing process should be adopted. In light of the lessons learned during the current pandemic we therefore recommend the following to be adopted by journals and editors:
Authors should highlight, both in their manuscript and its metadata, any affiliations with the editorial board of the journal to which they submit their paper. An example of such disclosure can be seen in Pardo et al. [51].
Journals should explicitly state how the peer review was conducted: they should state how many referees were recruited and the duration of the complete review process. This should include how long it took to find referees, how long each referee took to complete the review (time between submission and acceptance of the paper) and the number of iterations between the reviewers and the authors.
In addition, journals should make the referees’ reports of accepted articles transparently available alongside the manuscript allowing authors, reviewers and the scientific community to benefit from the constructive comments in these Open Reviews. Journals should probably also consider the various Open Review processes [15] and let referees themselves decide whether or not they want to sign their reviews.
When publications report quantitative findings, a systematic review of statistical methodology should be included.
Although the peer-review process implemented by journals contributes greatly to the quality of manuscripts, each submission is typically reviewed by a limited number of reviewers whose skills might be too specific to evaluate the different components of a specific manuscript. Open peer-reviews – reviews that are not ‘on request’ but carried out spontaneously, whether on preprints or post-publication – are complementary to the journal’s peer-reviews. The use of open peer-reviews through dedicated platforms such as Pubpeer [21] or Zenodo [52] has gained popularity during the pandemic. For instance, a team of statisticians has published a detailed and comprehensive review of the preliminary report of the RECOVERY trial: the largest comparative effectiveness study on COVID-19 treatments to date [53]. Further examples are presented here [54]. It is notable that the reasons for the retraction of 5 COVID-19 papers were echoed on the Pubpeer link to those publications. Therefore, open reviews can contribute to an early detection of flaws in research articles. However, providing thorough feedback on publications is time-consuming and reviewing activities are usually not highly valued by institutions, nor strongly encouraged. Therefore, these changes in peer-review practices cannot be implemented without meaningful support and endorsement from research institutions. Many institutions have created repositories of accepted manuscripts to promote Open Research. These initiatives should go a step further by encouraging researchers to archive the reviewers’ reports as well.
We postulate that adopting Open Reviews and complete transparency in the reviewing process, in addition to all the already highlighted benefits identified in the literature [15, 55, 16, 17], would have helped in detecting potential mistakes in manuscripts or frauds and saved precious research time during the pandemic. Furthermore, it would make analyses of peer-review processes and their quality easier to conduct [56] and would accelerate the training of Early Career Researchers [56, 57] whose help might be critical during a public health crisis.
2.2.2 Open Data and Open Source
To aid a more efficient and thorough peer-review, enable data re-use, and facilitate reproducibility, researchers are now being strongly encouraged to share their data on publicly accessible repositories [58, 59], along with the code used for the analysis. Even though sharing raw data may be challenging in medical research due to the need for compliance with data protection regulation, Open Data initiatives have been considered one of the main solutions to avoid a replication crisis [60, 13] and are often seen today as a critical part of the peer-review system [13].
Data sharing, including anonymized raw clinical data, is crucial during a pandemic and could accelerate the understanding of new diseases and the development of effective treatments [61]. Several researchers have thus, early-on, argued that journals and institutions alike should always ask authors of manuscripts to confirm that the raw data are available for inspection upon request (or to explain why they are not) [62, 63]. Considering that data fabrication and/or falsification has been observed in the scientific community and its frequency probably underestimated [64], some have suggested [13] that the policy of “sharing data upon request” is not enough. Therefore, recent years have seen an increase in policies, from journals and institutions alike, asking researchers to share their raw data by default, except when data could not be shared for privacy or copyright reasons [60, 13]. While data sharing is increasingly being adopted, it still does not appear to be a default practice, even when there are no privacy or legal concerns with the data acquired during the research project [13].
Another approach to the reproducibility crisis is to make the source code used for the analysis openly available to the public and the scientific community [65]. Solutions have been proposed by the research community to promote and facilitate data sharing. An open code repository has been created [66] and many scientific journals are now asking researchers to publish the source code alongside their findings. The use of source code sharing platforms, such as GitHub, is becoming common and has even been advised to improve open science behaviour [67, 68, 69]. The lack of code sharing is still, however, a major issue in the biomedical literature. In times of crisis, such as the COVID-19 pandemic, the need for open code is even more crucial when scientific evidence is at the root of major political decisions.
One may argue that the four retracted papers relying on Surgisphere data would not have been published if proper data sharing policies were in place for all journals. The policies currently in place, while already effective in reducing scientific fraud [13, 62, 63, 70, 71], are not sufficient to detect the issues that the Surgisphere papers have raised. Indeed, it is currently considered acceptable for authors to state that they cannot share their data (or code) for legal reasons (e.g., copyright, privacy). We therefore urge that data-sharing policies be adapted as follows:
Data should be shared by default: authors should not be able to submit a manuscript if they do not provide access to raw data and analysis scripts or a valid reason why they think it is not feasible. This is already the process used by some journals: Wiley has clear data sharing policies, and data sharing is mandatory for submission to some of its journals (e.g., Ecology and Evolution) [72].
If authors are not able to share their raw data, journal editors should have the authority, and should strive, to require that the raw data be examined by a trusted third party (not the authors’ institution) to establish the existence of the raw data and validate the results of the analysis presented by the authors. Identification of the trusted third party can be left to the discretion of the ethics committee or the proprietary company, but the third party should not have any conflict of interest with the authors of the manuscript. The trusted third party should produce a public and signed report stating that the data are available and that the results presented in the manuscript can be confirmed.
To facilitate meta-analyses, abstracts of all manuscripts should contain links to preregistration numbers, data repositories and open source repositories.
While we understand that changing data-sharing policies will take time, we hope that the scientific community, data-holding companies and governmental agencies protecting rights to privacy will learn the lessons from this pandemic and consider adapting their policies to the aforementioned points.
3 Stage 3: Science Communication
Once a manuscript is published, it is available for the rest of the scientific community to cite or include in meta-studies, for the media to report on, and for policy makers to draw on. In this section, we review some of the issues that the pandemic has highlighted with respect to media coverage of scientific results following the publication of preprints and peer-reviewed articles, and we propose alternatives for a more responsible communication of scientific findings.
3.1 A surge of preprints and their misuse
Preprints are articles that authors make available to the community before they have been peer-reviewed, usually around the time of submission to a peer-reviewed journal. There are several benefits of preprints. They allow the communication of new findings to the research community in a more timely manner [73, 12], especially in the context of an emergency such as the current COVID-19 pandemic. Although these results should be critically assessed by the readers, having not yet received the benefit of peer-review, they might encourage the conduct of replication studies or further research building on these findings. Preprints also contribute to the reduction in wasted research by signaling ongoing research projects, avoiding duplication and potentially fostering collaborations. Preprints may also increase a researcher’s visibility and help in credit attribution and priority of discovery [12, 74], reducing the risk of plagiarism. In addition, they are an opportunity for authors to get early feedback from a wider range of researchers and incorporate the suggestions received to enhance the quality of the publication [12, 74].
The COVID-19 pandemic has seen a surge in the numbers of preprints submitted by researchers [75]. While 807 preprints were deposited on MedRxiv in the six-month period between 1st July 2019 and 31st December 2019, 6,771 preprints were submitted in the next six months (between 1st January 2020 and 30th June 2020), an increase of 739%. These figures are, respectively, 15,838 and 21,804 for BioRxiv (38% increase) and 87,942 and 112,197 for ArXiv (28% increase). The use of preprints during outbreaks is certainly not new: a systematic review identified the publication of 174 and 75 preprints during the Ebola and Zika virus outbreaks, respectively [76]. Nevertheless, these figures are much smaller than the number of preprints submitted in the first 6 months of the COVID-19 pandemic. Consequently, platforms offering preprint hosting have had to rapidly adapt to the rate of submission and adjust their screening procedures for each submission to avoid the dissemination of misleading or blatantly false claims [75].
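The percentage increases quoted above follow from the usual relative-change formula, (after - before) / before. As a quick check, the following minimal Python sketch recomputes them from the counts given in the text; the helper name `percent_increase` is illustrative and is not part of the project's scripts.

```python
def percent_increase(before: int, after: int) -> int:
    """Relative change from `before` to `after`, rounded to the nearest percent."""
    return round(100 * (after - before) / before)


# Preprint counts for July-December 2019 versus January-June 2020,
# copied from the paragraph above.
counts = {
    "MedRxiv": (807, 6_771),
    "BioRxiv": (15_838, 21_804),
    "ArXiv": (87_942, 112_197),
}

increases = {name: percent_increase(*pair) for name, pair in counts.items()}
print(increases)
```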
Although the surge in preprints during the pandemic can be seen as an encouraging step towards Open Research, preprints, by their very nature, contain unreviewed findings and should be interpreted with caution. Unfortunately, some COVID-19 preprints have been misused, notably regarding communication with the general public. Indeed the news media, and also some scientists, have used these non-peer-reviewed articles as scientific evidence, increasing the impact of unvalidated findings. One of the benefits of preprints is to receive feedback from other researchers, which helps to identify and correct potential flaws in the methodology, analysis or reporting, thus enhancing the quality of the article. As such, preprints may contain inaccuracies or unreliable findings and it must be noted that many preprints are never accepted for publication in peer-reviewed journals. In a systematic review of preprints during the Ebola and Zika outbreaks [76], only 48% of the Zika preprints and 60% of the Ebola preprints could be matched with peer-reviewed publications which later appeared. Although we cannot exclude the possibility that the authors never submitted their preprints to a peer-reviewed journal, a potential explanation for this phenomenon could be the presence of concerns expressed by the community on the preprints, highlighting poor methodology or other flaws that rendered the preprints unsuitable for publication. Unfortunately, during the COVID-19 pandemic, some preprints containing severe methodological flaws, such as the study by Gautret et al. [24], have been widely disseminated, sowing confusion among the public and the scientific community.
In order to estimate how often preprint platforms were mentioned in the media, we queried the Factiva news database for the three major platforms hosting COVID-19-related research manuscripts (ArXiv, MedRxiv and BioRxiv). As of the 13th of July 2020, the three platforms had been mentioned a total of 3,288 times in online news media and in 313 blog posts since 1st January 2020, the day after China alerted the WHO about a newly identified virus. Restricting the search to English-language items, we find 2,193 web news items reporting scientific findings from recent preprints and 121 articles addressing the surge of preprints during the pandemic and its challenges. This already clearly shows that preprints are, perhaps worryingly, used to communicate to the public.
Next, to quantify the extent to which preprints themselves were shared in the media (news media and social media), we conducted a systematic search for preprints submitted to ArXiv, MedRxiv and BioRxiv between the 1st January 2020 and the 30th June 2020. We then used the Altmetric API to determine the number of media shares as of the 8th July 2020. For comparison purposes, we performed the same analysis on non-COVID preprints submitted to ArXiv during the same time window. Finally, we performed the same analysis on retracted COVID-19 articles or preprints. The methodology is described in an appendix to this paper, and all the scripts for data extraction and analysis, along with collected data, are available on the OSF repository of the project.
As can be seen in Figure 3, ArXiv preprints related to COVID-19 (n = 1,462) were shared more often than preprints on other topics (n = 80,786) submitted during the same period. The difference was more pronounced for mentions in the news media: whereas 1,066 (1.3%) of non-COVID-19 preprints were mentioned in the news, 156 (10.7%) of COVID-19 preprints were. The total number of citations (all sources combined) was also larger for COVID-19 preprints, with a median of 6 ([Q1, Q3] = [2, 15]) mentions versus 2 ([1, 5]) for preprints on other research topics. The number of total citations in the media was even higher for preprints available on platforms dedicated to biomedical research. Out of 1,208 COVID-19 preprints found on BioRxiv, the median number of citations was 30 ([15, 85]) and 444 (36.8%) of these preprints were mentioned at least once in the news media. Similarly, out of 4,629 COVID-19 preprints submitted to MedRxiv, the median number of citations was 12 ([6, 36]) and 1,124 (24.3%) of these preprints were mentioned at least once in the news media. These findings highlight the increasing trend in preprint sharing during the pandemic, raising concerns about the spread of potentially misleading and unverified data.
Openly sharing results that have not yet been peer-reviewed can be very damaging if the media and the public take these findings at face value. An outstanding example is the current debate on the effectiveness of hydroxychloroquine as a treatment for COVID-19, which started after the publication of a methodologically flawed preprint by Gautret and colleagues [24]. Widely shared (1,458 shares in the media, including 54 in the news, as of 13th July 2020), this study quickly caught the attention of the public, creating a high demand for a treatment that has not been proven beneficial and has many potentially harmful side-effects [77]. Moreover, the misuse of preprints may discourage researchers from sharing their own in the future, particularly in research areas in which the use of preprints is a quite recent phenomenon, as in medical research (MedRxiv was launched in June 2019). Finally, our analysis of retracted papers showed that, among the 6 retracted preprints, the median total of shares was 723 ([49, 2,488]), emphasizing further the misuse of unverified and, ultimately, invalid findings.
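The summary statistics reported above (the share of preprints mentioned in the news, and the median with first and third quartiles of total mentions) can be computed along the following lines. This is only a minimal sketch, not the project's analysis script: the function names and the toy list of share counts are hypothetical, and the quartile convention used here (the default of Python's `statistics.quantiles`) is an assumption that may differ from the one used for the reported figures.

```python
from statistics import median, quantiles


def percent_mentioned(mentioned: int, total: int) -> float:
    """Percentage of preprints mentioned at least once, to one decimal place."""
    return round(100 * mentioned / total, 1)


def summarize_shares(shares):
    """Return (median, Q1, Q3) for a list of per-preprint share counts."""
    q1, _, q3 = quantiles(shares, n=4)  # three cut points: Q1, Q2, Q3
    return median(shares), q1, q3


# News-mention proportions recomputed from the counts quoted in the text.
print(percent_mentioned(156, 1_462))     # COVID-19 preprints on ArXiv
print(percent_mentioned(1_066, 80_786))  # other ArXiv preprints

# Toy share counts (hypothetical, for illustration only).
toy_shares = [1, 2, 3, 4, 5, 6, 7]
print(summarize_shares(toy_shares))
```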
3.2 A Call for more reasonable communication
Many science journalists and news editors rely extensively on press releases from institutions [78]. Academic press releases have already been in the spotlight for their impact on the dissemination of exaggerated findings in the news (e.g., [79, 80]) and can therefore directly exacerbate the spread of non-peer-reviewed findings if their communications are based on preprints. Concerns have already been raised about the role of preprints in the communication of science to the public and its potential dangers [81]. Nonetheless, the advantages of preprints for scientific communities are too important to completely give up on preprints [74] and preprint platforms have already implemented warning messages on manuscripts to explain that they have not been peer-reviewed.
An obvious solution to the issue would be to recommend that press releases from research institutions should be made only with respect to peer-reviewed studies, and should be written in collaboration with independent scientists. However, in some cases, it is necessary to report the findings of studies that have not yet been peer-reviewed, if they are expected to benefit the public at large. Should they do so, journalists and news editors must then be encouraged to search for potential conflicts of interest and check the availability of registered information and external peer-reviews to ensure the quality and trustworthiness of their article. Findings from preprints should be communicated with particular caution. Despite the advantages of making headlines simple or even exaggerated [82], journalists and news editors should make sure to accurately convey the inherent degree of uncertainty in scientific studies. However, such measures are clearly not enough: with the increased productivity pressure on science journalists, the fact-checking process is sometimes necessarily less thorough [78]. While it sometimes seems difficult [78], journalists and scientists should work with each other, in particular for scientific results pertaining to public health or that could imply a change of behaviour of the public.
The misuse of preprints by some journalists emphasises the need for high quality journalism training. This issue is not new and has already been pointed out [83, 84, 85], for example during the Ebola crisis [86]. It is clear that a high quality dissemination of scientific information is essential to an appropriate public health response to a crisis such as COVID-19 [87]. The scientific community has addressed this issue by publishing some guidelines for a better dissemination of scientific news to the public [84] and also by fostering bridges between the scientific community and science journalists through exchanges and training [83]. As an example, the French association of science journalists, the Association des journalistes scientifiques de la presse d’information, has launched and funded an exchange program between researchers and journalists. In the UK, the Science Media Centre provides support to news reporters to help them to accurately interpret new findings from publications or press releases [88], by ensuring that journalists, scientists and statisticians work together. Training in science journalism in times of crisis is even more essential and has previously been attempted during the Ebola crisis [86]. During the COVID-19 crisis, the United Nations Educational, Scientific and Cultural Organisation (UNESCO) has organized a series of ‘webinars’ tackling the importance of journalists’ scientific literacy [89]. However, this process is still ongoing and – since many examples of problematic dissemination of scientific news are still seen – some researchers have proposed indicators to evaluate the scientific accuracy of news articles [90].
Discussion
In addition to previous concerns and investigations of the disruption that the pandemic has caused for research [2], we have found strong evidence of how COVID-19 has impacted science and scientists on several levels. Firstly, we have highlighted the striking scientific waste due to issues in study designs or data analysis. Secondly, we have found evidence of the misuse of preprints in news reports which seem to refer to non-peer-reviewed manuscripts as reliable sources. Thirdly, we have found that the fast-tracking of peer-reviews on COVID-19 manuscripts, which was needed to give vital treatment directives to health authorities as quickly as possible, led to potentially suspicious peer-reviewing times often combined with editorial conflicts of interest and a lack of transparency of the reviewing process. Finally, we highlighted that the lack of raw-data sharing or third-party reviewing has led to the retraction of four major papers and had a direct impact on the study design and conduct of international trials.
The Open Science movement promotes more transparency and fairness in the access to scientific communication, the production of scientific knowledge and its communication and evaluation. Looking at the number of publishers removing their paywalls on COVID-19 related research, one might argue that the COVID-19 pandemic has been a catalyst in the adoption of Open Science principles. However, the aforementioned issues paint a more complicated story. The urgency of the situation has led to a partial Open Access policy but with a very opaque peer-review process coupled with a misuse of preprints and raw-data-sharing policies not being enforced. Full adoption of Open Science principles could, however, have saved precious research resources: open peer review would have helped in the detection of the editorial conflicts of interest and made it apparent whether manuscripts were thoroughly reviewed; adoption of registered reports would have strengthened study designs and data analysis plans; proper and monitored use of preprints would have helped the communication of early results between researchers; strengthening of the policies of raw-data sharing or reviewing could have prevented the Surgisphere scandal; and full Open Access might have accelerated the search for solutions to the pandemic both in medical and socio-economic contexts. In addition to this, statistical reviews could have helped to make studies and their results more robust and limit the impact of exaggeration or misinterpretation of results.
It remains, however, that these principles are not enough. The pandemic has highlighted other issues that Open Science cannot solve. For instance, the misuse of preprints by journalists probably stems from the fact that many journalists may not be trained to understand and navigate the complex academic publication system, and some journalists may be seeking sensationalist news headlines. The pandemic has also highlighted the already-existing science-literacy issue. Finally, we cannot exclude that some of the misuses and abuses that we have highlighted are a direct result of the current metric-centered evaluation of research and researchers which has already been shown to lead to questionable research practices in the past and has been the subject of criticism from scientists for decades [36, 91, 92].
We, as scientific researchers attached to transparency and fairness in the production, communication, use and evaluation of scientific knowledge, hope that this manuscript successfully argues for and promotes a faster adoption of all Open Science principles. We therefore call upon readers of this manuscript to co-sign it, should they agree with our recommendations, through the following link http://tiny.cc/cosigningpandemicopen.
Contributions
L.B. and C.L. conceived the study.
L.B., C.L., M.D., H.J., and C.S. extracted the data and performed the analyses.
L.B., N.P.-S., C.Se., P.M., C.Sm., M.D., and C.L. wrote the manuscript.
Funding and Conflict of Interest
The authors received no specific funding for this work. CL is supported by the UK Medical Research Council (Skills Development Fellowship MR/T032448/1).
Acknowledgments
We are grateful to Eric Billy and Marlon Sidore for fruitful discussions. We also want to thank Matthew Cooper for his helpful feedback on the manuscript.
Appendix
Factiva analysis
The Factiva analysis, to find occurrences of preprints in the news, was done between January 1st 2020 and July 13th 2020 with the following query:
("https://arxiv.org" OR "https://www.biorxiv.org" OR "https://www.medrxiv.org" OR "preprint" OR "pre-print")
AND
("COVID-19" OR "Coronavirus" OR "COVID19" OR "COVID 19")
Altmetric analysis
To further analyse whether COVID-19 preprints were used in the news more than preprints usually are, we conducted an Altmetric analysis of all COVID-19 preprints found on arxiv.org, medrxiv.org, and biorxiv.org. We first downloaded all the COVID-19-related preprints from these three platforms from the 1st of January 2020 to the 30th of June 2020, as well as all preprints from arxiv.org in the same period (to serve as a control group).
Duplicates were removed. We then queried the Altmetric API for each of these preprints using a Python script to process all entries, find their DOI and query Altmetric with the following command:
# if the paper is not from arXiv
requests.get('https://api.altmetric.com/v1/doi/entry_DOI')
# if the paper is from arXiv
requests.get('https://api.altmetric.com/v1/arxiv/entry_DOI')
Analysis codes are available on the GitHub repository of the project: https://github.com/lonnibesancon/OpenSciencePandemic
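For readers who want to reproduce this step, the two requests above could be wrapped as in the following minimal sketch. It is not the project's script (which is in the repository mentioned above): it uses only the Python standard library instead of `requests`, the function names are hypothetical, and treating an HTTP 404 response as "no Altmetric record for this identifier" is an assumption.

```python
import json
import urllib.error
import urllib.request

# Base URL taken from the requests shown in the text.
ALTMETRIC_BASE = "https://api.altmetric.com/v1"


def altmetric_url(identifier: str, from_arxiv: bool) -> str:
    """Build the Altmetric endpoint for an arXiv entry or for any other DOI."""
    kind = "arxiv" if from_arxiv else "doi"
    return f"{ALTMETRIC_BASE}/{kind}/{identifier}"


def fetch_altmetric(identifier, from_arxiv, opener=urllib.request.urlopen):
    """Return the parsed JSON record, or None when the request yields HTTP 404.

    `opener` is injectable so the function can be exercised without network access.
    """
    url = altmetric_url(identifier, from_arxiv)
    try:
        with opener(url, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: no record exists for this identifier
            return None
        raise
```

A call such as `fetch_altmetric(entry_doi, from_arxiv=False)` would then be issued once per preprint, where `entry_doi` stands for the identifier found for that entry.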
PubMed Central analysis
To extract the reviewing times, the metadata of 12,682 COVID-19 articles were downloaded on July 7, 2020 from PubMed Central using the query:
“COVID-19”[abstract] OR “COVID-2019”[abstract] OR “severe acute respiratory syndrome coronavirus 2”[Supplementary Concept] OR “severe acute respiratory syndrome coronavirus 2”[abstract] OR “2019-nCoV”[abstract] OR “SARS-CoV-2”[abstract] OR “2019nCoV”[abstract] OR ((“Wuhan”[abstract] AND (“coronavirus”[MeSH Terms] OR “coronavirus”[abstract])) AND (2019/12[PDAT] OR 2020[PDAT]))
The reviewing times were extracted from the data using a MATLAB script, available on the OSF repository.
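To make the notion of reviewing time concrete, the sketch below computes it as the number of days between the date a manuscript was received and the date it was accepted. It is written in Python for illustration and is not the MATLAB script mentioned above; the record layout and the field names `received` and `accepted` are hypothetical, as is the choice to skip records lacking either date.

```python
from datetime import date


def reviewing_time_days(received: str, accepted: str) -> int:
    """Days between reception and acceptance, both given as ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(accepted) - date.fromisoformat(received)).days


def reviewing_times(records):
    """Reviewing times for all records that carry both dates; others are skipped."""
    times = []
    for record in records:
        received = record.get("received")
        accepted = record.get("accepted")
        if received and accepted:
            times.append(reviewing_time_days(received, accepted))
    return times


# Hypothetical metadata records, for illustration only.
example_records = [
    {"received": "2020-03-01", "accepted": "2020-03-15"},
    {"received": "2020-04-10", "accepted": "2020-04-10"},
    {"received": "2020-05-02"},  # no acceptance date: skipped
]
print(reviewing_times(example_records))
```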
Footnotes
* lonni.besancon{at}gmail.com