Abstract
We conducted an international cross-sectional survey of biomedical researchers’ perspectives on the reproducibility of research. This study builds on a widely cited 2016 survey on reproducibility and provides a biomedical-specific and contemporary perspective on reproducibility. To sample the community, we randomly selected 400 journals indexed in MEDLINE, from which we extracted the author names and e-mails from all articles published between October 1, 2020 and October 1, 2021. We invited participants to complete an anonymous online survey which collected basic demographic information, perceptions about a reproducibility crisis, perceived causes of irreproducibility of research results, experience conducting replication studies, and knowledge of funding and training for research on reproducibility. A total of 1924 participants accessed our survey, of whom 1630 provided usable responses (response rate 7% of 23,234). Key findings include that 72% of participants agreed there was a reproducibility crisis in biomedicine, with 27% of participants indicating the crisis was ‘significant’. The leading perceived cause of irreproducibility was ‘pressure to publish’, with 62% of participants indicating it ‘always’ or ‘very often’ contributes. About half of participants (54%) had run a replication of their own previously published study, while slightly more (57%) had run a replication of another researcher’s study. Just 16% of participants indicated their institution had established procedures to enhance the reproducibility of biomedical research, and 67% felt their institution valued new research over replication studies. Participants also reported few opportunities to obtain funding to attempt to reproduce a study, and 83% perceived it would be harder to obtain such funding than funding for a novel study. Our results may be used to guide training and interventions to improve research reproducibility and to monitor rates of reproducibility over time. The findings are also relevant to policy makers and academic leadership looking to create incentives and research cultures that support reproducibility and value research quality.
Introduction
There is growing interest in both the reproducibility of research and ways to enhance research transparency1–4. Terminology around reproducibility varies5; here we define reproducibility as re-doing a study using similar methods and obtaining findings consistent with the original study, and irreproducibility as the case in which the findings are not confirmed. This definition allows for variation in methods between the original study and the reproducibility study. Reproducibility of research is core to maintaining research trustworthiness and to fostering translation and progressive discovery. Despite the seemingly critical role of reproducibility and the growing discussion surrounding it, the reality is that most studies, including pivotal studies within several disciplines, have never been formally subjected to a reproducibility effort. For example, in education research, an analysis of publications in the field’s top 100 journals showed that just 0.13% of publications (221 out of 164,589) described reproducibility projects6. In psychology, a study examining 250 articles published between 2014 and 2017 found that 5% described a replication7, while a similar study examining replication rates in the social sciences found that just 1% of articles sampled described a replication8. Across disciplines, our knowledge about the proportion of studies that are reproducible tends to be dominated by a small number of large-scale reproducibility projects. In psychology, for example, a study that estimated the replicability of 100 foundational studies in top journals in the field reported that only 36% had statistically significant results (one measure of replicability), compared to 97% of the original studies9.
In 2016, Nature conducted a survey of more than 1,500 researchers about their perceptions of reproducibility. They found that 83% agreed there was a reproducibility crisis in science, with 52% indicating that they felt the crisis was ‘significant’10. Survey studies like this play a powerful role in elucidating the determinants of reproducibility. Such information is essential for identifying gaps in training, research support, and the incentives needed to ensure reproducible research. Given the global nature of research, capturing global perspectives, as was done in the Nature survey, is crucial to obtaining a broad understanding of issues across the research ecosystem.
In this study, we aim to build on the 2016 Nature survey on reproducibility by surveying researchers in the biomedical community specifically. There is immediate importance to ensuring biomedical research is reproducible: we know that in biomedicine, studies that subsequently proved not to be reproducible may have led to patient harms11,12. By capturing a diverse and global group of biomedical researchers’ perceptions of reproducibility within the field, we hope to better understand how to ensure reproducibility in biomedicine. Our specific objectives were to: 1) explore biomedical researchers’ perceptions of reproducibility and their perceptions of causes of irreproducibility; and 2) describe biomedical researchers’ experiences conducting and publishing reproducibility projects. The study was descriptive; we had no formal hypotheses. To address our objectives, we used an adapted version of the 2016 Nature reproducibility survey, which allows for a comparison of results between the two cross-sectional studies.
Methods
Open science statement
This study received ethics approval from the Ottawa Health Sciences Research Ethics Board. The study protocol was registered a priori, and data and materials have been made available13: https://osf.io/3ksvz/.
Study design
We conducted an online cross-sectional closed survey of researchers who published a paper in a journal indexed in MEDLINE. The survey was anonymous.
Sampling framework
We downloaded the MEDLINE database of journals. From this list of approximately 30,000 journals, we selected a random sample of 400 journals using the RAND() function in Excel. We then extracted the author names and e-mails from all articles published in those journals between October 1, 2020 and October 1, 2021. We included all authors whose names and emails were available, and all article types/study designs. For full details on our semi-automated approach to extracting author emails, please see our search strategy in S1.
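For illustration only, the journal-sampling step could equally be carried out outside Excel. The following is a minimal sketch, assuming the MEDLINE journal list has been exported to a CSV file; the file name, column name, and seed are hypothetical and not part of the study’s actual procedure:

```python
# Minimal sketch of the journal-sampling step, standing in for the Excel
# RAND() procedure described above. Assumes the MEDLINE journal list has
# been exported to "medline_journals.csv" with a "journal_title" column;
# the file name, column name, and seed are hypothetical.
import csv
import random

with open("medline_journals.csv", newline="", encoding="utf-8") as f:
    journals = [row["journal_title"] for row in csv.DictReader(f)]

random.seed(2021)                      # fixed seed so the sample can be regenerated
sample = random.sample(journals, 400)  # simple random sample without replacement

with open("sampled_journals.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["journal_title"])
    writer.writerows([j] for j in sample)
```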
Participant Recruitment
The survey was sent only to those researchers identified via our sampling procedure (i.e., a closed survey). Potential participants received an email containing a recruitment script which detailed the purpose of the study and invited them to review our informed consent form and complete our anonymous online survey. Completion of the survey served as implied consent; we did not require signed consent, to maintain anonymity. To send emails to the sample of authors, we used Mail Merge software, which allows emails to be personalized without having to customize and send each one individually. In the case of non-response, we sent three reminder emails to potential participants at weekly intervals after the initial invitation. We closed the survey 4 weeks after the initial invitation. We did not provide any specific incentive to complete the survey.
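To illustrate the mail-merge step, a minimal sketch in Python is shown below. The study used dedicated Mail Merge software rather than a script, and the SMTP host, sender address, and message text here are placeholders:

```python
# Minimal sketch of personalized bulk invitations, standing in for the
# Mail Merge software used in the study. The SMTP host, addresses, and
# message text are placeholders, not details from the study.
import smtplib
from email.message import EmailMessage

INVITATION = """Dear {name},

You are invited to review our informed consent form and complete an
anonymous online survey on the reproducibility of biomedical research:
{survey_url}
"""

def send_invitations(recipients, survey_url, host="smtp.example.org"):
    """recipients: iterable of (name, email) pairs extracted from articles."""
    with smtplib.SMTP(host) as smtp:
        for name, address in recipients:
            msg = EmailMessage()
            msg["Subject"] = "Survey: reproducibility in biomedical research"
            msg["From"] = "study-team@example.org"
            msg["To"] = address
            msg.set_content(INVITATION.format(name=name, survey_url=survey_url))
            smtp.send_message(msg)
```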
Survey
The full survey is available in S2. The survey was administered using SurveyMonkey. It contained four demographic questions about the participants, covering their gender, research role, research area, and country of residence. Participants were asked to complete questions about their perceptions of reproducibility in biomedicine, their experience with reproducibility, and their perceptions of barriers and facilitators to conducting reproducibility projects. The survey used adaptive formatting to present certain items only on the basis of the participant’s responses to previous questions. Most questions were multiple choice, with two questions asking participants to expand on their responses in a free-text box. The survey was purpose-built for the study by the research team, building directly on the previously published Nature reproducibility survey10. We included several of the previous study’s questions verbatim, modified some slightly, and made others more specific to the biomedical research setting. This approach allowed us to compare our results to those of the original study. We also introduced some novel questions on reproducibility. The survey was pilot tested by three researchers to ensure clarity and acceptability of format, and we edited the survey to address their feedback. Participants were able to skip any question.
Data Management and Analysis
Data were exported from SurveyMonkey and analyzed using SPSS 28. We report descriptive statistics, including counts and percentages, for all quantitative items. For the qualitative items, we conducted a thematic content analysis: two researchers individually read all text-based responses and assigned a code to summarize the content of each. Codes were refined iteratively as each text-based response was read. After discussion to reach consensus on the codes used, we grouped the agreed codes into themes for reporting in tables.
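For illustration, the descriptive analysis amounts to counts and percentages per item. A minimal pandas sketch is below; the study’s analysis was done in SPSS 28, and the export file name and column name here are hypothetical:

```python
# Minimal sketch of the descriptive analysis (counts and percentages per item).
# The study's analysis was done in SPSS 28; this pandas equivalent assumes the
# SurveyMonkey export is saved as "survey_export.csv" and that column names
# such as "crisis_perception" exist (both are hypothetical).
import pandas as pd

df = pd.read_csv("survey_export.csv")

def describe_item(column: str) -> pd.DataFrame:
    """Count and percentage for each response option of one survey item."""
    counts = df[column].value_counts(dropna=True)
    pct = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"N": counts, "%": pct})

print(describe_item("crisis_perception"))
```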
Results
Protocol amendments
In our original protocol we said we would take a random sample of 1000 journals from MEDLINE and extract information from the first 20 authors. Because this approach required extensive manual extraction, we opted to restrict our random sample to 400 journals and semi-automate extraction of author information for an entire year’s worth of publications. Our revised method meant that we obtained and used listed emails from all authors on an identified paper (i.e., we were not restricted to corresponding authors).
Demographics
A total of 24,614 emails were sent, but bounce-backs were received from 1380, meaning 23,234 emails were delivered successfully to potential participants. A total of 1924 participants accessed our survey, of whom 1630 provided usable responses (response rate 7%; slightly lower than the estimated 1,800 responses reported in our protocol). Most participants were Faculty Members/Principal Investigators (N=1151, 72%), and more than half were male (N=943, 59%). Respondents were from more than 80 countries, with the USA (N=450, 28%) having the highest representation. About half of participants reported working in clinical research (N=819, 50%). Further demographic details by role, gender, country, and research area are provided in Table 1.
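The response-rate arithmetic, spelled out for transparency:

```python
# The response-rate arithmetic reported above, spelled out.
sent = 24_614
bounced = 1_380
delivered = sent - bounced       # 23,234 emails delivered successfully
usable = 1_630
rate = usable / delivered * 100  # 1,630 / 23,234 ≈ 7.0%
print(f"delivered = {delivered}, response rate = {rate:.1f}%")
```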
Perceptions of reproducibility
When asked whether there was a reproducibility crisis in biomedicine, most researchers agreed there was (N=1168, 72%), with 27% (N=438) indicating the crisis was significant and 45% (N=703) indicating a slight crisis (we note that a ‘slight crisis’ is a bit of an oxymoron, but we retained the wording from the original Nature survey for comparison purposes). Participants were then asked what percentage of papers in each of biomedical research overall, clinical biomedical research, in-vivo biomedical research, and in-vitro biomedical research they thought were reproducible. Only 5% (N=77) thought more than 80% of biomedical research was reproducible. See Table 2 for complete results. We provide a breakdown of responses by gender, by biomedical research area, and by career rank in Table S3.
Compared to the previously published Nature study (N=819, 52%), fewer participants in our study felt there was a ‘significant’ reproducibility crisis (N=438, 27%). This difference was even larger when we restricted the comparison to Nature study participants who indicated they worked in medicine (N=203, 60%); for complete results see Table 3.
Determinants of irreproducibility
When presented with various potential causes of irreproducibility, more than half of participants responded that each of the presented factors contributes to irreproducibility. The factor most often rated as ‘always contributing’ to irreproducibility was pressure to publish (N=300, 19%). The factors deemed least likely to contribute to irreproducibility were fraud (N=320, 20%) and bad luck (N=568, 36%). See Table 4 for complete results.
A total of 97 participants provided a written response to elaborate on what they perceived were causes of irreproducibility. Responses were coded into 16 unique codes and then thematically grouped into seven categories: ethics, research methods, statistical issues, incentives, issues with journal and peer review, lack of resources, and other. For definitions and illustrative examples of the codes, see Table 5.
Experiences with reproducibility
Participants were asked about their experience conducting reproducibility projects. Nearly a quarter of participants indicated that they had previously tried to replicate one of their own published studies and failed to do so (N=373, 23%); 31% (N=501) indicated that all such replications they had conducted had yielded the same result as the original; and slightly less than half indicated that they had never tried to replicate any of their own published work (N=734, 46%). Among the 874 participants who indicated they had tried to replicate one of their own studies, when asked to consider their most recent replication effort, 313 (36%) indicated they had published the results. About a quarter of these participants (N=205, 25%) indicated they had no plans to publish their replication study.
Almost half of participants indicated that they had previously tried to replicate a published study conducted by another team and failed to do so (N=724, 47%); 10% (N=156) indicated that all replications they had conducted of others’ work had yielded the same result as the original; and 43% (N=666) indicated that they had never tried to replicate someone else’s published research. Among those who had published their replication study, when asked to consider their most recent replication effort, 29% (N=224) indicated it took about the same amount of time to publish as a typical non-replication paper. A quarter (N=189, 25%) of participants who had attempted to replicate others’ research indicated they had no plans to publish their replication study. Eighty-five percent of participants (N=1316) indicated they had never been contacted by another researcher who was unable to reproduce a finding they had previously published. See Table 6 for complete results.
A total of 724 participants responded to the item about why they replicated their own research, of whom 675 (93%) provided relevant text-based responses. Responses were coded into 17 unique codes and then grouped into seven themes: training, ensuring reproducibility, additional research, addressing concerns, joining/setting up a new lab, for publication or due to peer review, and other. For illustrative examples of the codes, see Table 7.
A total of 748 participants responded to the item about why they replicated someone else’s research, of whom 700 (94%) provided relevant text-based response data. Responses were coded into 19 unique codes and then grouped into seven themes: trustworthiness, extending and improving research, application to new setting, new research, interest, training, and other. For illustrative examples of the codes, see Table 8.
Support for initiatives to enhance reproducibility
Few participants reported that their research institution had established procedures to enhance the reproducibility of biomedical research (N=254, 16%), and almost half reported that their institution did not provide training on how to enhance the reproducibility of research (N=731, 48%), with an additional 503 (33%) reporting they were unsure whether such training existed. We asked participants to provide information and links to relevant reproducibility training at their institution, which resulted in information or (functioning) links to 24 unique training resources (see S4). Among these 24 sources, just 9 (38%) clearly described specific, openly available (e.g., not paywalled) training related to reproducibility. Most researchers responded that they perceived their institution would value them doing new biomedical research studies more than replication studies (N=1031, 67%). The majority also indicated that it would be harder to find funding to conduct a replication study than a new study (N=1258, 83%), with just 7% (N=112) indicating they were aware of funders providing specific calls for reproducibility-related research.
We asked participants to indicate how much they agreed with the statement “I feel I am more concerned about reproducibility than the average researcher in my field and at my career stage” as a way to indirectly assess potential self-selection bias among those who completed the survey. Participants responded on a 5-point scale with endpoints ‘strongly disagree’ (1) and ‘strongly agree’ (5). Participants reported a mean response of 3.2 (N=1402, SD=0.89), which corresponds approximately to the mid-point of the scale (‘neither agree nor disagree’). For full results see Table 9.
Discussion
We report the results of an international survey examining perceptions of reproducibility. Almost three quarters of participants reported that they felt there was a reproducibility crisis in biomedicine. The concern appears to apply to biomedicine overall, but also specifically to clinical research, in-vivo research, and in-vitro research (11% or fewer participants indicated that they thought more than 80% of papers in each category were reproducible). Researchers agreed that a variety of factors contribute to irreproducibility; however, the cause that most participants indicated ‘always contributes’ to irreproducible research was pressure to publish. Concerns that the current system of academic rewards stresses quantity over quality have been expressed for decades14,15, a sentiment supported by this study’s data, which suggest that the incentives of the academic system undermine researchers’ capacity to produce reproducible research.
More than half of participants reported having previously tried to replicate their own work, with almost a quarter indicating that when they did so they failed, and many indicating that they did not intend to publish their findings. Similar findings were reported when participants were asked whether they had tried to replicate another researcher’s study: 57% indicated they had done so, and 47% indicated the replication failed. The majority of participants had never been contacted by another researcher who was unable to reproduce their findings, which suggests that teams attempting to reproduce studies do not typically communicate with the original authors, despite the potential value of this contact for enhancing reproducibility9,16.
The findings about institutions’ influence on the reproducibility of research (Table 7) collectively suggest gaps in incentives and support to pursue reproducibility projects. While other stakeholders have taken actions to attempt to foster reproducibility, our results suggest that, overall, researchers perceive institutions to be absent from efforts to support and reward research reproducibility17. The growth of ‘Reproducibility Networks’, national peer-led consortia aiming to promote reproducibility and understand factors related to irreproducibility, is a promising opportunity to rectify this situation18. The structure of reproducibility networks allows for harmonization but also flexibility in approach. Obtaining a sustained funding mechanism will be critical to their growth and ongoing success.
This study extends an earlier survey of more than 1,500 researchers conducted by Nature about reproducibility10. The current study differs in several important ways. Firstly, the focus is exclusively on biomedicine, since to our knowledge no large-scale and representative survey of biomedical researchers has been conducted to date. Indeed, just 203 (13%) of the 1,576 researchers who completed the original study indicated ‘medicine’ as their main area of interest. Secondly, we randomly sampled researchers from publication lists, meaning we are able to report a response rate. This was not possible in the Nature survey, which was emailed to Nature readers and advertised on affiliated websites and social-media outlets, meaning that the number of individuals who encountered the survey is unknown. While it is possible that those who chose to complete our survey differ from those who did not, our approach was chosen to help minimize surveying only those explicitly active in reproducibility projects or related initiatives. Indeed, the finding that, on average, participants did not report being more concerned about reproducibility than they believed their peers to be provides some assurance that our sampling strategy was effective. We also found little difference between the responses of different subgroups (S3).
We find several key differences between our results and those of the original Nature paper. For example, in the original study, more than 70% of researchers indicated that they had tried and failed to reproduce another researcher’s study, while slightly fewer than 50% of researchers in our study had had this experience. While 52% of participants in the original study reported they felt there was a ‘significant’ reproducibility crisis, approximately 27% of our participants indicated a significant crisis. These differences may reflect sampling bias, temporal changes in the research ecosystem, different perceptions in biomedicine compared to research more broadly, or a combination of these factors.
Concerns about reproducibility are being recognized widely within research and, increasingly, in the media19. The COVID-19 pandemic has shifted our thinking about research transparency20 and highlighted issues with fraud in research, poor-quality reporting, and poor-quality study design21,22, all factors that can contribute to irreproducible research. As stakeholders work to introduce training and interventions to enhance reproducibility, it will be critical to monitor the effectiveness of these interventions. This international survey provides a contemporary cross-section of the biomedical community; repeating it in the future would allow for a temporal comparison of how perceptions and behaviors shift over time. The outcomes of this survey also highlight perceived causes of, and constraints on, producing reproducible research; these can be prioritized within the biomedical community and targeted to create improvement.
Funding
This work did not receive funding.
Author contributions
Conceptualization, Methodology, Validation: KDC; Data curation, Investigation, Formal analysis: KDC, SE, FA; Project administration, Writing-First draft: KDC; Writing-Review and editing: All authors.
Competing interests
The authors have declared that no competing interests exist.
Data Availability Statement
Data and material are available on the OSF or in the supplementary materials.