
Knowledge and attitudes among life scientists towards reproducibility within journal articles

Evanthia Kaimaklioti Samota, Robert P. Davey
doi: https://doi.org/10.1101/581033
Evanthia Kaimaklioti Samota
Earlham Institute, Norwich, UK; The University of East Anglia, Norwich, UK
Robert P. Davey
Earlham Institute, Norwich, UK

Abstract

We constructed a survey to understand how authors and scientists view the issues around reproducibility, and how solutions such as interactive figures could enable the reproducibility of experiments from within a research article. This manuscript reports the results of this survey on the views of 251 researchers, including authors who have published in eLIFE Sciences and researchers who work at the Norwich Biosciences Institutes (NBI). The survey also examines to what extent researchers attempt to reproduce experiments themselves and which features of an interactive figure they consider desirable. Respondents considered various features for an interactive figure within a research article that would allow them to better understand and reproduce in situ the experiment presented in the figure. Respondents said that the most important element enabling better reproducibility of published research would be authors describing methods and analyses in detail. Respondents also believe that having interactive figures in published papers would be beneficial. Whilst interactive figures are potential solutions for demonstrating technical reproducibility, we find that there are equally pressing cultural demands on researchers that need to be addressed to achieve greater success in reproducibility in the life sciences.

1. Background

Reproducibility is a defining principle of scientific research, and refers to the ability of researchers to replicate the findings of a study using the same or similar methods and materials as the original researchers (Goodman, Fanelli and Ioannidis, 2016). However, irreproducible experiments are common across all disciplines of the life sciences (Grant, 2012). A recent study showed that 88% of drug-discovery experiments could not be reproduced or replicated even by the original authors, in some cases forcing retraction of the original work (Baker, 2012). Irreproducible genetic experiments with weak or wrong evidence can have negative implications for our healthcare (Yong, 2015). For example, 27% of mutations linked to childhood genetic diseases cited in the literature have later been discovered to be common polymorphisms or misannotations (Bell et al., 2013). While irreproducibility is not confined to biology and the medical sciences (Ioannidis and Doucouliagos, 2013), irreproducible biomedical experiments pose a strong financial burden on society; an estimated $28 billion was spent on irreproducible biomedical science in 2015 in the USA alone (Freedman et al., 2015).

Computational reproducibility is an important aspect of reproducibility, relating to the way in which conclusions rely on specific analyses or other procedures undertaken on computational systems. There are two main definitions of computational reproducibility in the literature:

  1. The original authors or others using the same data, running precisely the same workflow and getting the same results (Gentleman, 2005). Others define this concept as recomputability (Gent, 2013).

  2. Running similar data through the same workflow, and getting similar results, i.e. the workflow is reproducible (Gent, 2013).
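To make the distinction concrete, the following minimal sketch (written in R, with R's built-in sleep dataset standing in for published data) contrasts recomputing the same analysis on the same data with rerunning the same workflow on similar data; it is an illustration of the two definitions above, not code from any of the cited works.

# Recomputability: the same data run through exactly the same workflow
# returns an identical result.
original   <- t.test(extra ~ group, data = sleep)   # the "published" analysis
recomputed <- t.test(extra ~ group, data = sleep)   # rerun, e.g. by another researcher
identical(original$p.value, recomputed$p.value)     # TRUE

# Reproducibility of the workflow: similar data, same workflow, similar results.
set.seed(1)                                          # hypothetical noisy replicate of the data
similar    <- transform(sleep, extra = extra + rnorm(nrow(sleep), sd = 0.1))
reproduced <- t.test(extra ~ group, data = similar)
c(original = original$p.value, reproduced = reproduced$p.value)  # close, but not identical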

Computational reproducibility has both technical and cultural aspects. Technical challenges to reproducibility include poorly written, incorrect, or unmaintained software, changes in the software libraries1 on which tools depend, or incompatibility between older software and newer operating systems (Cataldo et al., 2009). Cultural challenges include insufficient descriptions of methods, reluctance to publish original data and code under FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and other social factors such as favouring high-prestige or high-impact publications over performing rigorous and reproducible science.

Several projects have attempted to address some of the technical aspects of reproducibility by making it easier for authors to disseminate fully reproducible workflows and data, and for readers to perform computations. For example: the F1000 Living Figure (Colomb and Brembs, 2015); the Whole Tale project (Brinckman et al., 2018); the ReproZip project (https://www.reprozip.org/); Python-compatible tools and widgets (IPython notebook interactive widgets, Jupyter Notebooks); FigShare (http://www.figshare.com) as an example of a scientific data repository; Galaxy (Afgan et al., 2018); CyVerse (formerly the iPlant Collaborative, Goff, 2011); myExperiment (Goble et al., 2010); UTOPIA (Pettifer et al., 2009, 2004); the GigaScience database (Sneddon, Li and Edmunds, 2012); Taverna (Wolstencroft et al., 2013; Hull et al., 2006; Oinn et al., 2004); workflow description efforts such as the Common Workflow Language (Amstutz et al., 2016); and Docker (http://www.docker.com), Singularity (https://singularity.lbl.gov/, Kurtzer, Sochat and Bauer, 2017), and other container systems.

Even though these tools are widely available and seem to address many of the issues of technical and cultural reproducibility, they have not yet become a core part of the life sciences experimental and publication lifecycle. There is an apparent disconnection between the development of tools addressing reproducibility and their use by the wider scientific and publishing communities who might benefit from them. However, there have been notable efforts to make this connection. The Living Figure by Björn Brembs and Julien Colomb was the first prototype of a dynamic figure that allowed readers to change the parameters of a statistical computation underlying a figure (Colomb and Brembs, 2015). The first eLIFE computationally reproducible article, created by converting manuscripts written in a specific format (using the Stencila Desktop, https://stenci.la/, and saved as a Document Archive file) into interactive documents, offers more interactivity at the publication level, allowing the reader to “play” with the article and its figures when viewed in a web browser (eLIFE Sciences, 2019).

While there are few incentives to promote cultural reproducibility (Higginson and Munafò, 2016), efforts are being made in most science domains to establish a culture in which sharing data for all publications according to the FAIR principles is the expected norm. It is widely accepted that better reproducibility will benefit the scientific community and the general public (NIH, 2015; Wilkinson et al., 2016). Although studies have suggested that reproducibility in science is a serious issue (Pulverer, 2015; Stodden, Guo and Ma, 2013), with costly repercussions, fewer studies have investigated the attitudes and knowledge of researchers around reproducibility and which solutions and infrastructures would be most desirable for enabling it. In particular, minimal research has been conducted into the frequency of difficulties experienced with reproducibility, the perception of its importance, and preferences with respect to potential solutions among the general life sciences community. This paper presents a survey that was, in part, designed to inform the design of the reproducible document by canvassing respondents’ preferred features for interactive figures. We aimed to address this critical gap in reproducibility knowledge, in order to inform the development of tools that better meet the needs of producers and consumers of life science research. We constructed the survey in order to understand how the following are experienced by the respondents:

  • Computational reproducibility: issues with accessing data, code and methodology parameters, and how solutions such as interactive figures could promote reproducibility from within an article.

  • Cultural reproducibility: attitudes towards reproducibility, the social factors hindering reproducibility, and interest in interactive figures and their feature preferences.

3. Methods

Population and sample

Our sample populations were selected to include all life sciences communities across levels of seniority, discipline and experience with the issues we wished to survey. The first survey was conducted in November 2016 and sent to 750 researchers working in the Norwich Biosciences Institutes (NBI) at post-doctoral level or above. The NBI is a partnership of four UK research institutions: the Earlham Institute (formerly The Genome Analysis Centre), the John Innes Centre, The Sainsbury Laboratory, and the Institute of Food Research (now Quadram Institute Bioscience). Invitations to participate were distributed via email, with a link to the survey. The second survey, similar to the first but with amendments and additions, was distributed in February 2017 to a random sample of 1662 active researchers who had published papers in the eLIFE journal. Invitations to participate were sent by email by eLIFE staff. We achieved a 15% response rate from the NBI researchers (n=112) and an 8% response rate from the eLIFE survey (n=139). Table 1 shows the survey questions. Questions were designed to give qualitative and quantitative answers on technical and cultural aspects of reproducibility. Questions assessed the frequency of difficulties encountered in accessing data, the reasons for these difficulties, and how respondents currently obtain data underlying published articles. They measured understanding of what constitutes reproducibility of experiments, interactive figures, and reproducible computational data. Finally, we evaluated the perceived benefit of interactive figures and of reproducing computational experiments, and which features of interactive figures would be most desirable.

Table 1:

Questions used to survey the knowledge of respondents about research reproducibility. Questions indicated with an asterisk were only included in the eLIFE survey. Answer options to the questions are shown in Supplementary section 1.

Statistical analysis

Results are typically presented as proportions of those responding, stratified by the respondent’s area of work, training received, and version of the survey as appropriate. Chi-square tests for independence were used to test for relationships between responses to specific questions, or whether responses varied between samples. Analysis was conducted using Microsoft Excel and R (version 3.5.0; R Core Team, 2018), and all supplementary figures and data are available on Figshare (see Data Availability).

We assessed whether there was a significant difference in the ability and willingness to reproduce published results between the cohort of eLIFE respondents who understood the term “computationally reproducible data” and those who did not. We did not include those who replied “Unsure” with regards to their understanding of the term “computationally reproducible data”. The respondents who chose “yes tried reproducing results, but unsuccessfully”, “have not tried to reproduce results” and “it is not important to reproduce results” were grouped together under “unsuccessfully”.
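To illustrate this procedure, a minimal sketch in R (the language used for our analysis) is shown below; the contingency table contains hypothetical counts rather than the survey's raw data, so its output is purely illustrative and will not match the figures reported in the Results.

# Hypothetical 2x2 contingency table (illustration only): rows are whether a
# respondent understood the term "computationally reproducible data", columns
# are whether their reproduction attempts were grouped as successful or not.
responses <- matrix(c(40, 10,
                      30, 20),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(understood = c("yes", "no"),
                                    outcome    = c("successful", "unsuccessful")))
chisq.test(responses)               # chi-square test for independence
prop.table(responses, margin = 1)   # row-wise proportions for reporting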

Results

Characteristics of the sample

Figure 1 shows the distribution of areas of work of our respondents, stratified by survey sample. Genomics (proportion in whole sample = 22%), biochemistry (17%), and computational biology (15%) were the most common subject areas endorsed in both the NBI and eLIFE samples. With regard to how often respondents use bioinformatics tools, 25% replied “never”, 39% “rarely”, and 36% “often”. Many respondents had received training in statistics (43%), bioinformatics (31%), or computer science (20%).

Figure 1:

The types of data the NBI and eLIFE respondents work with, as a percentage of prevalence. Responses were not mutually exclusive. The choices of data types were the same as the article data types available in the eLIFE article categorisation system.

Access to data and bioinformatics tools

In both samples, 90% of those who responded reported having tried to access data underlying a published research article (Figure 2). Of those who had tried, few had found this “easy” (14%) or “very easy” (2%), with 41% reporting that the process was “difficult” and 5% “very difficult”. Reasons for difficulty were chiefly cultural (Figure 2), in that the data was not made available alongside the publication (reported by 63% of those who had tried to access data), or authors could not be contacted or did not respond to data requests (44%). Relatively few found data unavailable for technical reasons of data size (17%), confidentiality (10%) or commercial sensitivity (11%). With respect to data sources, 57% of the total sample had used open public databases, 48% reported that data was available via a link in the paper, and 48% had needed to contact authors.

Figure 2.

Left panel: whether respondents have attempted to access data underlying previous publications and the level of difficulty typically encountered in doing so. Right panel: the reasons given by respondents for being unable to access data (restricted to those who have attempted to access data). See supplementary material for full wording of questions and responses.

Very few respondents reported never (2%) or rarely (8%) having problems with running, installing, or configuring bioinformatics software. Problems with software were encountered often (29%) or very often (15%), suggesting that nearly half of respondents regularly encountered technical barriers to computational reproducibility.

Understanding of reproducibility, training and successful replication

The majority of respondents reported that they understood the term “reproducibility of experiments” in science. In contrast, most (52%) participants did not know what the term “computationally reproducible data” means, while 26% did know and 22% were unsure. We received several explanations (free text responses) of the term “computationally reproducible data”, some of which were more accurate than others (Supplementary section, free responses to question 13).

Some respondents (18%) reported not attempting to reproduce or revalidate published research. Very few (N=5; 6%) of the sample endorsed the option that “it is not important to reproduce other people’s published results” (Supplementary figure 1). Even though the majority (60%) reported successfully reproducing published results, almost a quarter of respondents (23%) found that their efforts to reproduce results were unsuccessful.

Supplementary Table 1 shows the willingness and ability of respondents to reproduce experiments, stratified by the training they received and their understanding of the term “computationally reproducible data”. We found a significant association between the ability to reproduce published experiments and knowing the meaning of the term “computationally reproducible data”. Among the 25 respondents who understood the term, 18 (72%) had successfully reproduced previous work, compared to only 26 (52%) of the 50 who responded that they did not understand the term (chi-square test for independence, p=0.048).

There was no evidence for a difference in the ability and willingness to reproduce published results between the respondents who use bioinformatics tools often and those who use them rarely or never (data not shown). The majority of the respondents who use bioinformatics tools often came from the scientific backgrounds of Biophysics, Biochemistry, Computational Biology and Genomics. Most of the respondents who answered “reproducibility is not important” or “haven’t tried reproducing experiments” were scientists from disciplines that use computational or bioinformatics tools “rarely” or “never” (Supplementary Table 2).

Improving Reproducibility of Published Research

The vast majority (91%) of respondents stated that authors describing all methodology steps in detail, including any formulae used to analyse the data, would be the most effective way to make published science more reproducible. Only around half endorsed the view that “authors should provide the source code of any custom software used to analyse the data and that the software code is well documented” (53%), or that authors should provide a link to the raw data (49%) (Supplementary figure 2). Two respondents suggested that achieving better reproducibility would be easier if funding were more readily available for reproducing the results of others and if there were opportunities to publish the reproduced results (Supplementary section, free responses). In the same context, some respondents recognised that in the current scientific culture there are insufficient incentives to publish reproducible papers (or indeed negative findings), and that researchers are instead rewarded for publishing as many papers as possible in high Impact Factor journals (Supplementary section, free responses).

Interactive Figures

Participants ranked features of an interactive figure within an article in order of preference; “easy to manipulate” was the most preferred feature, followed by having easy-to-define parameters (Figure 3). Generally, the answers from both the eLIFE and NBI surveys followed similar trends.

Figure 3.

Responses to question 9: respondents were asked to rank the above features in order of preference, from 1 (most preferred) to 11 (least preferred). The average score for each feature was calculated from the preferences selected by respondents in both the NBI and eLIFE surveys. The lower the average score (x-axis), the more preferred the feature (y-axis).

Furthermore, free text responses were collected, and most respondents stated that having further insight into the data presented in the figure would be beneficial (Supplementary section, free responses). The majority of respondents perceive a benefit in having interactive figures in published papers, for both readers and authors (Figure 4). Examples of such insights included: allowing further points from data in the supplementary section to be visualised on the plot, and allowing the data presented in the figure to be altered; presenting an interactive figure as a movie, or using it to display 3D protein structures, would also be beneficial to readers. We categorised the remaining responses as software related; these included suggestions of software that could be used to produce interactive figures, such as R Shiny (Chang, 2015; Chang et al., 2016). A moderate proportion of eLIFE (19%) and NBI (27%) respondents stated that they had no opinion on the utility of interactive figures. Free text answers from this group suggested that they had never seen or interacted with such a figure before, and gave no indication that an interactive figure would help their work.
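To give a sense of the kind of tool respondents pointed to, the sketch below is a minimal R Shiny app in which the reader adjusts parameters and the figure is recomputed from the underlying data. The dataset and parameters are placeholders, and this is not the implementation used by eLIFE or described by any respondent.

library(shiny)

# Minimal sketch of an "interactive figure": the reader changes parameters and
# the plot is recomputed from the data in the browser.
ui <- fluidPage(
  titlePanel("Sketch of an interactive figure"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
      checkboxInput("logscale", "Log-transform values", value = FALSE)
    ),
    mainPanel(plotOutput("figure"))
  )
)

server <- function(input, output) {
  values <- rnorm(500, mean = 100, sd = 15)  # placeholder for data deposited with the article
  output$figure <- renderPlot({
    x <- if (input$logscale) log(values) else values
    hist(x, breaks = input$bins,
         main = "Figure recomputed from the underlying data",
         xlab = "Measurement")
  })
}

shinyApp(ui = ui, server = server)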

Figure 4

Responses to question 11: the perceived benefit (%) of having the ability to publish papers with interactive figures, in terms of the benefit to the author, to the readers of the author’s papers, and to the papers the author reads. Answers include the responses from both the NBI and eLIFE surveys.

The majority of respondents also said that they see benefit in automatically reproducing computational experiments, and in manipulating and interacting with parameters in computational analysis workflows; the ability to computationally reproduce statistical analyses was viewed equally favourably (Figure 5). Despite this perceived benefit, most respondents (61%) indicated that the ability to include an interactive figure would not affect their choice of journal when seeking to publish their research.

Figure 5

Responses to question 14 (both eLIFE and NBI): the perceived benefit (%) to respondents of being able to automatically reproduce computational experiments or other analyses (including statistical tests) described in a paper.

5. Discussion

This study highlights the difficulties currently experienced in reproducing experiments, and the positive attitudes of scientists involved in the current publishing system towards enabling and promoting the reproducibility of published experiments through interactive elements in online publications. All respondents to the survey were active life sciences researchers, and we therefore believe the opinions collected are representative of researchers in the life sciences who routinely read and publish research. While progress has been made in publishing standards across all life science disciplines, the opinions of the respondents reflect previously published shortcomings of publishing procedures (Müller et al., 2003; Marx, 2013; Stodden, 2015): lack of data and code provision, lack of storage standards, and the failure to include or require detailed descriptions of methods and code structure in published papers. When data is difficult to obtain, the reproducibility problem is exacerbated. However, the level of interest and the incentives for reproducing published research are in their infancy, or reproduction is not the researchers’ priority, something also mentioned extensively in previous literature (Baker, 2016; Stodden, 2015; Open Science Collaboration, 2015; Collins and Tabak, 2014). Responses to our surveys suggested that most life scientists understand that science becomes implicitly more reproducible if methods (including data, analysis, and code) are well described and available, and perceive a potential benefit of tools that enable this. However, respondents stated that, although they could see the benefit of interactive figures for their readers and of being able as authors to present their data as interactive figures, the availability of this facility would not affect their decisions on where to publish. Thus, even though technologies exist to aid reproducibility and authors know they are beneficial, many scientific publications do not meet basic standards of reproducibility. Respondents endorsed articles that include interactive elements, where access to the raw data, code, and detailed analysis steps in the form of an interactive figure would help readers better understand the paper, the experimental design and the methodology, and would improve the reproducibility of the experiment presented in the interactive figure, especially computational experiments. This contradiction suggests that cultural factors play an underestimated role in reproducibility.

Retraction rates (Cokol et al., 2008) suggest that the current publishing system is yet to provide a mechanism to reliably check whether a published study is reproducible. There remains a perception that researchers do not get credit for reproducing the work of others or publishing negative results. Whilst some journals do explicitly state that they welcome negative results articles (e.g. the PLOS ONE “Missing Pieces” collection), this is by no means the norm in life science publishing, as evidenced by low, and dropping, publication rates of negative findings (Franco et al., 2014; Fanelli, 2011). Ideally, the publication system would enable checking of reproducibility at the peer-review stage, with authors providing all data (including raw data), a full description of methods including statistical analysis parameters, any negative findings based on previous work, open source software code, and so on (Iqbal et al., 2016). Peer reviewers would then be better able to check for anomalies, and editors could perform a final check to ensure that the paper to be published presents valid and reproducible research. Some respondents suggested that if reviewers and/or editors were monetarily compensated, spending time to reproduce or validate the computational experiments in manuscripts would become more feasible and would help address the irreproducibility issue. However, paying reviewers does not necessarily ensure that they would be more diligent in checking or trying to reproduce results (Hershey, 1992), and better ways must be found to ensure that effective pressure is placed upon authors and publishing journals to adopt higher publication standards (Announcement: Reducing our irreproducibility, 2013; Pusztai, Hatzis and Andre, 2013). The increasing adoption by biomedical journals of reporting standards for experimental design, methods and results provides a framework to harmonise the description of scientific processes and enable reproducibility, although these standards are not universally enforced (Moher, 2018). Similarly, concrete funding within research grants for implementing reproducibility itself, manifested as actionable Data Management Plans (http://www.dcc.ac.uk, 2019) rather than as a by-product of the publishing process, could give a level of confidence to researchers who want to reproduce previous work by incorporating its data into their own projects.

Our findings are in accordance with the current literature (Berg, 2018; Pulverer, 2015), which highlights the lack of data access at the publication stage as one of the major reasons for the irreproducibility of published studies. Even with current policies mandating data openness (NIH, 2015; Wilkinson et al., 2016), authors still fail to include their data alongside their publication. This is supported by our finding that the majority of respondents reported that data is either not available upon publication (57%) or that authors cannot be reached or are unresponsive to data provision requests (44%), which continues to be a cultural artifact of using a paper’s methods section as a description of the steps to reproduce an analysis, rather than a fully reproducible solution involving public data repositories, open source code, and comprehensive documentation. Pre-print servers such as bioRxiv have been taken up rapidly (Abdill and Blekhman, 2019), especially in the genomics and bioinformatics domains, and this has the potential to remove delays in publication whilst simultaneously providing a “line in the sand” with a Digital Object Identifier (DOI) and maintaining the requirements for FAIR data. In some cases the sensitivity of data might discourage authors from data sharing (Figueiredo, 2017; Hollis, 2016), but this reason was reported by only a small proportion of our respondents. Whilst efforts such as OpenTrials (Goldacre and Gray, 2016) are attempting to apply the FAIR principles to clinical trial data, the service is by no means ubiquitous.

Reproducibility of experiments could be improved with better storage solutions for large data files, especially those in the order of terabytes, and with the ability to cite those files within the publication so that they can be properly reused (Philip Chen and Zhang, 2014; Poldrack and Gorgolewski, 2014; Faniel and Zimmerman, 2011). Currently, several services allow large data files to be stored and analysed in the cloud, such as CyVerse, Amazon Web Services (Amazon Web Services, Inc., 2019; Fusaro et al., 2011; Hazelhurst, 2008) and Google Genomics (https://cloud.google.com/genomics/). Despite the potential advantage these services can provide for data accessibility, they do not implicitly solve the problem of data reusability when data is too large to be stored locally or transferred over slow internet connections, or when there is no route to attach metadata that describes the datasets sufficiently for reuse or integration with other datasets. There is also the question of data repository longevity: who funds the repositories for decades into the future? Data within public repositories with specific deposition requirements (such as the EMBL-EBI European Nucleotide Archive) might not be associated or annotated with standardised metadata that describes it accurately (Attwood et al., 2009), but rather only with the bare minimum required for deposition. In addition, corresponding authors often move on from projects and institutions, or the authors themselves can no longer access the data, meaning “data available on request” ceases to be a viable option for sourcing data or explanations of methods.

In a 2016 survey, 3987 National Science Foundation Directorate of Biological Sciences principal investigators (BIO PIs) expressed their greatest training needs left unmet by their institutions. These were in the areas of integration of multiple data types (89%), data management and metadata (78%), and scaling analyses to cloud/high-performance computing (71%). These data and computing skills are integral to the know-how required for research reproducibility. Our findings indicated that those who stated they had experience in informatics also stated they were better able to attempt to reproduce results. Practical training in bioinformatics and data management, rather than in specific tools, may be an effective way of reinforcing the notion that researchers’ contributions towards reproducibility are a responsibility that requires active planning and execution. This may be especially effective when considering the training requirements of wet-lab and field scientists, who are becoming increasingly responsible for larger and more complex computational datasets. Further research needs to be undertaken to better understand how researchers’ competence in computational reproducibility may be linked to their level of informatics training.

Respondents mentioned that papers are subject to word count restrictions, and journals often ask authors to shorten methods sections and move text to supplementary information, where it is often placed in an unorganised fashion, or to remove it altogether. This is a legacy of the hard-copy publishing era and, readability aside, word limits are not consequential for internet journals. Even so, if the word count limit applied only to the introduction, results and discussion sections, then authors could describe methods in more detail within the paper, without having to move that valuable information to the supplementary section. When methods cite techniques described in other papers, and those original references are hard to obtain, typically because of closed access practices or the request mechanisms noted above, this can be an additional barrier to the reproducibility of the experiment. This suggests that there are benefits to describing the methods in detail and stating that they are similar to certain (cited) references, as well as to documenting the laboratory’s expertise in a particular method. However, multi-institutional or consortium papers are becoming more common, with ever-increasing numbers of authors on papers, which adds complexity to how authors should describe every previous method that underpins their research (Gonsalves, 2014). There is no obvious solution to this issue. Highly specialised methods (e.g. electrophysiology expertise, requirements for large computational resources, or knowledge of complex bioinformatics algorithms) and specific reagents (e.g. cell lines, antibodies) might not be readily available to other research groups. As stated by some respondents, in certain cases the effective reproducibility of experiments is obstructed by numerical issues with very small or very large matrices or datasets, or by differing versions of the analysis software used, perhaps released to address bugs in analytical code, which cause variation in the reproduced results.

Previous studies have provided strong evidence that there is a need for better technical systems and platforms to enable and promote the reproducibility of experiments. We provide additional evidence that paper authors and readers perceive a benefit from having an interactive figure that would allow the experiment shown in the figure to be reproduced. Such a figure would give access to the raw data, code and detailed data analysis steps, and would allow computational experiments to be reproduced in situ by re-running code, including statistical analyses, “live” within the paper. The findings of this survey have helped eLIFE to understand what is desirable for an interactive figure, elements of which have informed their first computationally reproducible document (eLIFE Sciences, 2019). Despite the benefits that interactive documents and figures can provide to the publishing system, and the demand for those benefits from the scientific community, work is needed to promote and support their use. Given the diversity of biological datasets and ever-evolving methods for data generation and analysis, it is unlikely that a single interactive figure infrastructure can support all types of data. More research needs to be undertaken into how different types of data can be supported and presented in papers with interactivity, yet problems with data availability and data sizes will persist: many studies comprise datasets that are too large to upload and render within web browsers in a reasonable timescale. Even if the data are available through well-funded repositories with fast data transfers, e.g. the INSDC databases, are publishers ready to bear the extra costs of supporting the infrastructure and people required to develop or maintain such interactive systems in the long run? These questions need to be investigated further, particularly when considering any form of industry standardisation of such interactivity in the publishing system.

We show that providing tools to scientists who are not computationally aware also requires a change in culture, as many aspects of computational reproducibility require a change in publishing behaviour and competence in the informatics domain. This study provides some evidence that the scientists who both knew what computationally reproducible data is and were able to successfully reproduce experiments were those who had more training and experience in bioinformatics and computer science. Encouraging and incentivising scientists to conduct transparent, reproducible and replicable research should be prioritised to help solve the irreproducibility issue, and implementing hiring practices with open science at the core of research roles (Schönbrodt, 2019) will encourage attitudes to change across faculty departments and institutions.

Another potential solution to the reproducibility crisis is to identify better (quantifiable) metrics of research reproducibility and its scientific impact. The current assessment of the impact of research articles relies on a set of quantifiable metrics that do not evaluate research reproducibility, but stakeholders are starting to request checklists and tools to improve these assessments (Wellcome Trust, 2018). It is harder to find a better approach, based on a thoroughly informed analysis by unbiased experts in the field, that would quantify the reproducibility level of a research article (Flier, 2017). That said, top-down requirements from journals and funders to release reproducible data and code may go some way to improving computational reproducibility within the life sciences, but this will also rely on the availability of technical solutions that are accessible and useful to the majority of scientists.

Opinions are mixed regarding the extent and severity of the reproducibility crisis (Flier, 2017). From our findings, and given the ongoing release of tools and platforms for technical reproducibility, future efforts should be spent on tackling the cultural behaviour of scientists, especially when they are faced with the need to publish for career progression.

Data Availability

All data files are available via this URL: https://doi.org/10.6084/m9.figshare.c.4436912.v5

Acknowledgements

This project is funded by a BBSRC iCASE Studentship (project reference: BB/M017176/1). We would like to thank all the respondents of the surveys for their time. We would also like to thank George Savva from the Quadram Institute (QIB, UK) for comments and suggestions for this manuscript; Paul Shannon, Nathan Lisgo, and Jennifer McLennan from eLIFE Sciences Publications Ltd, with whom the corresponding author collaborates as an iCASE student; as well as Ian Mulvany, former eLIFE Head of Development, for his help in developing the survey questionnaire.

Footnotes

  • 1 “A software library is a collection of data and programming code utilised to develop software programs and applications. It is designed to help both the programmer and the programming language compiler in building and executing software.” (Techopedia: https://www.techopedia.com/definition/3828/software-library)

6. References

  1. Abdill, R.J., Blekhman, R. (2019). Tracking the popularity and outcomes of all bioRxiv preprints. bioRxiv. doi: 10.1101/515643
  2. Afgan, E., Baker, D., Batut, B., van den Beek, M., Bouvier, D., Čech, M., Chilton, J., Clements, D., Coraor, N., Grüning, B., Guerler, A., Hillman-Jackson, J., Hiltemann, S., Jalili, V., Rasche, H., Soranzo, N., Goecks, J., Taylor, J., Nekrutenko, A. and Blankenberg, D. (2018). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research, 46(W1), pp.W537–W544.
  3. Amazon Web Services, Inc. (2019). Amazon Web Services (AWS) - Cloud Computing Services. [online] Available at: https://aws.amazon.com/ (Accessed: 23 Jan 2019).
  4. Amstutz, P., Crusoe, M.R., Tijanić, N., Chapman, B., Chilton, J., Heuer, M., Kartashov, A., Leehr, D., Ménager, H., Nedeljkovich, M., Scales, M., Soiland-Reyes, S., Stojanovic, L. (2016). Common Workflow Language, v1.0. Specification, Common Workflow Language working group. https://w3id.org/cwl/v1.0/ doi:10.6084/m9.figshare.3115156.v2
  5. Attwood, T.K., Kell, D.B., McDermott, P., Marsh, J., Pettifer, S.R., Thorne, D. (2009). Calling International Rescue: knowledge lost in literature and data landslide! Biochem. J., 424, pp.317–333.
  6. Baker, M. (2017). Is the scientific literature self-correcting? News blog. [online] Blogs.nature.com. Available at: http://blogs.nature.com/news/2012/12/is-the-scientific-literature-self-correcting.html (Accessed: 12 Feb 2015).
  7. Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), pp.452–454.
  8. Bell, C.J., Dinwiddie, D.L., Miller, N.A., Hateley, S.L., Ganusova, E.E., Mudge, J., Langley, R.J., Zhang, L., Lee, C.C., Schilkey, F.D., Sheth, V., Woodward, J.E., Peckham, H.E., Schroth, J.P., Kim, R.W., Kingsmore, S.F. (2013). Carrier Testing for Severe Childhood Recessive Diseases by Next-Generation Sequencing. Science Translational Medicine, 3(65), pp.1–26.
  9. Berg, J. (2018). Progress on reproducibility. Science, 359(6371), p.9.
  10. Brinckman, A., Chard, K., Gaffney, N., Hategan, M., Jones, M., Kowalik, K., Kulasekaran, S., Ludäscher, B., Mecum, B., Nabrzyski, J., Stodden, V., Taylor, I., Turk, M. and Turner, K. (2018). Computing environments for reproducibility: Capturing the “Whole Tale”. Future Generation Computer Systems.
  11. Casadevall, A., Ellis, L.M., Davies, E.W., McFall-Ngai, M., Fang, F.C. (2016). A Framework for Improving the Quality of Research in the Biological Sciences. MBio, 7.
  12. Cataldo, M., Mockus, A., Roberts, J.A., Herbsleb, J.D. (2009). Software Dependencies, Work Dependencies, and Their Impact on Failures. IEEE Trans. Software Eng., 35, pp.864–878.
  13. Cokol, M., Ozbay, F., Rodriguez-Esteban, R. (2008). Retraction rates are on the rise. EMBO Rep., 9, 2.
  14. Collins, F. and Tabak, L. (2014). Policy: NIH plans to enhance reproducibility. Nature, 505(7485), pp.612–613.
  15. Colomb, J. and Brembs, B. (2015). Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior. F1000Research, 3, p.176.
  16. Chang, W. (2015). shinydashboard: Create Dashboards with ‘Shiny’. R package version 0.5.1. https://CRAN.R-project.org/package=shinydashboard
  17. Chang, W., Cheng, J., Allaire, J.J., Xie, Y., McPherson, J. (2016). shiny: Web Application Framework for R. R package version 0.13.1. https://CRAN.R-project.org/package=shiny
  18. Dcc.ac.uk (2019). Data Management Plans | Digital Curation Centre. [online] Available at: http://www.dcc.ac.uk/resources/data-management-plans (Accessed: 9 Mar 2019).
  19. Editorial (2013). Announcement: Reducing our irreproducibility. Nature, 496, p.398.
  20. eLIFE Sciences (2019). “Introducing eLIFE’s first computationally reproducible document: Blending the traditional manuscript with live code, data and interactive figures, we showcase a new way for researchers to tell their full story”, Labs Blog, 20 Feb. Available at: https://elifesciences.org/labs/ad58f08d/introducing-elife-s-first-computationally-reproducible-article (Accessed: 20 Feb 2019).
  21. Fanelli, D. (2011). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), pp.891–904.
  22. Faniel, I. and Zimmerman, A. (2011). Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse. International Journal of Digital Curation, 6(1), pp.58–69.
  23. Figueiredo, A. (2017). Data Sharing: Convert Challenges into Opportunities. Frontiers in Public Health, 5.
  24. Fink, J. (2014). Docker: a Software as a Service, Operating System-Level Virtualisation Framework. Code{4}lib Journal, 25.
  25. Flier, J.S. (2017). Irreproducibility of published bioscience research: Diagnosis, pathogenesis and therapy. Mol Metab, 6, pp.2–9.
  26. Franco, A., Malhotra, N., Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), pp.1502–1505.
  27. Freedman, L.P., Cockburn, I.M., Simcoe, T.S. (2015). The Economics of Reproducibility in Preclinical Research. PLOS Biol., 13, e1002165.
  28. Gentleman, R. (2005). Reproducible research: a bioinformatics case study. Stat. Appl. Genet. Mol. Biol., 4, 2.
  29. Gent, I.P. (2013). The recomputation manifesto. arXiv, arXiv:1304.3674.
  30. Goodman, S., Fanelli, D., Ioannidis, J. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), p.341.
  31. Sneddon, T., Li, P., Edmunds, S. (2012). GigaDB: announcing the GigaScience database. GigaScience, 1(1).
  32. Goble, C., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., De Roure, D. (2010). myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Research, 38(suppl_2), pp.W677–W682.
  33. Goff, S.A., Vaughn, M., McKay, S., et al. (2011). The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front Plant Sci., 2(34). doi:10.3389/fpls.2011.00034
  34. Goldacre, B. and Gray, J. (2016). OpenTrials: towards a collaborative open database of all available information on all clinical trials. Trials, 17(1).
  35. Gonsalves, A. (2014). Lessons learned on consortium-based research in climate change and development. CARIAA Working Paper (1). Ottawa: International Development Research Centre and London: UK Aid. Available at: https://www.idrc.ca/en/article/lessons-consortium-based-researchclimate-change-and-development (Accessed: 1 January 2019).
  36. Grant, B. (2012). Science’s Reproducibility Problem. The Scientist. Available at: http://www.the-scientist.com/?articles.view/articleNo/33719/title/Science-s-Reproducibility-Problem/ (Accessed: 12 Feb 2015).
  37. Hazelhurst, S. (2008). Scientific computing using virtual high-performance computing. Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries: riding the wave of technology (SAICSIT ’08), pp.94–103.
  38. Hershey, N. (1992). Compensation and Accountability: The Way to Improve Peer Review. Quality Assurance and Utilization Review, 7(1), pp.23–29.
  39. Higginson, A.D., Munafò, M.R. (2016). Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions. PLoS Biol., 14, e2000995.
  40. Hollis, K.F. (2016). To Share or Not to Share: Ethical Acquisition and Use of Medical Data. AMIA Jt Summits Transl Sci Proc., pp.420–427.
  41. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M.R., Li, P., Oinn, T. (2006). Taverna: A tool for building and running workflows of services. Nucleic Acids Research, 34, pp.729–732.
  42. Ioannidis, J. and Doucouliagos, C. (2013). What’s to know about the credibility of empirical economics? Journal of Economic Surveys, 27, pp.997–1004. doi:10.1111/joes.12032
  43. Iqbal, S., Wallach, J., Khoury, M., Schully, S. and Ioannidis, J. (2016). Reproducible Research Practices and Transparency across the Biomedical Literature. PLOS Biology, 14(1), p.e1002333.
  44. Kurtzer, G., Sochat, V. and Bauer, M. (2017). Singularity: Scientific containers for mobility of compute. PLOS ONE, 12(5), p.e0177459.
  45. Marx, V. (2013). Biology: The big challenges of big data. Nature, 498, pp.255–260.
  46. Moher, D. (2018). Reporting guidelines: doing better for readers. BMC Medicine, 16(233).
  47. Müller, H., Naumann, F., Freytag, J.-C. (2003). Data quality in genome databases. Proc. Conf. Inf. Qual. (IQ 03), pp.269–284.
  48. NIH (2015). National Institutes of Health Plan for increasing access to scientific publications and digital scientific data from NIH funded scientific research, Feb 2015. Available at: https://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf (Accessed: 5 May 2017).
  49. Nosek, B., Spies, J. and Motyl, M. (2012). Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science, 7(6), pp.615–631.
  50. Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., Li, P. (2004). Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20, pp.3045–3054.
  51. Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  52. Penfold, N. (2017). “Reproducible Document Stack - supporting the next-generation research article”, eLIFE. Available at: https://elifesciences.org/labs/7dbeb390/reproducible-document-stack-supporting-the-next-generation-research-article (Accessed: 2 Feb 2019).
  53. Pettifer, S., Thorne, D., McDermott, P., Marsh, J., Villéger, A., Kell, D.B., Attwood, T.K. (2009). Visualising biological data: a semantic approach to tool and database integration. BMC Bioinformatics, 10(Suppl 6), S19.
  54. Pettifer, S.R., Sinnott, J.R., Attwood, T.K. (2004). UTOPIA - User-friendly tools for operating informatics applications. Comp. Funct. Genomics, 5, pp.56–60.
  55. Poldrack, R. and Gorgolewski, K. (2014). Making big data open: data sharing in neuroimaging. Nature Neuroscience, 17(11), pp.1510–1517.
  56. Philip Chen, C. and Zhang, C. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, pp.314–347.
  57. Pulverer, B. (2015). Reproducibility blues. The EMBO Journal, 34(22), pp.2721–2724.
  58. Pusztai, L., Hatzis, C. and Andre, F. (2013). Reproducibility of research and preclinical validation: problems and solutions. Nature Reviews Clinical Oncology, 10(12), pp.720–724.
  59. R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: https://www.R-project.org/
  60. Schönbrodt, F. (2019). Changing hiring practices towards research transparency: The first open science statement in a professorship advertisement. [Blog] Felix Schönbrodt Blog. Available at: https://www.nicebread.de/open-science-hiring-practices/ (Accessed: 19 Mar 2016).
  61. Stodden, V. (2015). Reproducing Statistical Results. Annu. Rev. Stat. Its Appl., 2, pp.1–19.
  62. Stodden, V., Guo, P. and Ma, Z. (2013). “Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals”, PLoS ONE, 8(6), e67111. doi:10.1371/journal.pone.0067111
  63. Wellcome Trust (2018). Request for Information (RFI): A software tool to assess the FAIRness of research outputs against a structured checklist of requirements [FAIRWare]. Available at: https://wellcome.ac.uk/sites/default/files/FAIR-checking-software-request-for-information.pdf (Accessed: 5 Mar 2019).
  64. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., ’t Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 3, 160018.
  65. Wolstencroft, K., Haines, R., Fellows, D., Williams, A., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., Bhagat, J., Belhajjame, K., Bacall, F., Hardisty, A., Nieva de la Hidalga, A., Balcazar Vargas, M.P., Sufi, S., Goble, C. (2013). The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41, pp.1–5.
  66. Yong, E. (2015). Reproducibility problems in genetics research may be jeopardising lives. Genetic Literacy Project. Available at: https://www.geneticliteracyproject.org/2015/12/17/reproducibility-problems-genetics-research-may-costing-lives/ (Accessed: 12 Feb 2015).
Posted March 20, 2019.