Abstract
We constructed a survey to understand how authors and scientists view the issues around reproducibility, and how solutions such as interactive figures could enable the reproducibility of experiments from within a research article. This manuscript reports the results of this survey of 251 researchers, including authors who have published in eLIFE Sciences and researchers who work at the Norwich Biosciences Institutes (NBI). The survey also outlines the extent to which researchers attempt to reproduce experiments themselves and which features of an interactive figure they find most desirable. Respondents considered various features for an interactive figure within a research article that would allow them to better understand and reproduce in situ the experiment presented in the figure. Respondents said that the most important element for enabling the reproducibility of published research would be for authors to describe their methods and analyses in detail. Respondents also believe that having interactive figures in published papers would be beneficial. Whilst interactive figures are potential solutions for demonstrating technical reproducibility, we find that there are equally pressing cultural demands on researchers that need to be addressed to achieve greater success in reproducibility in the life sciences.
1. Background
Reproducibility is a defining principle of scientific research, and refers to the ability of researchers to replicate the findings of a study using the same or similar methods and materials as the original researchers (Goodman, Fanelli and Ioannidis, 2016). However, irreproducible experiments are common across all disciplines of the life sciences (Grant, 2012). A recent study showed that 88% of drug-discovery experiments could not be reproduced or replicated even by the original authors, in some cases forcing retraction of the original work (Baker, 2012). Irreproducible genetic experiments with weak or wrong evidence can have negative implications for healthcare (Yong, 2015). For example, 27% of mutations linked to childhood genetic diseases cited in the literature have later been discovered to be common polymorphisms or misannotations (Bell et al., 2013). While irreproducibility is not confined to biology and the medical sciences (Ioannidis and Doucouliagos, 2013), irreproducible biomedical experiments place a substantial financial burden on society; an estimated $28 billion was spent on irreproducible biomedical science in 2015 in the USA alone (Freedman et al., 2015).
Computational reproducibility is an important aspect of reproducibility, relating to the way in which conclusions rely on specific analyses or other procedures undertaken on computational systems. There are two main definitions of computational reproducibility in the literature:
The original authors or others using the same data, running precisely the same workflow and getting the same results (Gentleman, 2005). Others define this concept as recomputability (Gent, 2013).
Running similar data with the same workflow, and getting similar results, i.e. the workflow is reproducible (Gent, 2013).
Computational reproducibility has both technical and cultural aspects. Technical challenges to reproducibility include poorly written, incorrect, or unmaintained software, changes in software libraries1 on which tools are dependent, or incompatibility between older software and newer operating systems (Cataldo et al., 2009). Cultural challenges include insufficient descriptions of methods, reluctance to publish original data and code under FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and other social factors such as the favouring of high prestige or high impact science publications over performing rigorous and reproducible science.
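As a concrete illustration of the library-dependency challenge described above, the following is a minimal sketch (our illustration, not a tool discussed in this survey) that records the interpreter and package versions used for an analysis so that a reader can later attempt to reconstruct the same software environment; the package names listed are hypothetical examples.

```python
# Minimal illustrative sketch: record the software environment used for an analysis
# so that readers can attempt to reconstruct it later.
import json
import platform
from importlib import metadata


def snapshot_environment(packages, outfile="environment.json"):
    """Write the Python version, platform and installed versions of the named packages."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            env["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env["packages"][name] = "not installed"
    with open(outfile, "w") as fh:
        json.dump(env, fh, indent=2)
    return env


if __name__ == "__main__":
    # Hypothetical analysis dependencies; replace with the packages a study actually uses.
    print(snapshot_environment(["numpy", "pandas", "scipy"]))
```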
Several projects have attempted to address some of the technical aspects of reproducibility by making it easier for authors to disseminate fully reproducible workflows and data, and for readers to perform computations. For example: F1000 Living Figure (Colomb and Brembs, 2015); Whole Tale Project (Brinckman et al., 2018); ReproZip project (https://www.reprozip.org/); Python-compatible tools and widgets (IPython notebook interactive widgets, Jupyter Notebooks); FigShare (http://www.figshare.com) as an example of a scientific data repository; Galaxy (Afgan et al., 2018); CyVerse (formerly iPlant Collaborative, Goff, 2011); myExperiment (Goble et al., 2010); UTOPIA (Pettifer et al., 2009, 2004); GigaScience Database (Sneddon, Li and Edmunds, 2012); Taverna (Wolstencroft et al., 2013; Hull et al., 2006; Oinn et al., 2004); workflow description efforts such as the Common Workflow Language (Amstutz et al., 2016); and Docker (http://www.docker.com), Singularity (https://singularity.lbl.gov/, Kurtzer, Sochat and Bauer, 2017), and other container systems.
Even though these tools are widely available, and seem to address many of the issues of technical and cultural reproducibility, they have not yet become a core part of the life sciences experimental and publication lifecycle. There is an apparent disconnect between the development of tools addressing reproducibility and their use by the wider scientific and publishing communities who might benefit from them. However, there have been notable efforts to make this connection. The Living Figure by Björn Brembs and Julien Colomb was the first prototype of a dynamic figure that allowed readers to change parameters of a statistical computation underlying a figure (Colomb and Brembs, 2015). The first eLIFE computationally reproducible article, created by converting manuscripts written in a specific format (using the Stencila Desktop, https://stenci.la/, and saved as a Document Archive file) into interactive documents, offers more interactivity at the publication level, allowing the reader to “play” with the article and its figures when viewed in a web browser (eLIFE Sciences, 2019).
While there are few incentives to promote cultural reproducibility (Higginson and Munafò, 2016), efforts are being made in most science domains to establish a culture in which the expectation to share data for all publications according to the FAIR principles is prioritised. It is widely accepted that better reproducibility will benefit the scientific community and the general public (NIH, 2015; Wilkinson et al., 2016). Although studies have suggested that reproducibility in science is a serious issue (Pulverer, 2015; Stodden, Guo and Ma, 2013), with costly repercussions, fewer studies have investigated the attitudes and knowledge of researchers around reproducibility and what the most desirable solutions and infrastructures to enable reproducibility would be. In particular, minimal research has been conducted into the frequency of difficulties experienced with reproducibility, the perception of its importance, and preferences with respect to potential solutions among the general life sciences community. This paper presents a survey that was, in part, designed to inform the design of the reproducible document by canvassing respondents’ preferred features for interactive figures. We aimed to address this critical gap in reproducibility knowledge, in order to inform the development of tools that better meet the needs of producers and consumers of life science research. We constructed the survey in order to understand how the following are experienced by the respondents:
Computational reproducibility: issues with accessing data, code and methodology parameters, and how solutions such as interactive figures could promote reproducibility from within an article.
Cultural reproducibility: attitudes towards reproducibility, the social factors hindering reproducibility, and interest in interactive figures and their feature preferences.
3. Methods
Population and sample
Our sample populations were selected to include all life sciences communities across levels of seniority, discipline and level of experience with the issues we wished to survey. The first survey was conducted in November 2016 and sent to 750 researchers working in the Norwich Biosciences Institutes (NBI) at post-doctoral level or above. The NBI is a partnership of four UK research institutions: the Earlham Institute (formerly known as The Genome Analysis Centre), the John Innes Centre, The Sainsbury Laboratory, and the Institute of Food Research (now Quadram Institute Bioscience). Invitations to participate were distributed via email, with a link to the survey. The second survey, similar to the first but with amendments and additions, was distributed in February 2017 to a random sample of 1662 active researchers who had published papers in the eLIFE journal. Invitations to participate were sent via email by eLIFE staff. We achieved a 15% (n=112) response rate from the NBI researchers, and an 8% response rate from the eLIFE survey (n=139). Table 1 shows the survey questions. Questions were designed to give qualitative and quantitative answers on technical and cultural aspects of reproducibility. Questions assessed the frequency of difficulties encountered in accessing data, the reasons for these difficulties, and how respondents currently obtain data underlying published articles. They also measured understanding of what constitutes reproducibility of experiments, interactive figures, and reproducible computational data. Finally, we evaluated the perceived benefit of interactive figures and of reproducing computational experiments, and which features of interactive figures would be most desirable.
Questions used to survey the knowledge of respondents about research reproducibility. Questions indicated with an asterisk were only included in the eLIFE survey. Answer options for the questions are shown in Supplementary section 1.
Statistical analysis
Results are typically presented as proportions of those responding, stratified by the respondent’s area of work, training received, and version of the survey as appropriate. Chi-square tests for independence were used to test for relationships between responses to specific questions, or whether responses varied between samples. Analysis was conducted using Microsoft Excel and R (version 3.5.0; R Core Team, 2018), and all supplementary figures and data are available on Figshare (see Data Availability).
We assessed whether there was a significant difference in the ability and willingness to reproduce published results between the cohort of eLIFE respondents who understand the term “computationally reproducible data” and those who do not. We did not include those who replied “Unsure” with regard to their understanding of the term “computationally reproducible data”. The respondents who chose “yes tried reproducing results, but unsuccessfully”, “have not tried to reproduce results” and “it is not important to reproduce results” were grouped together under “unsuccessfully”.
4. Results
Characteristics of the sample
Figure 1 shows the distribution of areas of work of our respondents, stratified by survey sample. Genomics (proportion in whole sample = 22%), biochemistry (17%), and computational biology (15%) were the most common subject areas endorsed in both the NBI and eLIFE samples. With regard to how often respondents use bioinformatics tools, 25% replied “never”, 39% “rarely”, and 36% “often”. Many respondents had received training in statistics (43%), bioinformatics (31%), or computer science (20%).
The types of data the NBI and eLIFE respondents work with, shown as percentage prevalence. Responses were not mutually exclusive. The choices of data types were the same as the article data types available in the eLIFE article categorisation system.
Access to data and bioinformatics tools
In both samples, 90% of those who responded reported having tried to access data underlying a published research article (Figure 2). Of those who had tried, few had found this “easy” (14%) or “very easy” (2%), with 41% reporting that the process was “difficult” and 5% “very difficult”. Reasons for difficulty were chiefly cultural (Figure 2), in that the data was not made available alongside the publication (reported by 63% of those who had tried to access data), or authors could not be contacted or did not respond to data requests (44%). Relatively few found data unavailable for technical reasons of data size (17%), confidentiality (10%) or commercial sensitivity (11%). With respect to data sources, 57% of the total sample had used open public databases, 48% reported that data was available via a link in the paper, and 48% had needed to contact authors.
Left panel: whether respondents have attempted to access data underlying previous publications and the level of difficulty typically encountered in doing so. Right panel: the reasons given by respondents for being unable to access data (restricted to those who have attempted to access data). See supplementary material for full wording of questions and responses.
Very few respondents reported never (2%) or rarely (8%) having problems with running, installing, or configuring bioinformatics software. Problems with software were encountered often (29%) or very often (15%), suggesting that nearly half of respondents regularly encountered technical barriers to computational reproducibility.
Understanding of reproducibility, training and successful replication
The majority of respondents reported that they understood the term “reproducibility of experiments” in science. In contrast, most (52%) participants did not know what the term “computationally reproducible data” means, while 26% did know and 22% were unsure. We received several explanations (free text responses) of the term “computationally reproducible data”, some of which were more accurate than others (Supplementary section, free responses to question 13).
Some respondents (18%) reported not attempting to reproduce or revalidate published research. Very few (N=5; 6%) of the sample endorsed the option that “it is not important to reproduce other people’s published results” (Supplementary figure 1). Even though the majority (60%) reported successfully reproducing published results, almost a quarter of the respondents (23%) found that their efforts to reproduce results were unsuccessful.
Supplementary Table 1 shows the willingness and ability of respondents to reproduce experiments, stratified by the training they received and their understanding of the term “computationally reproducible data”. We found a significant association between the ability to reproduce published experiments and knowing the meaning of the term “computationally reproducible data”. Among the 25 respondents who understood the term, 18 (72%) had successfully reproduced previous work, compared to only 26 (52%) of the 50 who responded that they did not understand the term (chi-square test for independence, p=0.048).
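For transparency, the comparison above can be expressed as a 2x2 contingency table. The following is a minimal sketch in Python using the counts reported here (the original analysis was performed in Microsoft Excel and R, as described in the Methods); the exact p-value obtained will depend on how the “unsuccessful” categories were grouped and on whether a continuity correction was applied, details that are not fully specified in this section.

```python
# Minimal illustrative sketch (not the original Excel/R analysis): chi-square test of
# independence on the 2x2 table of reproduction success by understanding of the term
# "computationally reproducible data", using the counts reported above.
from scipy.stats import chi2_contingency

observed = [
    [18, 25 - 18],  # understood the term: 18 of 25 successfully reproduced previous work
    [26, 50 - 26],  # did not understand the term: 26 of 50 successfully reproduced
]

# correction=False gives the uncorrected Pearson chi-square; whether a Yates continuity
# correction was used in the original analysis is not stated.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```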
There was no evidence of a difference in the ability and willingness to reproduce published results between respondents who use bioinformatics tools often and those who use them rarely or never (data not shown). The majority of respondents who use bioinformatics tools often came from the scientific backgrounds of Biophysics, Biochemistry, Computational Biology and Genomics. Most of the respondents who answered “reproducibility is not important” or “haven’t tried reproducing experiments” were scientists from disciplines that use computational or bioinformatics tools “rarely” or “never” (Supplementary Table 2).
Improving Reproducibility of Published Research
The vast majority (91%) of respondents stated that authors describing all methodology steps in detail, including any formulae used to analyse the data, would be the most effective way to make published science more reproducible. Only around half endorsed the view that “authors should provide the source code of any custom software used to analyse the data and that the software code is well documented” (53%), or that authors should provide a link to the raw data (49%) (Supplementary figure 2). Two respondents suggested that achieving better science reproducibility would be easier if funding were more readily available for reproducing the results of others and if there were opportunities to publish the reproduced results (Supplementary section, free responses). In the same context, some respondents recognised that the current culture in science offers insufficient incentives to publish reproducible papers (or indeed negative findings), and instead rewards publishing as many papers as possible in high Impact Factor journals (Supplementary section, free responses).
Interactive Figures
Participants ranked, in order of preference, features for an interactive figure within an article; “easy to manipulate” was the most preferred feature, followed by having easy-to-define parameters (Figure 3). Generally, the answers from both the eLIFE and NBI surveys followed similar trends.
Responses to question 9: Respondents were asked to rank the above features in order of preference, from 1 (most preferred) to 11 (least preferred). The average score for each feature was calculated from the preferences selected by respondents in both the NBI and eLIFE surveys. The lower the average score (x-axis), the more preferred the feature (y-axis).
Furthermore, free text responses were collected, and most respondents stated that having further insights into the data presented in a figure would be beneficial (Supplementary section, free responses). The majority of respondents perceive a benefit in having interactive figures in published papers for both readers and authors (Figure 4). Examples of such insights included: allowing further points to be visualised on the plot from data in the supplementary section, as well as being able to alter the data presented in the figure; and having an interactive figure as a movie, or one that displays protein 3D structures, which would be beneficial to readers. We categorised the remaining responses as software related; these included suggestions of software that could be used to produce interactive figures, such as R Shiny (Chang, 2015; Chang et al., 2016). A moderate proportion of eLIFE (19%) and NBI (27%) respondents stated that they had no opinion on the utility of interactive figures. Free text answers from this group suggested that they had never seen or interacted with such a figure before, and gave no indication that an interactive figure would help their work.
Responses to question 11: The perceived benefit (%) of having the ability to publish papers with interactive figures: the benefit to the author, to the readers of the author’s papers, and to the papers the author reads. Answers include the responses from both the NBI and eLIFE surveys.
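Respondents pointed to R Shiny as one route to producing interactive figures. As a language-agnostic illustration only (this is not the eLIFE/Stencila implementation, and the data are hypothetical), the sketch below uses the Python plotly library to generate a self-contained HTML figure in which readers can hover over points, zoom, and toggle data series in a web browser.

```python
# Minimal sketch with hypothetical data: a standalone interactive HTML figure that a
# reader can pan, zoom, hover over and filter in a web browser. This only illustrates
# the kind of interactivity respondents described; it is not the eLIFE pipeline.
import pandas as pd
import plotly.express as px

# Hypothetical dose-response measurements for two conditions.
df = pd.DataFrame({
    "dose":      [1, 2, 4, 8, 1, 2, 4, 8],
    "response":  [0.11, 0.25, 0.48, 0.70, 0.09, 0.19, 0.33, 0.52],
    "condition": ["treated"] * 4 + ["control"] * 4,
})

fig = px.scatter(
    df, x="dose", y="response", color="condition",
    title="Illustrative interactive figure (hypothetical data)",
)
# write_html produces a single HTML file; clicking legend entries toggles each series.
fig.write_html("interactive_figure.html", include_plotlyjs="cdn")
```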
The majority of respondents also said that they see benefit in automatically reproducing computational experiments, and in manipulating and interacting with parameters in computational analysis workflows; being able to computationally reproduce statistical analyses was viewed equally favourably (Figure 5). Despite this perceived benefit, most respondents (61%) indicated that the ability to include an interactive figure would not affect their choice of journal when seeking to publish their research.
Responses to question 14 (both eLIFE and NBI): The perceived benefit (%) to respondents of being able to automatically reproduce computational experiments or other analyses (including statistical tests) described in a paper.
5. Discussion
This study highlights the difficulties currently experienced in reproducing experiments, and the positive attitudes of scientists involved in the current publishing system towards enabling and promoting the reproducibility of published experiments through interactive elements in online publications. All respondents to the survey were active life sciences researchers, and we therefore believe the opinions collected are representative of researchers in the life sciences who routinely read and publish research. While progress has been made in publishing standards across all life science disciplines, the opinions of the respondents reflect previously published shortcomings of publishing procedures (Müller et al., 2003; Marx, 2013; Stodden, 2015): lack of data and code provision, lack of storage standards, and the absence of any requirement for a detailed description of methods and code structure in published papers. When data is difficult to obtain, the reproducibility problem is exacerbated. However, the level of interest and incentive in reproducing published research is in its infancy, or reproduction is simply not the researchers’ priority, something also mentioned extensively in previous literature (Baker, 2016; Stodden, 2015; Open Science Collaboration, 2015; Collins and Tabak, 2014). Responses to our surveys suggested that most life scientists understand that science becomes implicitly more reproducible if methods (including data, analysis, and code) are well described and available, and perceive a potential benefit of tools that enable this. Respondents stated that they could see the benefit of interactive figures for their readers, and of being able as authors to present their data as interactive figures, but that the availability of this facility would not affect their decisions on where to publish. Thus, even though technologies exist to aid reproducibility and authors know they are beneficial, many scientific publications still do not meet basic standards of reproducibility. Respondents endorsed articles that include interactive elements, where access to the raw data, code, and detailed analysis steps in the form of an interactive figure would help readers better understand the paper, its experimental design and methodology, and improve the reproducibility of the experiment presented in the interactive figure, especially for computational experiments. This contradiction suggests that cultural factors play an underestimated role in reproducibility.
Retraction rates (Cokol et al., 2008) suggest that the current publishing system has yet to provide a mechanism to reliably check whether a published study is reproducible. There remains a perception that researchers do not get credit for reproducing the work of others or for publishing negative results. Whilst some journals do explicitly state that they welcome negative results articles (e.g. the PLOS One “Missing Pieces” collection), this is by no means the norm in life science publishing, as evidenced by low, and dropping, publication rates of negative findings (Franco et al., 2014; Fanelli, 2011). Ideally, the publication system would enable checking of reproducibility at the peer-review stage, with authors, reviewers and editors providing all data (including raw data), a full description of methods including statistical analysis parameters, any negative findings based on previous work, open source software code, etc. (Iqbal et al., 2016). Peer reviewers would then be better able to check for anomalies, and editors could perform a final check to ensure that the paper to be published presents true, valid, and reproducible research. Some respondents suggested that if reviewers and/or editors were monetarily compensated, spending time to reproduce or validate the computational experiments in manuscripts would become more feasible and would help address the irreproducibility issue. However, paying reviewers does not necessarily ensure that they would be more diligent in checking or trying to reproduce results (Hershey, 1992), and better ways must be found to ensure that effective pressure is placed upon authors and publishing journals to adopt higher publication standards (Announcement: Reducing our irreproducibility, 2013; Pusztai, Hatzis and Andre, 2013). The increasing adoption by biomedical journals of reporting standards for experimental design, methods and results provides a framework to harmonise the description of scientific processes and enable reproducibility, although these standards are not universally enforced (Moher, 2018). Similarly, concrete funding within research grants for implementing reproducibility itself, manifested as actionable Data Management Plans (http://www.dcc.ac.uk, 2019) rather than as a by-product of the publishing process, could give a level of confidence to researchers who want to reproduce previous work by incorporating that data into their own projects.
Our findings are in accordance with the current literature (Berg, 2018; Pulverer, 2015), which highlights the lack of data access at the publication stage as one of the major reasons for the irreproducibility of published studies. Even with current policies mandating data openness (NIH, 2015; Wilkinson et al., 2016), authors still fail to include their data alongside their publication. This is supported by our findings that the majority of respondents replied that data is either not available upon publication (57%) or that authors cannot be reached or are unresponsive to data provision requests (44%); this continues to be a cultural artifact of using a paper’s methods section as a description of steps to reproduce the analysis, rather than providing a fully reproducible solution involving public data repositories, open source code, and comprehensive documentation. Pre-print servers such as bioRxiv have been taken up rapidly (Abdill, 2018), especially in the genomics and bioinformatics domains, and this has the potential to remove delays in publication whilst simultaneously providing a “line in the sand” with a Digital Object Identifier (DOI) and maintaining the requirements for FAIR data. In some cases the sensitivity of data might discourage authors from data sharing (Figueiredo, 2017; Hollis, 2016), but this reason was only reported by a small proportion of our respondents. Whilst efforts such as OpenTrials (Goldacre, 2016) are attempting to apply the FAIR principles to clinical trial data, the service is by no means ubiquitous.
Reproducibility of experiments could be improved with better storage solutions for large data files, especially those on the order of terabytes, and by citing them within the publication to enable their proper reusability (Philip Chen and Zhang, 2014; Poldrack and Gorgolewski, 2014; Faniel and Zimmerman, 2011). Currently, there are several services that allow large data files to be stored and analysed in the cloud, such as CyVerse, Amazon Web Services (Amazon Web Services, Inc., 2019; Fusaro et al., 2011; Hazelhurst, 2008) and Google Genomics (https://cloud.google.com/genomics/). Despite the advantages these services can provide for data accessibility, they do not implicitly solve the problem of data reusability when data is too large to be stored locally or transferred over slow internet connections, or when there is no route to attach metadata that describes the datasets sufficiently for reuse or integration with other datasets. There is also the question of data repository longevity: who funds the repositories for decades into the future? Data within public repositories with specific deposition requirements (such as the EMBL-EBI European Nucleotide Archive) might not be associated or annotated with standardised metadata that describes it accurately (Attwood et al., 2009), but rather with the bare minimum required for deposition. In addition, corresponding authors often move on from projects and institutions, or the authors themselves can no longer access the data, meaning that “data available on request” ceases to be a viable option for sourcing data or explanations of methods.
In a 2016 survey, 3987 National Science Foundation Directorate of Biological Sciences principal investigators (BIO PIs) expressed their greatest training needs left unmet by their institutions. These were in the areas of integration of multiple data types (89%), data management and metadata (78%), and scaling analysis to cloud/high-performance computing (71%). These data and computing skills are integral to the know-how needed for research reproducibility. Our findings indicated that those who stated they had experience in informatics also stated that they were better able to attempt and to reproduce results. Practical training in bioinformatics and data management, rather than in specific tools, may be an effective way of reinforcing the notion that researchers’ contributions towards reproducibility are a responsibility that requires active planning and execution. This may be especially effective when considering the training requirements of wet-lab and field scientists, who are becoming increasingly responsible for larger and more complex computational datasets. Further research needs to be undertaken to better understand how researchers’ competence in computational reproducibility may be linked to their level of informatics training.
Respondents mentioned that papers have word count restrictions, and journals often ask authors to shorten methods sections, moving text to supplementary information (which is often poorly organised) or removing it altogether. This is a legacy of the hard-copy publishing era and, readability aside, word limits are of little consequence for online journals. Even so, if the word count limit applied only to the introduction, results and discussion sections, then authors could describe methods in more detail within the paper, without having to move that valuable information to the supplementary section. When methods cite techniques as described in other papers, and those original references are hard to obtain, typically because of closed access practices or the request mechanisms noted above, this creates an additional barrier to the reproducibility of the experiment. This suggests that there are benefits to describing the methods in detail and stating that they are similar to certain (cited) references, as well as documenting the laboratory’s expertise in a particular method. However, multi-institutional or consortium papers are becoming more common, with ever-increasing numbers of authors, which adds complexity to how authors should describe every previous method that underpins their research (Gonsalves, 2014). There is no obvious solution to this issue. Highly specialised methods (e.g. electrophysiology expertise, requirements for large computational resources, or knowledge of complex bioinformatics algorithms) and specific reagents (e.g. cell lines, antibodies) might not be readily available to other research groups. As stated by some respondents, in certain cases the effective reproducibility of experiments is obstructed by numerical issues with very small or very large matrices or datasets, or because differing versions of the analysis software, perhaps used to address bugs in analytical code, cause variation in the reproduced results.
Previous studies have provided strong evidence that there is a need for better technical systems and platforms to enable and promote the reproducibility of experiments. We provide additional evidence that paper authors and readers perceive a benefit in having an interactive figure that would allow for the reproducibility of the experiment shown in the figure. Such a figure would give access to the raw data, code, and detailed data analysis steps, and would allow computational experiments to be reproduced in situ by re-running code, including statistical analyses, “live” within the paper. The findings of this survey have helped eLIFE to understand what is desirable in an interactive figure, elements of which have informed their first computationally reproducible document (eLIFE Sciences, 2019). Despite the benefits that interactive documents and figures can provide to the publishing system, and the demand for those benefits from the scientific community, work is needed to promote and support their use. Given the diversity of biological datasets and ever-evolving methods for data generation and analysis, it is unlikely that a single interactive figure infrastructure can support all types of data. More research needs to be undertaken into how different types of data can be supported and presented in papers with interactivity, yet problems with data availability and data sizes will persist - many studies comprise datasets that are too large to upload and render within web browsers in a reasonable timescale. Even if the data are available through well-funded repositories with fast data transfers, e.g. the INSDC databases, are publishers ready to bear the extra costs of supporting the infrastructure and people required to develop and maintain such interactive systems in the long run? These questions need to be investigated further, particularly when considering any form of industry standardisation of such interactivity in the publishing system.
We show that providing tools to scientists who are not computationally aware also requires a change in culture, as many aspects of computational reproducibility require a change in publishing behaviour and competence in the informatics domain. This study provides some evidence that the scientists who both understood what computationally reproducible data is and were able to successfully reproduce experiments were those with more training and experience in bioinformatics and computer science. Encouraging and incentivising scientists to conduct transparent, reproducible and replicable research should be prioritised to help solve the irreproducibility issue, and implementing hiring practices with open science at the core of research roles (Schönbrodt, 2019) will encourage attitudes to change across faculty departments and institutions.
Another potential solution to the reproducibility crisis is to identify better (quantifiable) metrics of research reproducibility and its scientific impact. The current assessment of the impact of research articles relies on a set of quantifiable metrics that do not evaluate research reproducibility, but stakeholders are starting to request that checklists and tools be provided to improve these assessments (Wellcome Trust, 2018). A harder, but better, approach would be a thoroughly informed analysis by unbiased experts in the field to quantify the reproducibility of a research article (Flier, 2017). That said, top-down requirements from journals and funders to release reproducible data and code may go some way towards improving computational reproducibility within the life sciences, but this will also rely on the availability of technical solutions that are accessible and useful to the majority of scientists.
Opinions are mixed regarding the extent and severity of the reproducibility crisis (Flier, 2017). From our findings, and given the ongoing release of tools and platforms for technical reproducibility, future efforts should focus on tackling the cultural behaviour of scientists, especially when they are faced with the need to publish for career progression.
Data Availability
All data files are available via this url: https://doi.org/10.6084/m9.figshare.c.4436912.v5
Acknowledgements
This project is funded by a BBSRC iCASE Studentship (project reference: BB/M017176/1). We would like to thank all the respondents of the surveys for their time. We would also like to thank George Savva from the Quadram Institute (QIB, UK) for comments and suggestions for this manuscript; Paul Shannon, Nathan Lisgo, and Jennifer McLennan from eLIFE Sciences Publications Ltd, with whom the corresponding author collaborates as an iCASE student; as well as Ian Mulvany, former eLIFE Head of Development, for his help in developing the survey questionnaire.
Footnotes
↵1 “A software library is a collection of data and programming code utilised to develop software programs and applications. It is designed to help both the programmer and the programming language compiler in building and executing software”. (Techopedia: https://www.techopedia.com/definition/3828/software-library).