
Improving reproducibility in computational biology research

There has been much discussion in the scientific literature on a crisis of reproducibility in science [1, 2]. Depending on the discipline, the percentage of studies that are reproducible has been reported to be as low as 10% [3]. This inability to reproduce the scientific findings of a given paper has been attributed to a lack of clarity in the methods and to inherent variability in the biological system being studied [4].

Reproducibility in computational biology research is certainly a problem, yet it is perhaps a challenge that our field is uniquely positioned to tackle. A lack of reproducibility can be attributed to many factors, but common issues include incomplete or erroneous descriptions of the simulations (e.g., which software version was used), incomplete documentation on how to run the simulations, and simply failing to post the computer code needed to run a given simulation.

Many tools have emerged that we can leverage to make computational biology research more reproducible (e.g., http://co.mbine.org/ and https://normsys.h-its.org/), and several articles propose best practices, such as Ten Simple Rules for Reproducible Computational Research [5] and Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks [6].

PLOS Computational Biology recently partnered with the Center for Reproducible Biomedical Modeling (https://reproduciblebiomodels.org/) to launch a pilot peer review workflow to assess reproducibility (https://blogs.plos.org/biologue/2020/05/05/improving-reproducibility-of-computational-models/). After authors opt in to the pilot, a peer reviewer is solicited (in addition to our normal peer review assessment) specifically to evaluate the reproducibility of the computational modeling aspects described in the submission. All peer reviewers can receive credit for their reviews through our partnership with ORCID (https://blogs.plos.org/plos/2019/06/youve-completed-your-review-now-get-credit-with-orcid/), and authors can elect to have the peer reviews published alongside the final publication of the work (https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/). We aim to complete the review process in the usual time frame, with the hope that the additional review will help editors and authors assess and improve the reproducibility of the work. There are a few questions we intend to investigate with this pilot, including questions the pilot can answer directly:

  • How widespread is the desire to publish reproducible models?
  • What is the minimum information required to reproduce published model results?
  • What are the common reasons computational studies are not reproducible?

as well as questions that the pilot will help us begin to explore and that will point us in the right direction:

  • What are the benefits of producing reproducible models?
  • What incentives would attract authors to make the studies available in a reproducible manner?
  • What tools and technologies can be created to facilitate reproducibility?
  • What training is required to improve community awareness of tools and practices which lead to FAIR (Findable, Accessible, Interoperable, Reusable) computational studies?

We feel the answers to these questions can help inform the community on how best to encourage a change in culture toward a more FAIR [7] computational biology.

There are already some general principles we’ve learned about good practices for reproducible biomedical modeling [8]. These include stating the software used in the study (including the specific version), providing machine-readable code in supplements or uploading it to established repositories, and asking a third party to verify that the methods section is free of errors and detailed enough to reproduce the results presented in the paper.
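As a purely illustrative sketch of the first of these practices, the short Python script below records the interpreter, operating system, and package versions used for a run and writes them to a provenance file that can accompany the results. It uses only the standard library; the package names ("numpy", "scipy") and the output filename are placeholders rather than requirements of the pilot or of any particular modeling framework.

    # Minimal, hypothetical example of capturing the software environment for a
    # simulation so that the exact versions behind a result are documented.
    import json
    import platform
    import sys
    from importlib import metadata

    def environment_snapshot(packages):
        """Collect interpreter, OS, and package-version details for a provenance record."""
        snapshot = {
            "python": sys.version,
            "platform": platform.platform(),
            "packages": {},
        }
        for name in packages:
            try:
                snapshot["packages"][name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                snapshot["packages"][name] = "not installed"
        return snapshot

    if __name__ == "__main__":
        # "numpy" and "scipy" stand in for whatever libraries a given study actually uses.
        record = environment_snapshot(["numpy", "scipy"])
        with open("environment_snapshot.json", "w") as fh:
            json.dump(record, fh, indent=2)
        print(json.dumps(record, indent=2))

Archiving such a record alongside the code and data (for example, in the same repository or supplement) documents one of the pieces of information most often missing from published simulations.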

We are certain to learn more as this pilot progresses. We will need to tackle challenges such as developing and supporting repositories for the models, scaling up the “reproducibility validation” peer review effort, and creating tools to help make these assessments. As a community, we need to learn how to properly recognize the tremendous effort of the reviewers involved in this work, perhaps with increased support of ORCID or Publons as tools to help peer reviewers receive credit for their work in the assessment and publication process. We also need to work out how papers vetted in this way should be identified, perhaps with badges that provide a “stamp of approval” and public recognition for papers and associated code that have been assessed against defined criteria.

We think that journals and scientific publications have a critical part to play in making scientific work more reproducible. As a journal for the computational biology research community, PLOS Computational Biology is working to address this significant need within the community. As our work is more rigorously evaluated for reproducibility, we can build on each other’s contributions to advance science.

References

  1. Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature News 533(7604): 452–454. pmid:27225100
  2. Barba LA (2018) Terminologies for reproducible research. arXiv preprint arXiv:1802.03311.
  3. Fanelli D (2018) Opinion: Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences 115(11): 2628–2631. https://doi.org/10.1073/pnas.1708272114
  4. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124 pmid:16060722
  5. Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285 pmid:24204232
  6. Rule A, Birmingham A, Zuniga C, Altintas I, Huang SC, Knight R, et al. (2019) Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLoS Comput Biol 15(7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007 pmid:31344036
  7. Wilkinson M, Dumontier M, Aalbersberg I, Appleton G, Axton M, Baak A, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3: 160018. https://doi.org/10.1038/sdata.2016.18 pmid:26978244
  8. McDougal RA, Bulanova AS, Lytton WW (2016) Reproducibility in Computational Neuroscience Models and Simulations. IEEE Trans Biomed Eng 63(10). pmid:27046845