New Results

From peer-reviewed to peer-reproduced: a role for data standards, models and computational workflows in scholarly publishing

Alejandra Gonzalez-Beltran, Peter Li, Jun Zhao, Maria Susana Avila-Garcia, Marco Roos, Mark Thompson, Eelke van der Horst, Rajaram Kaliyaperumal, Ruibang Luo, Tin-Lap Lee, Tak-wah Lam, Scott C. Edmunds, Susanna-Assunta Sansone, Philippe Rocca-Serra
doi: https://doi.org/10.1101/011973
Alejandra Gonzalez-Beltran
Oxford e-Research Centre, University of Oxford;
  • For correspondence: alejandra.gonzalezbeltran@oerc.ox.ac.uk
Peter Li
GigaScience, BGI HK Research Institute;
Jun Zhao
InfoLab21, Lancaster University;
Maria Susana Avila-Garcia
Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital;
Marco Roos
Department of Human Genetics, Leiden University Medical Center;
Mark Thompson
Department of Human Genetics, Leiden University Medical Center;
Eelke van der Horst
Department of Human Genetics, Leiden University Medical Center;
Rajaram Kaliyaperumal
Department of Human Genetics, Leiden University Medical Center;
Ruibang Luo
HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer S;
Tin-Lap Lee
School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese Universi
Tak-wah Lam
HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer S;
Scott C. Edmunds
GigaScience, BGI HK Research Institute;
Susanna-Assunta Sansone
Oxford e-Research Centre, University of Oxford;
Philippe Rocca-Serra
Oxford e-Research Centre, University of Oxford;

Abstract

Motivation: Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP) and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.

Results: Executable workflows were developed using Galaxy which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.

Availability: SOAPdenovo2 scripts, data and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/.

Contact: philippe.rocca-serra@oerc.ox.ac.uk and susanna.assunta-sansone@oerc.ox.ac.uk

Copyright 
The copyright holder for this preprint is the author/funder. It is made available under a CC-BY 4.0 International license.
  • Posted December 8, 2014.

Subject Area

  • Scientific Communication and Education