bioRxiv
New Results

Developing a modern data workflow for living data

Glenda M. Yenni, Erica M. Christensen, Ellen K. Bledsoe, Sarah R. Supp, Renata M. Diaz, Ethan P. White, S.K. Morgan Ernest
doi: https://doi.org/10.1101/344804
Glenda M. Yenni
1 Department of Wildlife Ecology and Conservation, University of Florida
Erica M. Christensen
1 Department of Wildlife Ecology and Conservation, University of Florida
Ellen K. Bledsoe
2 School of Natural Resources and the Environment, University of Florida
Sarah R. Supp
3 Data Analytics Program, Denison University
Renata M. Diaz
2 School of Natural Resources and the Environment, University of Florida
Ethan P. White
1 Department of Wildlife Ecology and Conservation, University of Florida
4 Informatics Institute, University of Florida
S.K. Morgan Ernest
1 Department of Wildlife Ecology and Conservation, University of Florida
5 Biodiversity Institute, University of Florida

Abstract

Data management and publication are core components of the research process. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. “Living data” present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a living data workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to: 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow uses two tools from software development, version control and continuous integration, to create a modern data management system that automates the pipeline.

  • Glossary

    CI/continuous integration
    (also see Box 2) the continuous application of quality control. A practice used in software engineering to continuously implement processes for automated testing and integration of new code into a project.
    Git
    (also see Box 1) an open source program for tracking changes in text files (version control). It is the core technology on which GitHub, the social and user interface, is built.
    GitHub
    (also see Box 1) a web-based hosting service for version control using Git.
    GitHub-Travis integration
    connects the Travis continuous integration service to build and test projects hosted at GitHub. Once set up, a GitHub project will automatically deploy CI and test pull requests through Travis.
    GitHub-Zenodo integration
    connects a GitHub project to a Zenodo archive. Zenodo takes an archive of your GitHub repository each time you create a new release.
    Living data
    data that continue to be updated and added to, while simultaneously being made available for analyses. For example: long-term observational studies, experiments with repeated sampling, data derived from automated sensors (e.g., weather stations or GPS collars).
    Pull request
    A set of proposed changes to the files in a GitHub repository made by one collaborator, to be reviewed by other collaborators before being accepted or rejected.
    QA/QC
    Quality Assurance/Quality Control. The process of ensuring the data in our repository meet a certain quality standard.
    Repository
    a location (folder) containing all the files for a particular project. Files could include code, data files, or documentation. Each file’s revision history is also stored in the repository.
    testthat
    an R package that facilitates formal, automated testing
    Travis CI
    (also see Box 2) a hosted continuous integration service that is used to test and build GitHub projects. Open source projects are tested at no charge.
    unit test
    a software testing approach that checks to make sure that pieces of code work in the expected way
    Version control
    A system for managing changes made to a file or set of files over time that allows the user to a) see what changes were made and when, and b) revert to a previous state if desired
    Zenodo
    a general, open-access, research data repository
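To make the glossary entries for QA/QC, unit test, and continuous integration more concrete, here is a minimal sketch of an automated data check. It is illustrative only: the glossary names the R package testthat for formal testing, whereas this sketch uses Python, and the record fields, species codes, and weight range are hypothetical placeholders rather than rules from the study.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical observation row; field names are illustrative only."""
    date: str        # ISO date string, e.g. "2018-06-12"
    species: str     # species code
    weight_g: float  # mass in grams


# Assumed validation rules for this sketch (not taken from the study).
KNOWN_SPECIES = {"SP1", "SP2", "SP3"}
MAX_WEIGHT_G = 500.0


def find_problems(records):
    """Return a list of readable QA/QC failures; an empty list means the data pass."""
    problems = []
    for i, rec in enumerate(records):
        if rec.species not in KNOWN_SPECIES:
            problems.append(f"row {i}: unknown species code {rec.species!r}")
        if not (0 < rec.weight_g < MAX_WEIGHT_G):
            problems.append(f"row {i}: weight {rec.weight_g} g outside plausible range")
    return problems


def test_new_data_pass_checks():
    """A unit test of the kind a CI service could run on each pull request."""
    new_rows = [
        Record("2018-06-12", "SP1", 42.0),
        Record("2018-06-12", "SP2", 17.5),
    ]
    assert find_problems(new_rows) == []
```

In a setup like the one the glossary describes, a check of this kind would run automatically whenever a pull request proposes new data, so that problems are flagged before the data are merged into the repository.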
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted June 12, 2018.
    Developing a modern data workflow for living data
    Glenda M. Yenni, Erica M. Christensen, Ellen K. Bledsoe, Sarah R. Supp, Renata M. Diaz, Ethan P. White, S.K. Morgan Ernest
    bioRxiv 344804; doi: https://doi.org/10.1101/344804

    Subject Area

    • Ecology