bioRxiv
New Results

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Nico Riedel, Miriam Kip, Evgeny Bobrov
doi: https://doi.org/10.1101/2020.05.11.088021
Affiliation (all authors): 1 QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
For correspondence: nico.riedel@bihealth.de

Abstract

Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion.

We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-language original research publications from a single biomedical research institution (n=8689) and a random sample from PubMed (n=1500), we iteratively developed a set of derived keyword categories. ODDPub can detect data sharing through field-specific repositories, general-purpose repositories or the supplement. Additionally, it can detect shared analysis code (Open Code).
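
To illustrate the general idea of keyword-category screening, here is a minimal Python sketch. It is not the ODDPub implementation (which is an R package): the patterns, the category names, and the rule "an availability term and a repository term in the same sentence" are all assumptions made for this example, and the function names `split_sentences` and `detect_open_data` are hypothetical.

```python
import re

# Hypothetical keyword categories for illustration only;
# these are NOT the keyword lists used by ODDPub.
REPOSITORY_PATTERNS = {
    "field_specific": re.compile(
        r"\b(gene expression omnibus|arrayexpress|protein data bank)\b", re.I
    ),
    "general_purpose": re.compile(
        r"\b(figshare|dryad|zenodo|open science framework)\b", re.I
    ),
    "supplement": re.compile(r"\bsupplementa(ry|l) (data|dataset|table)\b", re.I),
}
AVAILABILITY = re.compile(r"\b(available|deposited|accessible|uploaded)\b", re.I)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect_open_data(text):
    """Return (category, sentence) pairs for sentences that combine an
    availability term with a repository or supplement term."""
    hits = []
    for sentence in split_sentences(text):
        if not AVAILABILITY.search(sentence):
            continue
        for category, pattern in REPOSITORY_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((category, sentence))
    return hits


example = (
    "The sequencing data were deposited in the Gene Expression Omnibus. "
    "We thank our colleagues. Raw data are available on Zenodo."
)
for category, sentence in detect_open_data(example):
    print(category, "|", sentence)
```

Requiring two keyword categories in the same sentence is one simple way to reduce false positives from passing mentions of a repository; a real screening tool would need far broader patterns and a more robust sentence splitter.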

To validate ODDPub, we manually screened 792 publications randomly selected from PubMed. On this validation dataset, our algorithm detected Open Data publications with a sensitivity of 0.74 and specificity of 0.97. Open Data was detected for 11.5% (n=91) of publications. Open Code was detected for 1.4% (n=11) of publications with a sensitivity of 0.73 and specificity of 1.00. We compared our results to the linked datasets found in the databases PubMed and Web of Science.
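
For readers unfamiliar with the validation metrics, the following Python sketch shows how sensitivity and specificity are computed from manual labels and algorithm output. The toy label lists are invented for illustration and are not the study data; the function name `sensitivity_specificity` is hypothetical. The last two lines only re-derive the reported percentages from the counts given in the abstract (91 and 11 out of 792).

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired boolean labels.

    Assumes both classes occur in `truth`; otherwise a division by zero occurs.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): 4 publications with Open Data, 6 without.
manual = [True, True, True, True, False, False, False, False, False, False]
algorithm = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(manual, algorithm)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

# Percentages reported in the abstract, recomputed from the stated counts.
print(round(100 * 91 / 792, 1))  # Open Data share of the validation sample
print(round(100 * 11 / 792, 1))  # Open Code share of the validation sample
```

Sensitivity is the fraction of true Open Data publications that the algorithm finds, and specificity is the fraction of publications without Open Data that it correctly leaves unflagged.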

Our algorithm can automatically screen large numbers of publications for Open Data. It can thus be used to assess Open Data sharing rates on the level of subject areas, journals, or institutions. It can also identify individual Open Data publications in a larger publication corpus. ODDPub is published as an R package on GitHub.
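
As a rough sketch of how per-publication detection results could be aggregated into sharing rates per journal or institution, consider the Python example below. The record layout (keys `journal` and `open_data`) and the function name `sharing_rate_by_group` are assumptions for this example and do not reflect the output format of the ODDPub package.

```python
from collections import defaultdict


def sharing_rate_by_group(records, key):
    """Return the fraction of publications flagged as Open Data per group.

    `records` is a list of dicts that contain the grouping field `key`
    and a boolean field "open_data" (hypothetical layout).
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        shared[group] += int(record["open_data"])
    return {group: shared[group] / totals[group] for group in totals}


# Invented example records.
records = [
    {"journal": "Journal A", "open_data": True},
    {"journal": "Journal A", "open_data": False},
    {"journal": "Journal B", "open_data": False},
]
print(sharing_rate_by_group(records, "journal"))
```

The same function works for any grouping field, for example an institution or subject-area column, as long as each record carries it.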

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://doi.org/10.17605/OSF.IO/YV5RX

  • https://doi.org/10.5281/zenodo.3760970

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted May 12, 2020.

Subject Area

  • Scientific Communication and Education