bioRxiv
New Results

The PDF Data Extractor (PDE) Pre-screening Tool Reduced the Manual Review Burden for Systematic Literature Reviews by Over 35% Through Automated High-Throughput Assessment of Full-Text Articles

Erik Stricker, Michael E. Scheurer
doi: https://doi.org/10.1101/2021.07.13.452159
Erik Stricker
1 Baylor College of Medicine
Michael E. Scheurer
1 Baylor College of Medicine
For correspondence: stricker@bcm.edu, scheurer@bcm.edu

Abstract

Literature reviews are generally time-consuming and rely heavily on the title and abstract of each article accurately representing its data. Minor results and details are often lost in a systematic screen, a problem that grows as the number of scientific articles published each day rises rapidly. We developed the PDF Data Extractor (PDE) R package to aid scientists at any stage of a literature review through a user-friendly interface that requires no bioinformatics skills. The tool lets the user categorize large numbers of full-text articles in PDF format, export the tables they contain to Excel sheets (pdf2table), and extract relevant data. Specific features of the literature analysis include adaptable analysis parameters with support for regular expressions, machine learning-powered detection of abbreviations of search words in articles, and export of document metadata. We show how the PDE R package can be used as a pre-screening tool that automatically categorizes full-text articles by relevance, thereby reducing the literature to be evaluated (in our example by 35% with a sensitivity of 100% at standard parameters). The PDE R package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=PDE and as a web tool with limited capacity at https://erikstricker.shinyapps.io/PDE_analyzer/.
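
To make the two reported quantities concrete, the following minimal Python sketch shows one way a keyword pre-screen and its evaluation could be expressed. It is not the PDE package's implementation or API: the function names, the regular expression, the hit threshold, and the four toy texts with their manual labels are all hypothetical, chosen only to illustrate how "sensitivity" (share of truly relevant articles that the pre-screen keeps) and "reduction" (share of articles that no longer need manual review) are computed.

```python
import re


def prescreen(texts, pattern, min_hits=1):
    """Keep ids of documents whose text matches `pattern` at least `min_hits` times.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return {
        doc_id
        for doc_id, text in texts.items()
        if sum(1 for _ in rx.finditer(text)) >= min_hits
    }


def screening_metrics(all_ids, kept_ids, relevant_ids):
    """Return (sensitivity, reduction) for a pre-screen.

    sensitivity = relevant articles kept / all relevant articles
    reduction   = 1 - articles kept / all articles
    """
    all_ids, kept_ids, relevant_ids = set(all_ids), set(kept_ids), set(relevant_ids)
    sensitivity = len(kept_ids & relevant_ids) / len(relevant_ids)
    reduction = 1 - len(kept_ids) / len(all_ids)
    return sensitivity, reduction


# Toy full texts (invented for this sketch, not data from the study).
texts = {
    "doc1": "Glioma incidence was assessed in a cohort; gliomas were graded.",
    "doc2": "We report wheat yield under drought.",
    "doc3": "Survival after glioblastoma resection.",
    "doc4": "A review of zebrafish husbandry.",
}
# Hypothetical manual relevance labels used as the reference standard.
relevant = {"doc1", "doc3"}

kept = prescreen(texts, r"gliom\w*|glioblastoma\w*")
sensitivity, reduction = screening_metrics(texts.keys(), kept, relevant)
```

In this toy setup the pre-screen keeps two of the four documents, both of which are labeled relevant. A real evaluation would compare the tool's categorization against a full manual review of the same article set.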

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • The complete manuscript was revised from a manual style to a primary research article outline. The Results section on the implementation of the PDE R package as a pre-screening tool and Figures 1, 2, and 3 were updated with numbers from an analysis with the latest version of the package. The Discussion section was expanded with a comparison with other systematic review software.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted March 13, 2023.
Subject Area

  • Scientific Communication and Education