bioRxiv

Escape Excel: a tool for preventing gene symbol and accession conversion errors

Eric A. Welsh, Paul A. Stewart, Brent M. Kuenzi
doi: https://doi.org/10.1101/103820
Eric A. Welsh
1 Cancer Informatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida 33612-9497, United States
For correspondence: Eric.Welsh@moffitt.org
Paul A. Stewart
2 Department of Thoracic Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida 33612-9497, United States
Brent M. Kuenzi
3 Department of Drug Discovery, H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida 33612-9497, United States
4 Cancer Biology Ph.D. Program, University of South Florida, Tampa, Florida 33620, United States

Abstract

Background: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text and numbers into dates, scientific notation, and other numerical representations, which may lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomics literature estimates that one-fifth of all papers with supplementary gene lists in Excel format suffer from this issue.
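Python's `float` uses the same IEEE 754 double-precision format that Excel uses for numeric cell values, so it can roughly illustrate why these conversions are irreversible. The sketch below is an approximation under stated assumptions, not a model of Excel's actual import logic: the helper `excel_like_number` is hypothetical, and rounding to 15 significant digits stands in for Excel's 15-significant-digit precision limit.

```python
from typing import Optional


def excel_like_number(text: str) -> Optional[str]:
    """Approximate the value a numeric-looking cell ends up holding.

    Hypothetical helper. Assumption: text that parses as a number is stored
    as a double and kept to at most 15 significant digits, which we mimic
    with Python's ".15g" format. Returns None for text that does not parse
    as a number (assumed, for this sketch, to be left unchanged).
    """
    try:
        value = float(text)
    except ValueError:
        return None
    return format(value, ".15g")


for original in ["0123", "2310009E13", "12345678901234567890", "TP53"]:
    print(original, "->", excel_like_number(original))
```

Once "0123" has become 123, or the identifier "2310009E13" has become 2.310009e+19, nothing in the stored number records the original text, which is why the corruption cannot be undone after import.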

Results: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is available in the Galaxy web environment and can be installed through the Galaxy ToolShed. It is also available as a stand-alone command-line Perl script on GitHub (http://www.github.com/pstew/escape_excel). A Galaxy test server implementation is accessible at http://apostl.moffitt.org.
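The escaping step can be sketched as follows. We assume one common convention, wrapping each value as an Excel formula string literal such as `="SEPT2"`, which Excel displays as literal text instead of parsing it; this is an illustrative assumption, and the Escape Excel source on GitHub should be consulted for the tool's exact output. The functions `escape_field` and `escape_tsv` are hypothetical names, and this version escapes every non-empty field of a tab-delimited file.

```python
def escape_field(value: str) -> str:
    """Wrap a value as an Excel formula that evaluates to the same text.

    Assumed convention: ="value", with any embedded double quotes doubled,
    as Excel string literals require. Empty fields are left empty.
    """
    if value == "":
        return value
    return '="' + value.replace('"', '""') + '"'


def escape_tsv(text: str) -> str:
    """Escape every field of a tab-delimited text (most conservative policy)."""
    out_lines = []
    for line in text.splitlines():
        fields = line.split("\t")
        out_lines.append("\t".join(escape_field(f) for f in fields))
    # Emit a trailing newline only when there is at least one line.
    return "\n".join(out_lines) + "\n" if out_lines else ""


print(escape_tsv("gene\tvalue\nSEPT2\t0123\n"), end="")
```

Escaping every field is the safest policy, but it also turns genuine numeric columns into text that Excel will not use in calculations; escaping only fields that match known problem patterns avoids that trade-off.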

Conclusions: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon import into Excel. Examples of problematic strings include date-like strings, time-like strings, numbers with leading zeroes, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data-corruption issues, together with diligent escaping of text files prior to import into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
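A minimal sketch of how the problem categories listed above might be detected with regular expressions. The patterns, the category names, the 12-digit threshold for long numbers, and the English-locale month names are all assumptions for illustration; Excel's real conversion rules depend on locale and version, and Escape Excel's own detection logic may differ.

```python
import re

# Assumption: English month names and abbreviations, matched case-insensitively.
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|"
    r"july?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|"
    r"dec(?:ember)?)"
)

# Hypothetical heuristic categories; each pattern must match the whole value.
CATEGORIES = {
    # Month name plus a number (SEPT2, MARCH1, 1-Mar) or numeric dates (1/2, 2017-01-27).
    "date_like": re.compile(
        MONTH + r"[-/ ]?\d{1,4}"
        + r"|\d{1,2}[-/ ]?" + MONTH
        + r"|\d{1,4}[-/]\d{1,2}(?:[-/]\d{1,4})?",
        re.IGNORECASE,
    ),
    # Clock times such as 12:30, 7:05:59 or 9:15 pm.
    "time_like": re.compile(r"\d{1,2}:\d{2}(?::\d{2})?(?:\s?[ap]m)?", re.IGNORECASE),
    # Numbers whose leading zeroes would be dropped, such as 0123.
    "leading_zero": re.compile(r"[-+]?0\d+(?:\.\d*)?"),
    # Identifiers that parse as scientific notation, such as 2310009E13.
    "sci_notation": re.compile(r"[-+]?\d+(?:\.\d*)?e[-+]?\d+", re.IGNORECASE),
    # Assumed threshold: integers of 12+ digits, which Excel's General format
    # tends to show in scientific notation and truncates past 15 digits.
    "long_number": re.compile(r"[-+]?\d{12,}"),
}


def classify(value: str) -> list[str]:
    """Return the name of every heuristic category that matches the value."""
    text = value.strip()
    return [name for name, pattern in CATEGORIES.items() if pattern.fullmatch(text)]


def needs_escape(value: str) -> bool:
    """True if the value matches any category and should therefore be escaped."""
    return bool(classify(value))


for value in ["SEPT2", "0123", "2310009E13", "12:30", "TP53"]:
    print(value, classify(value))
```

A selective check like `needs_escape` lets genuine numbers stay numeric while still protecting identifiers; the cost is that any pattern these heuristics miss will still be converted, and genuine numbers written in scientific notation will be escaped as well.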

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted January 27, 2017.

Subject Area: Bioinformatics