New Results

Linked annotations: a middle ground for manual curation of biomedical databases and text corpora

Tatyana Goldberg, Shrikant Vinchurkar, Juan Miguel Cejuela, Lars Juhl Jensen, Burkhard Rost
doi: https://doi.org/10.1101/014274
Tatyana Goldberg
1Bioinformatics & Computational Biology, Department of Informatics, Technical University of Munich (TUM), 85748 Garching, Germany
2TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), 85748 Garching, Germany
Shrikant Vinchurkar
1Bioinformatics & Computational Biology, Department of Informatics, Technical University of Munich (TUM), 85748 Garching, Germany
Juan Miguel Cejuela
1Bioinformatics & Computational Biology, Department of Informatics, Technical University of Munich (TUM), 85748 Garching, Germany
Lars Juhl Jensen
3Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
  • For correspondence: lars.juhl.jensen@cpr.ku.dk assistant@rostlab.org
Burkhard Rost
1Bioinformatics & Computational Biology, Department of Informatics, Technical University of Munich (TUM), 85748 Garching, Germany
  • For correspondence: lars.juhl.jensen@cpr.ku.dk assistant@rostlab.org

Abstract

Annotators of text corpora and of biomedical databases perform the same labor-intensive task: manually extracting structured data from unstructured text. Because text corpora are widely scattered, this work is needlessly repeated. We envision that a linked annotation resource unifying many corpora could be a game changer. Such an open forum would help annotators focus on novel annotations and make the best use of the effort of many experts. As a proof of concept, we annotated protein subcellular localization in 100 abstracts cited by UniProtKB. A detailed comparison between our new corpus and the original UniProtKB annotations revealed sustained novel annotations for 42% of the entries (proteins). In a unified linked annotation resource, these could immediately extend the utility of text corpora beyond the text-mining community. Our example motivates the central idea that linked annotations from text corpora can complement database annotations.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted January 23, 2015.
Subject Area

  • Bioinformatics