The Genomedata format for storing large-scale functional genomics data

Bioinformatics. 2010 Jun 1;26(11):1458-9. doi: 10.1093/bioinformatics/btq164. Epub 2010 Apr 29.

Abstract

Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.
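To illustrate why random access can outpace a naive wiggle-based lookup, the following minimal sketch contrasts a linear scan over fixedStep wiggle text with a direct index into a dense per-chromosome array. It is not the Genomedata implementation or its on-disk layout. The function names, the tiny inline data set, and the use of an in-memory array are all assumptions made for illustration, and only fixedStep records with a span of 1 are handled.

```python
from array import array
import math

# Hypothetical toy input: two fixedStep blocks on chr1 (wiggle positions are 1-based).
WIGGLE_TEXT = """\
fixedStep chrom=chr1 start=3 step=1
0.5
1.5
2.5
fixedStep chrom=chr1 start=8 step=1
4.0
"""


def iter_fixed_step(wiggle_text):
    """Yield (chrom, 1-based position, value) for each fixedStep data line."""
    chrom, pos, step = None, None, 1
    for raw in wiggle_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "track")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        yield chrom, pos, float(line)
        pos += step


def naive_wiggle_lookup(wiggle_text, chrom, position):
    """Naive approach: re-parse the text on every query, O(file size) per lookup."""
    for rec_chrom, rec_pos, value in iter_fixed_step(wiggle_text):
        if rec_chrom == chrom and rec_pos == position:
            return value
    return math.nan  # no data at this position


def build_dense_track(wiggle_text, chrom, length):
    """Load once into a dense array; missing positions are stored as NaN."""
    track = array("d", [math.nan]) * length
    for rec_chrom, rec_pos, value in iter_fixed_step(wiggle_text):
        if rec_chrom == chrom and 1 <= rec_pos <= length:
            track[rec_pos - 1] = value  # convert 1-based position to 0-based index
    return track


if __name__ == "__main__":
    track = build_dense_track(WIGGLE_TEXT, "chr1", length=10)
    # Both calls answer the same query: the value at 1-based position 4.
    print(naive_wiggle_lookup(WIGGLE_TEXT, "chr1", 4))
    print(track[4 - 1])  # constant-time index once the data is loaded
```

The dense layout pays a one-time loading cost and then answers each query by index, whereas the naive lookup repeats the parse for every query. That difference in per-query work is the kind of gap the reported speedup reflects.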

Availability and implementation: A reference implementation with Python and C components is available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Databases, Genetic
  • Genome*
  • Genomics / methods*
  • Software