Compression of DNA sequence reads in FASTQ format

Sebastian Deorowicz; Szymon Grabowski

doi:10.1093/bioinformatics/btr014

Bioinformatics. 2011 Mar 15;27(6):860-2. doi: 10.1093/bioinformatics/btr014. Epub 2011 Jan 19.

Authors

Sebastian Deorowicz¹, Szymon Grabowski

Affiliation

¹ Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland. sebastian.deorowicz@polsl.pl

PMID: 21252073
DOI: 10.1093/bioinformatics/btr014

Abstract

Motivation: Modern sequencing instruments can generate hundreds of millions of short reads of genomic data. Such huge volumes of data require effective storage methods that provide quick access to any record and enable fast decompression.

Results: We present DSRC, a specialized compression algorithm for genomic data in FASTQ format that dominates its competitor, G-SQZ, as shown on a number of datasets from the 1000 Genomes Project (www.1000genomes.org).

Availability: DSRC is freely available at http://sun.aei.polsl.pl/dsrc.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Base Sequence
Computational Biology / methods
Data Compression / methods*
Genomics
Internet
Sequence Analysis, DNA*