PT - JOURNAL ARTICLE
AU - Kirill Kryukov
AU - Mahoko Takahashi Ueda
AU - So Nakagawa
AU - Tadashi Imanishi
TI - Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences
AID - 10.1101/501130
DP - 2019 Jan 01
TA - bioRxiv
PG - 501130
4099 - http://biorxiv.org/content/early/2019/02/21/501130.short
4100 - http://biorxiv.org/content/early/2019/02/21/501130.full
AB - Summary: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF) – a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. NAF compression ratio is comparable to the best DNA compressors, while providing dramatically faster decompression. We compared our format with DNA compressors: DELIMINATE and MFCompress, and with general purpose compressors: gzip, bzip2, xz, brotli, and zstd. Availability: NAF compressor and decompressor, as well as format specification are available at https://github.com/KirillKryukov/naf. Format specification is in public domain. Compressor and decompressor are open source under the zlib/libpng license, free for nearly any use. Contact: kkryukov{at}gmail.com