RT Journal Article
SR Electronic
T1 Boiler: Lossy compression of RNA-seq alignments using coverage vectors
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 040634
DO 10.1101/040634
A1 Jacob Pritt
A1 Ben Langmead
YR 2016
UL http://biorxiv.org/content/early/2016/02/22/040634.abstract
AB We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantitation. Boiler also allows the user to pose fast and useful related queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler.