PT - JOURNAL ARTICLE
AU - Łukasz Roguski
AU - Idoia Ochoa
AU - Mikel Hernaez
AU - Sebastian Deorowicz
TI - FaStore – a space-saving solution for raw sequencing data
AID - 10.1101/168096
DP - 2017 Jan 01
TA - bioRxiv
PG - 168096
4099 - http://biorxiv.org/content/early/2017/07/25/168096.short
4100 - http://biorxiv.org/content/early/2017/07/25/168096.full
AB - The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw sequencing data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. The proposed algorithm does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. We demonstrate through extensive simulations that FaStore achieves a significant improvement in compression ratio with respect to previously proposed algorithms for this task. In addition, we perform an analysis on the effect that the different lossy modes have on variant calling, the most widely used application for clinical decision making, especially important in the era of precision medicine. We show that lossy compression can offer significant compression gains, while preserving the essential genomic information and without affecting the variant calling performance.