Abstract
The cost of maintaining the exabytes of data produced by sequencing experiments every year has become a major issue in today’s genomics. Despite the increasing popularity of third-generation sequencing, existing algorithms for compressing long reads offer only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.
Competing Interest Statement
Heng Li is a consultant for Integrated DNA Technologies, Inc. and serves on the Scientific Advisory Boards of Sentieon, Inc. and Innozeen Inc.
Footnotes
∗ The parameters differ across priority modes (memory/balanced/ratio) and technologies (ONT/HiFi/CLR). For simplicity, we give the values for the default priority mode (memory) for ONT and, where they differ significantly, also for HiFi.