Abstract
Motivation Genome assembly is increasingly performed on long, uncorrected reads. Unfiltered chimeric reads can degrade assembly quality, and storing all read overlaps can require terabytes of disk space.
Results We introduce two tools: yacrd, which performs chimera removal and read scrubbing, and fpa, which filters out spurious overlaps. We show that yacrd yields higher-quality assemblies and is two orders of magnitude faster than the best available alternative.
Availability https://github.com/natir/yacrd and https://github.com/natir/fpa
Contact pierre.marijon{at}inria.fr
Footnotes
Better formatting of the bibliography in the appendices
https://gitlab.inria.fr/pmarijon/yacrd-and-fpa-upstream-tools-for-lr-genome-assembly
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.