Abstract
Recent long-read assemblies often exceed the quality of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, are provided for evaluating assembly quality. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.
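The k-mer comparison described above can be illustrated with a minimal sketch. This is not Merqury's implementation (which builds on meryl k-mer databases); it is a toy version using Python sets of distinct, forward-strand k-mers. The function names `kmers` and `evaluate` are hypothetical. It assumes that an assembly k-mer absent from the reads contains at least one erroneous base, so the per-base probability of being correct is taken as the k-th root of the fraction of assembly k-mers supported by reads, and the consensus quality value (QV) is the Phred-scaled error rate. Completeness is taken here as the fraction of read k-mers recovered in the assembly, with no filtering of low-frequency (likely erroneous) read k-mers.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def evaluate(assembly, reads, k):
    """Toy reference-free evaluation from k-mer set operations.

    Assumption: an assembly k-mer missing from the reads marks an error.
    """
    asm = kmers(assembly, k)
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    asm_only = asm - read_kmers   # k-mers unsupported by reads
    shared = asm & read_kmers     # k-mers supported by reads

    # Fraction of read k-mers that the assembly recovers.
    completeness = len(shared) / len(read_kmers) if read_kmers else 0.0

    # Per-base probability of being correct, then Phred-scaled error rate.
    p_correct = (1 - len(asm_only) / len(asm)) ** (1 / k)
    error = 1 - p_correct
    qv = math.inf if error == 0 else -10 * math.log10(error)

    return {
        "asm_only": len(asm_only),
        "completeness": completeness,
        "error_rate": error,
        "qv": qv,
    }


if __name__ == "__main__":
    reads = ["ACGTACG", "TACGGTA"]
    print(evaluate("ACGTACGGTA", reads, k=3))  # assembly fully supported by reads
    print(evaluate("ACGTTCGGTA", reads, k=3))  # assembly with one substituted base
```

In practice, k-mers would be counted in canonical form (a k-mer and its reverse complement treated as one) and with a much larger k than in this toy example.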
Availability of data and material
Project name: Merqury
Project home page: https://github.com/marbl/merqury, https://github.com/marbl/meryl
Archived version: DOI or unique identifier of archived software or code in repository (e.g. Zenodo)
Operating system(s): Platform independent
Programming language: C++, Java, Perl
Other requirements: gcc 4.8 or higher, Java 1.6 or higher
License: Public domain (see https://github.com/marbl/merqury/blob/master/README.license)
Any restrictions to use by non-academics: No restrictions applied