Abstract
Advances in DNA sequencing technologies are rapidly changing the landscape of modern biology. Novel next-generation sequencing (NGS) applications often have special requirements for experimental data processing. Software tools developed for such applications are generally designed for specific use cases and may therefore be difficult to adapt to new uses. Conversely, software tools designed to be general-purpose are often difficult to adapt to special use cases.
Here, we present nsearch, a modern open-source C++11 library and command-line tool for processing biological sequence data. nsearch offers commonly used components for handling biological sequences, including paired-end read merging, quality filtering and sequence similarity searching. nsearch can either be embedded natively in other C++ applications or be packaged as a standalone executable.
The functionality and performance of nsearch are demonstrated on benchmark data generated from the Rfam 13 database. Benchmarking against the common general-purpose tools USEARCH and VSEARCH shows that nsearch delivers performance comparable to these state-of-the-art tools.
nsearch is available on GitHub under the permissive BSD-3-clause license: https://github.com/stevschmid/nsearch