Abstract
Human immunodeficiency virus 1 (HIV-1) genome sequencing is routinely done for drug resistance monitoring in hospitals worldwide. Subtyping these extensive datasets of HIV-1 sequences is a critical first step in molecular epidemiology and surveillance studies. The clinical relevance of HIV-1 subtypes is increasingly recognized. Several studies suggest subtype-related differences in disease progression, transmission route efficiency, immune evasion, and even therapeutic outcomes. HIV-1 subtyping is mainly done using web servers. These tools have limitations in scalability and potential noncompliance with data protection legislation. Thus, the aim of this work was to develop an efficient method for local and high-throughput HIV-1 subtyping. We designed SNAPPy: a snakemake pipeline for scalable HIV-1 subtyping by phylogenetic pairing. It contains several tasks of phylogenetic inference and BLAST queries, which can be executed sequentially or in parallel, taking advantage of multiple-core processing units. Although it was built for subtyping, SNAPPy is also useful to perform extensive HIV-1 alignments. This tool facilitates large-scale sequence-based HIV-1 research by providing a local, resource efficient and scalable alternative for HIV-1 subtyping. It is capable of analysing full-length genomes or partial HIV-1 genomic regions (GAG, POL, ENV) and recognizes more than 90 circulating recombinant forms. SNAPPy is freely available at: https://github.com/PMMAraujo/snappy.