RT Journal Article
SR Electronic
T1 A fast machine-learning-guided primer design pipeline for selective whole genome amplification
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.04.27.489632
DO 10.1101/2022.04.27.489632
A1 Jane A. Yu
A1 Zachary J. Oppler
A1 Matthew W. Mitchell
A1 Yun S. Song
A1 Dustin Brisson
YR 2022
UL http://biorxiv.org/content/early/2022/04/28/2022.04.27.489632.abstract
AB Addressing many of the major outstanding questions in the fields of microbial evolution and pathogenesis will require analyses of populations of microbial genomes. Although population genomic studies provide the analytical resolution to investigate evolutionary and mechanistic processes at fine spatial and temporal scales (precisely the scales at which these processes occur), microbial population genomic research is currently hindered by the practical difficulty of obtaining sufficient quantities of the relatively pure microbial genomic DNA needed for next-generation sequencing. Here we present swga2.0, an optimized and parallelized pipeline for designing selective whole genome amplification (SWGA) primer sets. Unlike previous methods, swga2.0 incorporates active learning and machine learning methods to evaluate the amplification efficacy of individual primers and primer sets. swga2.0 also optimizes its primer set search and evaluation strategies, including parallelization at each stage of the pipeline, reducing program runtime from weeks to minutes. We describe the swga2.0 pipeline, including the empirical data used to identify the primer and primer set characteristics that improve amplification performance. Finally, we evaluated swga2.0 by designing primer sets that successfully amplify Prevotella melaninogenica, an important component of the lung microbiome in cystic fibrosis patients, from samples dominated by human DNA. Competing Interest Statement: The authors have declared no competing interest.