TY - JOUR
T1 - Resistome SNP Calling via Read Colored de Bruijn Graphs
JF - bioRxiv
DO - 10.1101/156174
SP - 156174
AU - Bahar Alipanahi
AU - Martin D. Muggli
AU - Musa Jundi
AU - Noelle Noyes
AU - Christina Boucher
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/06/26/156174.abstract
N2 - The microbiome and the resistome, the latter referring to all antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria, are frequently studied using shotgun metagenomic data [13, 50]. Unfortunately, there are few methods capable of identifying single nucleotide polymorphisms (SNPs) in metagenomic data, and to the best of our knowledge, there are no methods that identify SNPs in AMR genes. Nonetheless, the identification of SNPs in AMR genes is an important problem, since it allows these genes, which confer resistance to antibiotics, to be “fingerprinted” and tracked across multiple samples or time periods. In this paper, we present LueVari, which allows SNPs to be identified in AMR genes from metagenomic data. LueVari is based on the read colored de Bruijn graph, an extension of the traditional de Bruijn graph that we present and formally define in this paper. We show that read coloring allows regions longer than the k-mer length and shorter than the read length to be identified unambiguously. In addition to this theoretical concept, we present a succinct data structure that allows large datasets to be analyzed in a reasonable amount of time and space. Our experiments demonstrate that LueVari was the only SNP caller that reliably reported sequences that spanned on average 47.5% of the AMR gene. Competing methods (GATK and SAMtools) only reported specific loci and required a reference to do so. This feature, along with the high accuracy of LueVari, allows distinct AMR genes to be detected reliably in a de novo fashion.
ER -