PT - JOURNAL ARTICLE
AU - Katharine S. Walter
AU - Caroline Colijn
AU - Ted Cohen
AU - Barun Mathema
AU - Qingyun Liu
AU - Jolene Bowers
AU - David M. Engelthaler
AU - Apurva Narechania
AU - Julio Croda
AU - Jason R. Andrews
TI - Genomic variant identification methods alter Mycobacterium tuberculosis transmission inference
AID - 10.1101/733642
DP - 2019 Jan 01
TA - bioRxiv
PG - 733642
4099 - http://biorxiv.org/content/early/2019/08/24/733642.short
4100 - http://biorxiv.org/content/early/2019/08/24/733642.full
AB - Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is no current consensus on how to measure genomic variation. We investigated the effects of variant identification approaches on transmission inferences for M. tuberculosis by comparing variants identified by five different groups in the same sequence data from a clonal outbreak. We then measured the performance of commonly used variant calling approaches in recovering variation in a simulated tuberculosis outbreak and tested the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that variant calling approaches used by different groups do not recover consistent sets of variants, often leading to conflicting transmission inferences. Further, performance in recovering true outbreak variation varied widely across approaches. Finally, stringent filters rapidly eroded the accuracy of transmission inferences and quality of phylogenies reconstructed from outbreak variation. We conclude that measurements of genetic distance and phylogenetic structure are dependent on variant calling approach.
Variant calling algorithms trained upon true sequence data outperform other approaches and enable inclusion of repetitive regions typically excluded from genomic epidemiology studies, maximizing the information gleaned from outbreak genomes.