RT Journal Article
SR Electronic
T1 Comparison of multi-locus sequence typing software for next generation sequencing data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 117770
DO 10.1101/117770
A1 Andrew J. Page
A1 Nabil-Fareed Alikhan
A1 Heather A. Carleton
A1 Torsten Seemann
A1 Jacqueline A. Keane
A1 Lee S. Katz
YR 2017
UL http://biorxiv.org/content/early/2017/03/17/117770.abstract
AB Multi-locus sequence typing (MLST) is a widely used method for categorising bacteria. Increasingly, MLST is being performed using next generation sequencing (NGS) data by reference labs and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date of these methods. We have compared six of these applications against real and simulated data and present results on: 1. the accuracy of each method against traditional typing methods, 2. the performance on real outbreak datasets, 3. the impact of contamination and varying depth of coverage, and 4. the computational resource requirements.
DATA SUMMARY
Simulated reads for datasets testing coverage and mixed samples have been deposited in Figshare; DOI: https://doi.org/10.6084/m9.figshare.4602301.v1
Outbreak databases are available from GitHub; URL: https://github.com/WGS-standards-and-analysis/datasets
Docker containers used to run each of the applications are available from GitHub; URL: https://tinyurl.com/z7ks2ft
Accession numbers for the data used in this paper are available in the Supplementary material.
We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.
IMPACT STATEMENT
Sequence typing is rapidly transitioning from traditional sequencing methods to whole genome sequencing. A number of in silico prediction methods have been developed on an ad hoc basis and aim to replicate multi-locus sequence typing (MLST).
This is the first study to comprehensively evaluate multiple MLST software applications on real validated datasets and on common simulated difficult cases. It will give researchers a clearer understanding of the accuracy, limitations and computational performance of the methods they use, and will help future researchers choose the most appropriate method for their experimental goals.
MLST: Multi-locus sequence typing
NGS: Next generation sequencing
PFGE: Pulsed-field gel electrophoresis