RT Journal Article
SR Electronic
T1 Large scale automated phylogenomical analysis of bacterial whole-genome isolates and the Evergreen platform
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 540138
DO 10.1101/540138
A1 Judit Szarvas
A1 Johanne Ahrenfeldt
A1 Jose Luis Bellod Cisneros
A1 Martin Christen Frølund Thomsen
A1 Frank M. Aarestrup
A1 Ole Lund
YR 2019
UL http://biorxiv.org/content/early/2019/02/05/540138.abstract
AB Public health authorities whole-genome sequence thousands of pathogenic bacterial isolates each month for microbial diagnostics and surveillance. Computational methods have not kept pace with this deluge of data or with the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. To decrease the computational burden, a two-level clustering strategy is employed. The data are first divided into sets by matching each isolate to a closely related reference genome. The reads are then aligned to that reference to obtain a consensus sequence, and SNP-based genetic distances are calculated between the sequences in each set. Isolates are clustered together using a threshold of 10 SNPs. Finally, phylogenetic trees are inferred from the non-redundant sequences, and the clustered isolates are placed on a clade with their cluster's representative sequence. The method was benchmarked and found to group outbreak strains together accurately while discriminating them from non-outbreak strains. The pipeline was applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating the phylogenetic trees as needed. It has so far placed more than 100,000 isolates into phylogenies and has kept up with the daily release of data. The trees are continuously published on https://cge.cbs.dtu.dk/services/Evergreen
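The second clustering level described in the abstract (SNP distances between reference-aligned consensus sequences, then grouping at a 10-SNP threshold) can be illustrated with a minimal Python sketch. It is not the Evergreen implementation. Assumptions: all consensus sequences in a set have equal length because they were aligned to the same reference; positions called as `N` in either sequence are skipped; "within the threshold" means at most 10 SNPs; and a greedy, representative-based rule is used, since the abstract does not say which clustering rule the pipeline applies. The function names `snp_distance` and `greedy_cluster` are hypothetical.

```python
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two consensus sequences aligned to the
    same reference. Positions with 'N' in either sequence are skipped (assumption)."""
    if len(a) != len(b):
        raise ValueError("consensus sequences must be aligned to the same reference")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def greedy_cluster(seqs: Dict[str, str], threshold: int = 10) -> Dict[str, List[str]]:
    """Hypothetical greedy clustering: each isolate joins the first existing
    representative within `threshold` SNPs; otherwise it becomes a new
    representative. Returns {representative: [members, including itself]}."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if snp_distance(seqs[rep], seq) <= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was close enough, so start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    isolates = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAA",  # 1 SNP from iso1
        "iso3": "TTTTTTTTTT",  # 8 SNPs from iso1
    }
    clusters = greedy_cluster(isolates, threshold=1)
    print(clusters)  # {'iso1': ['iso1', 'iso2'], 'iso3': ['iso3']}
    # Only the representatives form the non-redundant set passed to tree inference.
    print(list(clusters))  # ['iso1', 'iso3']
```

A greedy representative scheme only compares each new isolate against existing representatives rather than against every isolate, which fits the abstract's goal of reducing computation. Its result can depend on input order, so treat this as an illustration of the idea rather than the published algorithm.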