RT Journal Article
SR Electronic
T1 Large scale automated phylogenomical analysis of bacterial whole-genome isolates and the Evergreen platform
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 540138
DO 10.1101/540138
A1 Judit Szarvas
A1 Johanne Ahrenfeldt
A1 Jose Luis Bellod Cisneros
A1 Martin Christen Frølund Thomsen
A1 Frank M. Aarestrup
A1 Ole Lund
YR 2019
UL http://biorxiv.org/content/early/2019/02/05/540138.abstract
AB Public health authorities whole-genome sequence thousands of pathogenic isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. Computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. To decrease the computational burden, a two-level clustering strategy is employed. The data is first divided into sets by matching each isolate to a closely related reference genome. The reads are then aligned to the reference to obtain a consensus sequence, and SNP-based genetic distance is calculated between the sequences in each set. Isolates are clustered together with a threshold of 10 SNPs. Finally, phylogenetic trees are inferred from the non-redundant sequences, and the clustered isolates are placed on a clade with the cluster representative sequence. The method was benchmarked and found to be accurate in grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline was applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating the phylogenetic trees as needed. It has so far placed more than 100,000 isolates into phylogenies, and has been able to keep up with the daily release of data. The trees are continuously published on https://cge.cbs.dtu.dk/services/Evergreen