RT Journal Article
SR Electronic
T1 Controlling the SARS-CoV-2 outbreak, insights from large scale whole genome sequences generated across the world
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.04.28.066977
DO 10.1101/2020.04.28.066977
A1 Jody Phelan
A1 Wouter Deelder
A1 Daniel Ward
A1 Susana Campino
A1 Martin L. Hibberd
A1 Taane G Clark
YR 2020
UL http://biorxiv.org/content/early/2020/04/29/2020.04.28.066977.abstract
AB Background: SARS-CoV-2 most likely evolved from a bat beta-coronavirus and started infecting humans in December 2019. Since then it has rapidly infected people around the world, with more than 3 million confirmed cases by the end of April 2020. Early genome sequencing of the virus has enabled the development of molecular diagnostics and the commencement of therapy and vaccine development. The analysis of the early sequences showed relatively few evolutionary selection pressures. However, with the rapid worldwide expansion into diverse human populations, significant genetic variations are becoming increasingly likely. The current limitations on social movement between countries also offer the opportunity for these viral variants to become distinct strains with potential implications for diagnostics, therapies and vaccines. Methods: We used the current sequencing archives (NCBI and GISAID) to investigate 5,349 whole genomes, looking for evidence of strain diversification and selective pressure. Results: We used 3,958 SNPs to build a phylogenetic tree of SARS-CoV-2 diversity and noted strong evidence for the existence of two major clades and six sub-clades, unevenly distributed across the world. We also noted that convergent evolution has potentially occurred across several locations in the genome, showing selection pressures, including on the spike glycoprotein where we noted a potentially critical mutation that could affect its binding to the ACE2 receptor. We also report on mutations that could prevent current molecular diagnostics from detecting some of the sub-clades. Conclusions: The worldwide whole genome sequencing effort is revealing the challenge of developing SARS-CoV-2 containment tools suitable for everyone and the need for data to be continually evaluated to ensure accuracy in outbreak estimations. Competing Interest Statement: The authors have declared no competing interest.