TY - JOUR
T1 - Global Geographic and Temporal Analysis of SARS-CoV-2 Haplotypes Normalized by COVID-19 Cases during the First Seven Months of the Pandemic
JF - bioRxiv
DO - 10.1101/2020.07.12.199414
SP - 2020.07.12.199414
AU - Santiago Justo Arévalo
AU - Daniela Zapata Sifuentes
AU - César Huallpa Robles
AU - Gianfranco Landa Bianchi
AU - Adriana Castillo Chávez
AU - Romina Garavito-Salini Casas
AU - Guillermo Uceda-Campos
AU - Roberto Pineda Chavarría
Y1 - 2020/01/01
UR - http://biorxiv.org/content/early/2020/09/01/2020.07.12.199414.abstract
N2 - Since the identification of SARS-CoV-2, a large number of genomes have been sequenced with unprecedented speed around the world. This marks a unique opportunity to analyze virus spreading and evolution in a worldwide context. However, there is currently no useful haplotype description to help track important and globally scattered mutations. Also, differences in the number of sequenced genomes between countries and/or months make it difficult to identify the emergence of haplotypes in regions where few genomes are sequenced but a large number of cases are reported. We propose an approach that normalizes the relative frequencies of mutations by COVID-19 cases, using all the available data, to identify major haplotypes. A similar normalization approach can then be used to track the global temporal and geographic distribution of haplotypes. Using 48 776 genomes, we identify 5 major haplotypes based on 9 high-frequency mutations. A normalized global geographic and temporal analysis is presented here, highlighting the current importance of the nucleocapsid mutations (R203K, G204R) over the highly discussed D614G in the spike protein. We also analyzed the distribution of age, gender, and patient status by haplotype, but the publicly available information on these is scarce and poorly organized.
For this reason, we created a web service to continuously update our normalized analysis of mutations and haplotypes, and to allow researchers to voluntarily share patient status information in a well-organized manner, in order to improve analyses and make it possible to monitor the emergence of mutations and/or haplotypes with patient preferences or different pathogenic features. Finally, we discuss current structural and functional hypotheses on the most frequently identified mutations.
Competing Interest Statement: The authors have declared no competing interest.
ER -