RT Journal Article
SR Electronic
T1 A molecular barcode and online tool to identify and map imported infection with Plasmodium vivax
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 776781
DO 10.1101/776781
A1 Trimarsanto, Hidayat
A1 Amato, Roberto
A1 Pearson, Richard D
A1 Sutanto, Edwin
A1 Noviyanti, Rintis
A1 Trianty, Leily
A1 Marfurt, Jutta
A1 Pava, Zuleima
A1 Echeverry, Diego F
A1 Lopera-Mesa, Tatiana M
A1 Montenegro, Lidia Madeline
A1 Tobón-Castaño, Alberto
A1 Grigg, Matthew J
A1 Barber, Bridget
A1 William, Timothy
A1 Anstey, Nicholas M
A1 Getachew, Sisay
A1 Petros, Beyene
A1 Aseffa, Abraham
A1 Assefa, Ashenafi
A1 Rahim, Awab Ghulam
A1 Chau, Nguyen Hoang
A1 Hien, Tran Tinh
A1 Alam, Mohammad Shafiul
A1 Khan, Wasif A
A1 Ley, Benedikt
A1 Thriemer, Kamala
A1 Wangchuck, Sonam
A1 Hamedi, Yaghoob
A1 Adam, Ishag
A1 Liu, Yaobao
A1 Gao, Qi
A1 Sriprawat, Kanlaya
A1 Ferreira, Marcelo U
A1 Barry, Alyssa
A1 Mueller, Ivo
A1 Drury, Eleanor
A1 Goncalves, Sonia
A1 Simpson, Victoria
A1 Miotto, Olivo
A1 Miles, Alistair
A1 White, Nicholas J
A1 Nosten, Francois
A1 Kwiatkowski, Dominic P
A1 Price, Ric N
A1 Auburn, Sarah
YR 2019
UL http://biorxiv.org/content/early/2019/09/24/776781.abstract
AB Imported cases present a considerable challenge to the elimination of malaria. Traditionally, patient travel history has been used to identify imported cases, but the long-latency liver stages confound this approach in Plasmodium vivax. Molecular tools to identify and map imported cases offer a more robust approach that can be combined with drug resistance and other surveillance markers in high-throughput, population-based genotyping frameworks. Using a machine learning approach incorporating hierarchical FST (HFST) and decision tree (DT) analysis applied to 831 P. vivax genomes from 20 countries, we identified a 28-Single Nucleotide Polymorphism (SNP) barcode with high capacity to predict the country of origin.
The Matthews correlation coefficient (MCC), which provides a measure of the quality of the classifications, ranging from −1 (total disagreement) to 1 (perfect prediction), exceeded 0.9 in 15 countries in cross-validation evaluations. When combined with an existing 37-SNP P. vivax barcode, the resulting 65-SNP panel exhibits MCC scores exceeding 0.9 in 17 countries with up to 30% missing data. As a secondary objective, several genes were identified with moderate MCC scores (median MCC range 0.54 to 0.68), amenable as markers for rapid testing using low-throughput genotyping approaches. A likelihood-based classifier framework was established that supports analysis of missing data and polyclonal infections. To facilitate investigator-led analyses, the likelihood framework is provided as a web-based, open-access platform (vivaxGEN-geo) to support the analysis and interpretation of data produced with either the 28-SNP core or the full 65-SNP barcode. These tools can be used by malaria control programs to identify the main reservoirs of infection so that resources can be focused where they are needed most.