PT - JOURNAL ARTICLE
AU - Trimarsanto, Hidayat
AU - Amato, Roberto
AU - Pearson, Richard D
AU - Sutanto, Edwin
AU - Noviyanti, Rintis
AU - Trianty, Leily
AU - Marfurt, Jutta
AU - Pava, Zuleima
AU - Echeverry, Diego F
AU - Lopera-Mesa, Tatiana M
AU - Montenegro, Lidia Madeline
AU - Tobón-Castaño, Alberto
AU - Grigg, Matthew J
AU - Barber, Bridget
AU - William, Timothy
AU - Anstey, Nicholas M
AU - Getachew, Sisay
AU - Petros, Beyene
AU - Aseffa, Abraham
AU - Assefa, Ashenafi
AU - Rahim, Awab Ghulam
AU - Chau, Nguyen Hoang
AU - Hien, Tran Tinh
AU - Alam, Mohammad Shafiul
AU - Khan, Wasif A
AU - Ley, Benedikt
AU - Thriemer, Kamala
AU - Wangchuck, Sonam
AU - Hamedi, Yaghoob
AU - Adam, Ishag
AU - Liu, Yaobao
AU - Gao, Qi
AU - Sriprawat, Kanlaya
AU - Ferreira, Marcelo U
AU - Barry, Alyssa
AU - Mueller, Ivo
AU - Drury, Eleanor
AU - Goncalves, Sonia
AU - Simpson, Victoria
AU - Miotto, Olivo
AU - Miles, Alistair
AU - White, Nicholas J
AU - Nosten, Francois
AU - Kwiatkowski, Dominic P
AU - Price, Ric N
AU - Auburn, Sarah
TI - A molecular barcode and online tool to identify and map imported infection with Plasmodium vivax
AID - 10.1101/776781
DP - 2019 Jan 01
TA - bioRxiv
PG - 776781
4099 - http://biorxiv.org/content/early/2019/09/24/776781.short
4100 - http://biorxiv.org/content/early/2019/09/24/776781.full
AB - Imported cases present a considerable challenge to the elimination of malaria. Traditionally, patient travel history has been used to identify imported cases, but in Plasmodium vivax the long-latency liver stages confound this approach. Molecular tools to identify and map imported cases offer a more robust approach that can be combined with drug resistance and other surveillance markers in high-throughput, population-based genotyping frameworks. Using a machine learning approach incorporating hierarchical FST (HFST) and decision tree (DT) analysis applied to 831 P. vivax genomes from 20 countries, we identified a 28-single nucleotide polymorphism (SNP) barcode with high capacity to predict the country of origin. The Matthews correlation coefficient (MCC), a measure of classification quality ranging from −1 (total disagreement) to 1 (perfect prediction), exceeded 0.9 in 15 countries in cross-validation evaluations. When combined with an existing 37-SNP P. vivax barcode, the resulting 65-SNP panel exhibits MCC scores exceeding 0.9 in 17 countries with up to 30% missing data. As a secondary objective, several genes were identified with moderate MCC scores (median MCC ranging from 0.54 to 0.68) that are amenable as markers for rapid testing using low-throughput genotyping approaches. A likelihood-based classifier framework was established that supports the analysis of missing data and polyclonal infections. To facilitate investigator-led analyses, the likelihood framework is provided as a web-based, open-access platform (vivaxGEN-geo) that supports the analysis and interpretation of data produced with either the 28-SNP core or the full 65-SNP barcode. These tools can be used by malaria control programs to identify the main reservoirs of infection so that resources can be focused where they are needed most.
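The abstract reports the Matthews correlation coefficient separately for each country. A common way to obtain a per-class MCC from a multi-country classifier is to treat one country as the positive class and all others as negative (one-vs-rest), then apply the binary formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes that one-vs-rest convention; the abstract does not state exactly how the per-country scores were derived, and the function names and country labels are hypothetical.

```python
import math


def one_vs_rest_mcc(true_labels, predicted_labels, country):
    """Binary MCC treating `country` as positive and every other label as negative.

    Assumption: per-country MCC is computed one-vs-rest (not confirmed by the source).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        is_true = truth == country
        is_pred = pred == country
        if is_true and is_pred:
            tp += 1
        elif is_pred:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventional choice: report 0 when any marginal total is zero (formula undefined).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def per_country_mcc(true_labels, predicted_labels):
    """Return a dict mapping each true country label to its one-vs-rest MCC."""
    return {
        country: one_vs_rest_mcc(true_labels, predicted_labels, country)
        for country in sorted(set(true_labels))
    }


# Hypothetical cross-validation output for two countries.
truth = ["ETH", "ETH", "ETH", "IDN", "IDN", "IDN"]
preds = ["ETH", "ETH", "IDN", "IDN", "IDN", "IDN"]
scores = per_country_mcc(truth, preds)
```

Here one Ethiopian sample is misassigned to Indonesia, so both countries score 6/sqrt(72) ≈ 0.71; perfect predictions give 1 and fully swapped labels give −1, matching the range quoted in the abstract.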
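The abstract describes a likelihood-based classifier that tolerates missing data and polyclonal infections, but gives no details of its model. The sketch below is a hypothetical illustration of one simple way such a classifier can work, and is not the vivaxGEN-geo implementation. It assumes per-country reference-allele frequencies at each barcode SNP and independence between SNPs. A missing call is skipped, so it adds no evidence. A mixed call is scored under an assumed two-clone model in which both alleles appear with probability 2p(1 − p). The call codes `R`, `A`, `M`, the `eps` clamp, and all function names are assumptions.

```python
import math


def log_likelihood(barcode, ref_freqs, eps=1e-3):
    """Log-likelihood of a barcode given one country's reference-allele frequencies.

    barcode: per-SNP calls, 'R' (reference), 'A' (alternate), 'M' (mixed) or None (missing).
    ref_freqs: reference-allele frequency for this country at the same SNPs.
    """
    total = 0.0
    for call, p in zip(barcode, ref_freqs):
        if call is None:
            continue  # missing SNP: contributes nothing, so partial barcodes still score
        p = min(max(p, eps), 1 - eps)  # clamp so an allele unseen in a country avoids log(0)
        if call == "R":
            prob = p
        elif call == "A":
            prob = 1 - p
        elif call == "M":
            prob = 2 * p * (1 - p)  # assumed two-clone model for polyclonal calls
        else:
            raise ValueError(f"unknown call {call!r}")
        total += math.log(prob)
    return total


def classify(barcode, country_freqs):
    """Return the country with the highest log-likelihood, plus all scores."""
    scores = {c: log_likelihood(barcode, f) for c, f in country_freqs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference-allele frequencies at a 3-SNP toy barcode.
freqs = {"X": [0.9, 0.9, 0.1], "Y": [0.1, 0.1, 0.9]}
best, _ = classify(["R", None, "A"], freqs)
```

Summing log-probabilities over only the observed SNPs is what lets such a model degrade gracefully as missing data increase: fewer informative SNPs shrink the gap between country scores rather than breaking the calculation.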