TY  - JOUR
T1  - XGMix: Local-Ancestry Inference with Stacked XGBoost
JF  - bioRxiv
DO  - 10.1101/2020.04.21.053876
SP  - 2020.04.21.053876
AU  - Arvind Kumar
AU  - Daniel Mas Montserrat
AU  - Carlos Bustamante
AU  - Alexander Ioannidis
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/04/24/2020.04.21.053876.abstract
N2  - Genomic medicine promises increased resolution for accurate diagnosis, for personalized treatment, and for identification of population-wide health burdens at rapidly decreasing cost (with a genotype now cheaper than an MRI and dropping). The benefits of this emerging form of affordable, data-driven medicine will accrue predominantly to those populations whose genetic associations have been mapped, so it is of increasing concern that over 80% of such genome-wide association studies (GWAS) have been conducted solely within individuals of European ancestry [1]. The severe under-representation of the majority of the world’s populations in genetic association studies stems in part from an addressable algorithmic weakness: lack of simple, accurate, and easily trained methods for identifying and annotating ancestry along the genome (local ancestry). Here we present such a method (XGMix) based on gradient boosted trees, which, while being accurate, is also simple to use and fast to train, taking minutes on consumer-level laptops. Competing Interest Statement: C.B. is a member of the scientific advisory boards for Liberty Biosecurity, Personalis, 23andMe Roots into the Future, Ancestry.com, IdentifyGenomics, Genomelink, and Etalon and is a founder of CDB Consulting.
ER  - 