RT Journal Article
SR Electronic
T1 Sharing genetic admixture and diversity of public biomedical datasets
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 210716
DO 10.1101/210716
A1 Olivier Harismendy
A1 Jihoon Kim
A1 Xiaojun Xu
A1 Lucila Ohno-Machado
YR 2017
UL http://biorxiv.org/content/early/2017/10/28/210716.abstract
AB Genetic ancestry and admixture are critical co-factors when studying phenotype-genotype associations in cohorts of human subjects. However, most publicly available molecular datasets (genomes, exomes, or transcriptomes) either lack this information or share only self-reported ancestry. This limits the ability to identify and repurpose datasets to investigate the contribution of race and ethnicity to diseases and traits. We propose an analytical framework that enriches the metadata of publicly available cohorts with admixture information and a resulting diversity score at continental resolution, both calculated directly from the data. We illustrate the utility and versatility of the framework using The Cancer Genome Atlas datasets indexed and searched through the DataMed Data Discovery Index. Data repositories or data contributors can use this framework to provide admixture as metadata for controlled-access datasets, minimizing the work involved in requesting a dataset that may ultimately prove inadequate for a researcher’s purpose. With the increasingly global scale of human genetics research, studies of disease risk and susceptibility would benefit greatly from the adequate estimation and sharing of admixture data following a framework such as the one presented.