Population-specific genome graphs improve high-throughput sequencing data analysis: A case study on the Pan-African genome

H. Serhat Tetikol; Kubra Narci; Deniz Turgut; Gungor Budak; Ozem Kalay; Elif Arslan; Sinem Demirkaya-Budak; Alexey Dolgoborodov; Amit Jain; Duygu Kabakci-Zorlu; Richard Brown; Vladimir Semenyuk; Brandi Davis-Dusenbery

doi:10.1101/2021.03.19.436173

ABSTRACT

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference for capturing the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based bioinformatics toolkits, how to curate genomic variants and subsequently construct genome graphs remains an understudied problem that inevitably determines the effectiveness of the end-to-end bioinformatics pipeline. In this study, we discuss major obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants, and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and test the proposed approach on whole-genome samples of African ancestry. Our results show that, as more representative alternatives to linear or generic graph references, population-specific graphs can achieve significantly lower read mapping errors and increased variant calling sensitivity, and can provide the improvements of joint variant calling without the need for computationally intensive post-processing steps.

Competing Interest Statement

All authors have been employed by Seven Bridges Inc. throughout the period of work for this study.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.