Abstract
Responsible for the metabolism of 25% of all drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to its sequence similarity with its pseudogene paralog CYP2D7 and to the large number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. Using a validation data set consisting of reference samples with diverse genotypes as well as PacBio long-read data, we show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (83.8–86.6%). After implementing the improvements identified through the comparison against the truth data, Cyrius’s accuracy has since improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be a useful tool for pharmacogenomics applications with WGS and help bring the promise of precision medicine one step closer to reality.
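The two summary statistics above, concordance with truth genotypes and population haplotype frequencies, both reduce to counting once per-sample diplotype calls exist. The Python sketch below is illustrative only and is not Cyrius's implementation. The function names, the `'*1/*4'` call format, and the rule that a missing call counts as discordant are assumptions. Each side of the slash, including any SV-containing star-allele label, is treated as one opaque haplotype string.

```python
from collections import Counter


def parse_diplotype(call: str) -> tuple:
    """Split a diplotype such as '*1/*4' into an order-independent pair (hypothetical format)."""
    left, right = call.split("/")
    return tuple(sorted((left.strip(), right.strip())))


def concordance(calls: dict, truth: dict) -> float:
    """Fraction of truth samples whose called diplotype matches, ignoring haplotype order.

    Assumption: a truth sample with no call is counted as discordant.
    """
    matches = sum(
        s in calls and parse_diplotype(calls[s]) == parse_diplotype(truth[s])
        for s in truth
    )
    return matches / len(truth)


def haplotype_frequencies(calls: dict) -> dict:
    """Each diploid sample contributes two haplotypes; frequency = count / (2 * N)."""
    counts = Counter()
    for call in calls.values():
        counts.update(parse_diplotype(call))
    total = 2 * len(calls)
    return {hap: n / total for hap, n in counts.items()}


if __name__ == "__main__":
    # Toy data, not from the study.
    calls = {"S1": "*1/*4", "S2": "*2/*1", "S3": "*5/*1"}
    truth = {"S1": "*4/*1", "S2": "*1/*2", "S3": "*1/*1"}
    print(concordance(calls, truth))        # 2 of 3 samples match
    print(haplotype_frequencies(calls))     # *1 appears on 3 of 6 haplotypes
```

Comparing sorted pairs makes `*4/*1` and `*1/*4` equivalent, which is why phase-agnostic diplotype calls can be scored against truth without extra bookkeeping.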
Competing Interest Statement
XC, FS, NG, AM, CR, RJT, DRB and MAE are employees of Illumina Inc.