ABSTRACT
Precision oncology relies on the accurate discovery and interpretation of genomic variants to enable individualized diagnosis, prognosis, and therapy selection. We found that knowledgebases containing clinical interpretations of somatic cancer variants are highly disparate in interpretation content, structure, and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. With the cooperation of experts from the Global Alliance for Genomics and Health (GA4GH) and six prominent cancer variant knowledgebases, we developed a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. We demonstrated large gains in overlap between resources across variants, diseases, and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 56% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide an open and freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.