ABSTRACT
Objective Studies using electronic health record (EHR) data often rely on custom aggregations of billing codes. One such grouping is the phecode system, which was developed for phenome-wide association studies (PheWAS). Phecodes were built upon the International Classification of Diseases, version 9, Clinical Modification (ICD-9-CM). However, healthcare systems have since transitioned to ICD-10/ICD-10-CM. Here we describe the development and validation of maps from ICD-10/ICD-10-CM codes to phecodes.
Materials and Methods We mapped ICD-10/ICD-10-CM codes to phecodes by matching code descriptions and by using ICD-10/ICD-10-CM to SNOMED CT maps, general equivalence maps to ICD-9-CM, and an ICD-9-CM to phecode map. We assessed the coverage of the maps in two databases: Vanderbilt University Medical Center (VUMC), which uses ICD-10-CM, and the UK Biobank (UKBB), which uses ICD-10. We evaluated the validity of the ICD-10-CM map in two ways: by comparing the prevalence of phecodes derived from ICD-9-CM with that of phecodes derived from ICD-10-CM at VUMC, and by conducting a PheWAS.
Results We mapped >75% of ICD-10-CM and ICD-10 codes to phecodes. Of the unique codes observed in the VUMC (ICD-10-CM) and UKBB (ICD-10) cohorts, >90% were mapped to phecodes. Essential hypertension and hyperlipidemia were among the top ten phecodes in both cohorts. Regression analysis showed that the ICD-10-CM map was comparable to the ICD-9-CM to phecode map. An initial PheWAS of a lipoprotein(a) (LPA) genetic variant yielded similar genotype-phenotype associations with the ICD-9-CM and ICD-10-CM maps.
Conclusions This study introduces initial maps of ICD-10/ICD-10-CM codes to phecodes, which will enable researchers to leverage accumulated ICD-10/ICD-10-CM data for high-throughput clinical and genetic research.
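As an illustration of the mapping strategy summarized in Materials and Methods, the following minimal Python sketch tries a direct description match first and then falls back to an indirect route through a general equivalence map to ICD-9-CM and an ICD-9-CM to phecode map. It is a sketch under stated assumptions, not the authors' implementation: all table contents, table names, and the functions map_to_phecodes and coverage are hypothetical, the SNOMED CT route is omitted for brevity, and the code Z99.99 is invented to represent an unmapped code.

```python
from typing import Iterable, Set

# Toy lookup tables for illustration only; these are NOT the released maps.
PHECODE_BY_DESCRIPTION = {"essential hypertension": "401.1"}
ICD10CM_DESCRIPTION = {
    "I10": "Essential hypertension",
    "E78.5": "Hyperlipidemia, unspecified",
    "Z99.99": "Hypothetical code with no mapping",  # invented for the example
}
GEM_ICD10CM_TO_ICD9CM = {"E78.5": ["272.4"]}   # general equivalence map (toy)
ICD9CM_TO_PHECODE = {"272.4": "272.1"}          # ICD-9-CM to phecode map (toy)


def map_to_phecodes(icd10cm_code: str) -> Set[str]:
    """Return the phecodes for one ICD-10-CM code (empty set if unmapped)."""
    # Step 1: direct match on the normalized code description.
    description = ICD10CM_DESCRIPTION.get(icd10cm_code, "").strip().lower()
    if description in PHECODE_BY_DESCRIPTION:
        return {PHECODE_BY_DESCRIPTION[description]}
    # Step 2: indirect match, ICD-10-CM -> ICD-9-CM -> phecode.
    icd9_codes = GEM_ICD10CM_TO_ICD9CM.get(icd10cm_code, [])
    return {ICD9CM_TO_PHECODE[c] for c in icd9_codes if c in ICD9CM_TO_PHECODE}


def coverage(observed_codes: Iterable[str]) -> float:
    """Fraction of unique observed codes that map to at least one phecode."""
    unique_codes = set(observed_codes)
    if not unique_codes:
        return 0.0
    mapped = sum(1 for code in unique_codes if map_to_phecodes(code))
    return mapped / len(unique_codes)


if __name__ == "__main__":
    for code in ["I10", "E78.5", "Z99.99"]:
        print(code, sorted(map_to_phecodes(code)))
    print("coverage:", coverage(["I10", "E78.5", "Z99.99", "I10"]))
```

The coverage function counts unique codes rather than code occurrences, mirroring the "unique codes observed" measure reported in Results. A weighted variant that counts occurrences would be an equally reasonable design choice.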