RT Journal Article
SR Electronic
T1 Exploration of the Timeliness of ICD Codes in Administrative Databases: A Nationwide Study
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 491951
DO 10.1101/491951
A1 Marie Simon
A1 Bastien Rance
A1 Sandrine Katsahian
A1 Karim Bounebache
A1 Grégoire Rey
A1 Gilles Chatellier
A1 Antoine Neuraz
A1 Anita Burgun
A1 Vincent Looten
YR 2018
UL http://biorxiv.org/content/early/2018/12/10/491951.abstract
AB INTRODUCTION ICD codes are ubiquitous in hospital information systems and have been used in areas such as epidemiology, phenotype-genotype association mining, surveillance of the use of drugs and medical devices, and health care evaluation. We aimed to analyze the timeliness of the 3-character ICD-10 codes collected in the French national hospital discharge summary database between 2008 and 2017, and to classify the codes according to their evolution over time.
MATERIAL AND METHODS We extracted all 3-character ICD-10 codes from all French hospital discharge summaries between 2008 and 2017. For each code, we computed a monthly relative frequency, as well as the overall amplitude of these frequencies over the study period. Temporal clustering based on the SAX representation was performed to classify the main evolution patterns.
RESULTS We extracted 238,334,751 encounters corresponding to 56,621,773 distinct patients. The relative amplitude of frequencies was lower than 50% for 1,006 ICD codes, between 50% and 100% for 510 codes, and greater than 100% for 521 codes. Of the 2,037 codes included in the study, 1,758 were retained for temporal clustering. Four clusters were identified, including a global increase pattern and a global decrease pattern.
DISCUSSION Overall, the results showed strong instability in the use of ICD codes over time (i.e., large variations in frequency), with substantial variation in the relative amplitude of the frequencies. We distinguished external factors (changes in billing, organization, policy, or regulation) from intrinsic factors (epidemiological phenomena). Detailed analysis of the profiles showed that a single cluster can contain profiles influenced by intrinsic epidemiological factors, external factors, or both. Additional knowledge and data sources are probably required to automatically determine the origin of a profile.
ABBREVIATIONS DQ, Data Quality; EHR, Electronic Health Records; ICD, International Statistical Classification of Diseases and Related Health Problems; OHDSI, Observational Health Data Sciences and Informatics; SAX, Symbolic Aggregate Approximation representation; SNDS, Système National des Données de Santé (National Health Data System)
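The per-code computations summarized in the abstract (monthly relative frequencies, an amplitude measure, and SAX symbolization ahead of clustering) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define "relative amplitude", so the hypothetical `relative_amplitude` helper assumes (max - min) / mean of the monthly frequencies; the segment count and alphabet used by the hypothetical `sax_word` helper are illustrative choices; and the clustering step that groups SAX words into evolution patterns is not shown.

```python
from statistics import NormalDist, fmean, pstdev


def relative_frequencies(code_counts, total_counts):
    """Monthly relative frequency: encounters carrying the code / all encounters that month."""
    return [c / t for c, t in zip(code_counts, total_counts)]


def relative_amplitude(freqs):
    """ASSUMED definition (not given in the abstract): (max - min) / mean of the frequencies."""
    return (max(freqs) - min(freqs)) / fmean(freqs)


def amplitude_class(amplitude):
    """Bins reported in the abstract: <50%, 50-100%, >100%."""
    if amplitude < 0.5:
        return "<50%"
    if amplitude <= 1.0:
        return "50-100%"
    return ">100%"


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments near-equal frames."""
    n = len(series)
    return [
        fmean(series[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]


def sax_word(series, n_segments=4, alphabet="abcd"):
    """SAX: z-normalize, reduce with PAA, then map each segment mean to a symbol
    using equiprobable breakpoints of the standard normal distribution."""
    mu, sigma = fmean(series), pstdev(series)
    # A flat series has no shape; map every point to 0 to avoid dividing by zero.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    # The symbol index is the number of breakpoints the segment mean exceeds.
    return "".join(
        alphabet[sum(value > b for b in breakpoints)] for value in paa(z, n_segments)
    )


if __name__ == "__main__":
    # Hypothetical data: 12 months of a steadily rising code, 1,000 encounters per month.
    counts = list(range(10, 34, 2))
    totals = [1000] * 12
    freqs = relative_frequencies(counts, totals)
    amp = relative_amplitude(freqs)
    print(round(amp, 3), amplitude_class(amp), sax_word(freqs))
```

On this hypothetical rising series the sketch assigns the ">100%" amplitude class and the SAX word `abcd`; the same series reversed yields `dcba`, the kind of global increase and decrease shapes a clustering step would then group together.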