PT - JOURNAL ARTICLE
AU - Marie Simon
AU - Bastien Rance
AU - Sandrine Katsahian
AU - Karim Bounebache
AU - Grégoire Rey
AU - Gilles Chatellier
AU - Antoine Neuraz
AU - Anita Burgun
AU - Vincent Looten
TI - Exploration of the Timeliness of ICD Codes in Administrative Databases: A Nationwide Study
AID - 10.1101/491951
DP - 2018 Jan 01
TA - bioRxiv
PG - 491951
4099 - http://biorxiv.org/content/early/2018/12/10/491951.short
4100 - http://biorxiv.org/content/early/2018/12/10/491951.full
AB - INTRODUCTION: ICD codes are ubiquitous in hospital information systems and have been used in a number of areas such as epidemiology, phenotype-genotype association mining, surveillance of the use of drugs and medical devices, and health care evaluation. We aimed to analyze the timeliness of the 3-character ICD-10 codes collected in the French national hospital discharge summary database between 2008 and 2017, and we classified the codes according to their evolution. MATERIAL AND METHODS: We extracted all 3-character ICD-10 codes from all French hospital discharge summaries between 2008 and 2017. For each code and each month, we computed a relative frequency; we also computed the overall amplitude over the study period. Temporal clustering based on the SAX representation was performed to classify the main evolution patterns. RESULTS: We extracted 238,334,751 encounters corresponding to 56,621,773 distinct patients. The relative amplitude of frequencies was lower than 50% for 1,006 ICD codes, between 50% and 100% for 510 codes, and greater than 100% for 521 codes. Out of the 2,037 codes included in the study, we kept 1,758 for the temporal clustering. Four clusters were identified, including a global increase pattern and a global decrease pattern. DISCUSSION: The overall results showed strong instability in the use of ICD codes over time (i.e. large variation in frequency), with substantial variation in the relative amplitude of the frequencies.
We distinguished between external factors, due to changes in billing, organization, policy or regulation, and intrinsic factors, due to epidemiological phenomena. The detailed analysis of profiles shows that the same cluster can contain profiles influenced by intrinsic epidemiological factors, external factors, or both. Additional knowledge and sources are probably required to automatically determine the origin of a profile. ABBREVIATIONS: DQ, Data Quality; EHR, Electronic Health Records; ICD, International Statistical Classification of Diseases and Related Health Problems; OHDSI, Observational Health Data Sciences and Informatics; SAX, Symbolic Aggregate Approximation representation; SNDS, Système National des Données de Santé (National Health Data System).
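
The quantities named in the abstract (monthly relative frequency, relative amplitude, and SAX discretization) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact formula for the relative amplitude, so `(max - min) / mean` is an assumption here. The function names, the four-letter alphabet and the segment count are likewise hypothetical choices.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def relative_frequencies(code_counts, total_counts):
    """Monthly relative frequency: encounters with the code / all encounters."""
    return [c / t for c, t in zip(code_counts, total_counts)]


def relative_amplitude(freqs):
    """ASSUMED definition (not stated in the abstract): (max - min) / mean."""
    return (max(freqs) - min(freqs)) / mean(freqs)


def sax(series, n_segments, alphabet="abcd"):
    """Symbolic Aggregate approXimation of a numeric series.

    Steps: z-normalize, average over equal-width segments (PAA), then map
    each segment mean to a letter using equiprobable Gaussian breakpoints.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # 1. z-normalize (a constant series maps to all zeros)
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. piecewise aggregate approximation
    size = len(z) // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]

    # 3. breakpoints that split the standard normal into equal-probability bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)
```

With ten years of monthly data (120 points), a call such as `sax(freqs, n_segments=10)` would yield one letter per year. The resulting strings could then be grouped to look for patterns such as a global increase or decrease, although the clustering step itself is not shown here.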