PT - JOURNAL ARTICLE
AU - Lili Chan
AU - Kelly Beers
AU - Kinsuk Chauhan
AU - Neha Debnath
AU - Aparna Saha
AU - Pattharawin Pattharanitima
AU - Judy Cho
AU - Peter Kotanko
AU - Alex Federman
AU - Steven Coca
AU - Tielman Van Vleck
AU - Girish N. Nadkarni
TI - Comparison of Approaches to the identification of Symptom Burden in Hemodialysis Patients Utilizing Electronic Health Records
AID - 10.1101/458976
DP - 2018 Jan 01
TA - bioRxiv
PG - 458976
4099 - http://biorxiv.org/content/early/2018/11/02/458976.short
4100 - http://biorxiv.org/content/early/2018/11/02/458976.full
AB - Background Identification of symptoms is challenging with surveys, which are time-intensive and low-throughput. Natural language processing (NLP) could be utilized to identify symptoms from narrative documentation in the electronic health record (EHR). Methods We utilized NLP to parse notes for maintenance hemodialysis (HD) patients from two EHR databases (BioMe and MIMIC-III) to identify fatigue, nausea/vomiting, anxiety, depression, cramping, itching, and pain. We compared NLP performance with International Classification of Diseases (ICD) codes and validated the performance of both NLP and codes against manual chart review in a representative subset. Results We identified 1034 and 929 HD patients from BioMe and MIMIC-III respectively. The most frequently identified symptoms by NLP from both cohorts were fatigue, pain, and nausea and/or vomiting. NLP was significantly more sensitive than ICD codes for nearly all symptoms. In the BioMe dataset, sensitivity for NLP ranged from 0.85-0.99 vs. 0.09-0.59 for ICD codes. In the MIMIC-III dataset, NLP sensitivity was 0.8-0.98 vs. 0.02-0.53 for ICD. ICD codes were significantly more specific for nausea and/or vomiting (NLP 0.57 vs. ICD 0.97, P=0.03) in BioMe and for depression (NLP 0.67 vs. ICD 0.99, P=0.002) in MIMIC-III. A majority of patients in both cohorts had ≥4 symptoms.
The more encounters available for a patient, the more likely NLP was to identify a symptom. Conclusions NLP outperformed ICD codes in identifying symptoms on several test parameters, including sensitivity, for a majority of symptoms. NLP may be useful for the high-throughput identification of patient-centered outcomes from the EHR. Significance Statement Patients on maintenance hemodialysis experience a high frequency of symptoms. However, symptoms have been measured utilizing time-intensive surveys. This paper compares natural language processing (NLP) to administrative codes for the identification of seven key symptoms from two cohorts with electronic health records and validation through manual chart review. NLP identified high rates of symptoms; the most common were fatigue, pain, and nausea and/or vomiting. A majority of patients had ≥4 symptoms. NLP was significantly more sensitive than administrative codes at identifying nearly all symptoms, but its specificity was not significantly different from that of codes. This paper demonstrates the utility of a high-throughput method of identifying symptoms from the EHR, which may advance the field of patient-centered research in nephrology.