TY  - JOUR
T1  - Augmented Intelligence with Natural Language Processing Applied to Electronic Health Records is Useful for Identifying Patients with Non-Alcoholic Fatty Liver Disease at Risk for Disease Progression
JF  - bioRxiv
DO  - 10.1101/518217
SP  - 518217
AU  - Tielman T. Van Vleck
AU  - Lili Chan
AU  - Steven G. Coca
AU  - Catherine K. Craven
AU  - Ron Do
AU  - Stephen B. Ellis
AU  - Joseph L. Kannry
AU  - Ruth J.F. Loos
AU  - Peter A. Bonis
AU  - Judy Cho
AU  - Girish N. Nadkarni
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/01/11/518217.abstract
N2  - Objective: Electronic health record (EHR) systems contain structured data and unstructured documentation. Clinical insights can be derived from analyzing both, but optimal methods for doing so have not been studied extensively. We compared various approaches to analyzing EHR data for non-alcoholic fatty liver disease (NAFLD). Materials and Methods: We compared analysis of structured and unstructured EHR data using natural language processing (NLP), free-text search, and diagnostic codes against expert adjudication as the reference standard. Results: Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP was more sensitive than ICD and text search (NLP 0.93 vs. ICD 0.28 vs. text search 0.81) with a higher F2 score (NLP 0.92 vs. ICD 0.34 vs. text search 0.81). 619 patients had suspected NAFLD documented in radiology notes that was not acknowledged in other forms of clinical documentation. Of these, 232 (37.5%) were found to have more advanced liver disease after a median of 1,057 days. Discussion: NLP-based approaches have superior accuracy in identifying NAFLD within the EHR compared to ICD/text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation. Many such patients are later found to have more advanced liver disease. Conclusion: For identification of NAFLD, NLP performed better than alternative selection modalities and facilitated follow-on analysis of information flow. If accuracy can be proven to persist across clinical domains, NLP can identify patient phenotypes for biomedical research in an accurate and high-throughput manner.
ER  - 