RT Journal Article
SR Electronic
T1 Using natural language processing and machine learning to classify health literacy from secure messages: The ECLIPPSE study
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 406876
DO 10.1101/406876
A1 Renu Balyan
A1 Scott A. Crossley
A1 William Brown III
A1 Andrew J. Karter
A1 Danielle S. McNamara
A1 Jennifer Y. Liu
A1 Courtney R. Lyles
A1 Dean Schillinger
YR 2018
UL http://biorxiv.org/content/early/2018/09/03/406876.abstract
AB Limited health literacy can be a barrier to healthcare delivery, but widespread classification of patient health literacy is challenging. We applied natural language processing and machine learning on a large sample of 283,216 secure messages sent from 6,941 patients to their clinicians for this study to develop and validate literacy profiles as indicators of patients’ health literacy. All patients were participants in Kaiser Permanente Northern California’s DISTANCE Study. We created three literacy profiles, comparing performance of each literacy profile against a gold standard of patient self-report. We also analyzed associations between the literacy profiles and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, continuous medication gaps and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61-0.74. Relationships between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles as having limited health literacy were older and more likely minority; had poorer medication adherence and glycemic control; and higher rates of hypoglycemia, comorbidities and healthcare utilization.
This research represents the first successful attempt to use natural language processing and machine learning to measure health literacy. Literacy profiles offer an automated and economical way to identify patients with limited health literacy and a greater vulnerability to poor health outcomes.