RT Journal Article SR Electronic T1 Unsupervised Machine learning to subtype Sepsis-Associated Acute Kidney Injury JF bioRxiv FD Cold Spring Harbor Laboratory SP 447425 DO 10.1101/447425 A1 Kumardeep Chaudhary A1 Aine Duffy A1 Priti Poojary A1 Aparna Saha A1 Kinsuk Chauhan A1 Ron Do A1 Tielman Van Vleck A1 Steven G. Coca A1 Lili Chan A1 Girish N. Nadkarni YR 2018 UL http://biorxiv.org/content/early/2018/10/18/447425.abstract AB Objective: Acute kidney injury (AKI) is highly prevalent in critically ill patients with sepsis. Sepsis-associated AKI is a heterogeneous clinical entity, and, like many complex syndromes, is composed of distinct subtypes. We aimed to agnostically identify AKI subphenotypes using machine learning techniques and routinely collected data in electronic health records (EHRs). Design: Cohort study utilizing the MIMIC-III Database. Setting: ICUs from a tertiary care hospital in the U.S. Patients: Patients older than 18 years with sepsis who developed AKI within 48 hours of ICU admission. Interventions: Unsupervised machine learning utilizing all available vital signs and laboratory measurements. Measurements and Main Results: We identified 1,865 patients with sepsis-associated AKI. Ten vital signs and 691 unique laboratory results were identified. After data processing and feature selection, 59 features, of which 28 were measures of intra-patient variability, remained for inclusion in an unsupervised machine-learning algorithm. We utilized k-means clustering with k ranging from 2 to 10; k=2 had the highest silhouette score (0.62). Cluster 1 had 1,358 patients while Cluster 2 had 507 patients. There were no significant differences between clusters in age, race or gender. We found significant differences in comorbidities and small but significant differences in several laboratory variables (hematocrit, bicarbonate, albumin) and vital signs (systolic blood pressure and heart rate). In-hospital mortality was higher in Cluster 2 patients (25% vs. 20%, p=0.008). 
Features with the largest differences between clusters included variability in basophil and eosinophil counts, alanine aminotransferase levels and creatine kinase values. Conclusions: Utilizing routinely collected laboratory variables and vital signs in the EHR, we were able to identify two distinct subphenotypes of sepsis-associated AKI with different outcomes. Variability in laboratory variables, as opposed to their actual value, was more important for determination of subphenotypes. Our findings show the potential utility of unsupervised machine learning to better subtype AKI.
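
The abstract reports running k-means for k from 2 to 10 and keeping the k with the highest silhouette score. The sketch below illustrates that selection step only; it is not the authors' code, and the abstract does not say which software or settings they used. The data are synthetic two-dimensional points rather than the 59 EHR-derived features, and the function names (`kmeans`, `silhouette_score`, `best_k`) are hypothetical helpers written with the Python standard library so the example stays self-contained.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members)
                                      for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def best_k(points, k_values=range(2, 11), seed=0):
    """Score every candidate k and return (k with highest silhouette, all scores)."""
    scores = {k: silhouette_score(points, kmeans(points, k, seed=seed))
              for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic stand-in data (an assumption): two well-separated groups of 30 points.
_rng = random.Random(42)
points = ([[_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)] for _ in range(30)]
          + [[_rng.gauss(10, 0.5), _rng.gauss(10, 0.5)] for _ in range(30)])

chosen_k, scores = best_k(points)
print("chosen k:", chosen_k)
for k in sorted(scores):
    print(k, round(scores[k], 3))
```

On real EHR features, standardising each feature before clustering and repeating k-means from several random initialisations would usually matter; neither is shown here, and whether the study did either is not stated in the abstract.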