RT Journal Article
SR Electronic
T1 Development of a Prediction Model for Incident Atrial Fibrillation using Machine Learning Applied to Harmonized Electronic Health Record Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 520866
DO 10.1101/520866
A1 Premanand Tiwari
A1 Katie Colborn
A1 Derek E. Smith
A1 Fuyong Xing
A1 Debashis Ghosh
A1 Michael A. Rosenberg
YR 2019
UL http://biorxiv.org/content/early/2019/01/18/520866.abstract
AB Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, whose early detection could lead to significant improvements in outcomes through appropriate prescription of anticoagulation. Although a variety of methods exist for screening for AF, there is general agreement that a targeted approach would be preferred. Implicit within this approach is the need for an efficient method for identification of patients at risk. In this investigation, we examined the strengths and weaknesses of an approach based on application of machine-learning algorithms to electronic health record (EHR) data that has been harmonized to the Observational Medical Outcomes Partnership (OMOP) common data model. We examined data from a total of 2.3M individuals, of whom 1.16% developed incident AF over designated 6-month time intervals. We examined and compared several approaches for data reduction, sample balancing (re-sampling) and predictive modeling using cross-validation for hyperparameter selection, and out-of-sample testing for validation. Although no approach provided outstanding classification accuracy, we found that the optimal approach for prediction of 6-month incident AF used a random forest classifier, raw features (no data reduction), and synthetic minority oversampling technique (SMOTE) resampling (F1 statistic 0.12, AUC 0.65). This model performed better than a predictive model based only on known AF risk factors, and highlighted the importance of using resampling methods to optimize ML approaches to imbalanced data as exists in EHRs. Further studies using EHR data in other medical systems are needed to validate the clinical applicability of these findings.