bioRxiv
New Results

Development of a Prediction Model for Incident Atrial Fibrillation using Machine Learning Applied to Harmonized Electronic Health Record Data

Premanand Tiwari, Katie Colborn, Derek E. Smith, Fuyong Xing, Debashis Ghosh, Michael A. Rosenberg
doi: https://doi.org/10.1101/520866

Abstract

Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, and its early detection could substantially improve outcomes through appropriate prescription of anticoagulation. Although a variety of methods exist for AF screening, there is general agreement that a targeted approach would be preferable. Implicit in this approach is the need for an efficient method of identifying patients at risk. In this investigation, we examined the strengths and weaknesses of an approach based on applying machine-learning (ML) algorithms to electronic health record (EHR) data that have been harmonized to the Observational Medical Outcomes Partnership (OMOP) common data model. We examined data from a total of 2.3M individuals, of whom 1.16% developed incident AF over designated 6-month time intervals. We compared several approaches to data reduction, sample balancing (resampling) and predictive modeling, using cross-validation for hyperparameter selection and out-of-sample testing for validation. Although no approach provided outstanding classification accuracy, we found that the optimal approach for prediction of 6-month incident AF used a random forest classifier, raw features (no data reduction), and synthetic minority oversampling technique (SMOTE) resampling (F1 statistic 0.12, AUC 0.65). This model performed better than a predictive model based only on known AF risk factors, and highlighted the importance of using resampling methods to optimize ML approaches for imbalanced data such as those found in EHRs. Further studies using EHR data from other medical systems are needed to validate the clinical applicability of these findings.
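
The abstract names SMOTE resampling and reports results as an F1 statistic and an AUC. The following is a minimal, dependency-free Python sketch of the standard SMOTE interpolation step and of those two metrics, included only to illustrate the techniques; it is not the authors' code. The function names (`smote`, `f1_score`, `roc_auc`), the default of k = 5 neighbors, and the toy data are assumptions chosen for illustration. In practice a library implementation, such as the `SMOTE` class in imbalanced-learn, would typically be used.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority-class rows by SMOTE interpolation.

    Each new row lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    (Illustrative sketch; k=5 default is an assumption.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # For every minority row, find its k nearest other minority rows.
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbors.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        nn = minority[rng.choice(neighbors[i])]
        gap = rng.random()  # uniform draw in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nn)])
    return synthetic


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN) for the positive (incident AF) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); O(n_pos * n_neg), fine for a demo."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy training fold: 10 minority rows with 2 features.
    minority = [[rng.gauss(1, 0.5), rng.gauss(1, 0.5)] for _ in range(10)]
    print(len(smote(minority, n_synthetic=90)))  # 90
    print(f1_score([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.5
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```

Resampling of this kind is normally applied only within the training folds, so that synthetic rows never leak into the held-out data used for validation. With a 1.16% event rate, a classifier that predicts "no AF" for everyone would be about 98.8% accurate yet have an F1 of 0, which is one reason F1 and AUC are more informative than raw accuracy for data this imbalanced.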

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 18, 2019.

Subject Area

  • Epidemiology