New Results

Prediction of rapid kidney function decline using machine learning combining blood biomarkers and electronic health record data

Girish N. Nadkarni, Fergus Fleming, James R. McCullough, Kinsuk Chauhan, Divya A. Verghese, John C. He, John Quackenbush, Joseph V. Bonventre, Barbara Murphy, Chirag R. Parikh, Michael Donovan, Steven G. Coca
doi: https://doi.org/10.1101/587774
Girish N. Nadkarni
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
Fergus Fleming
2RenalytixAI, New York, NY
James R. McCullough
2RenalytixAI, New York, NY
Kinsuk Chauhan
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
Divya A. Verghese
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
John C. He
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
John Quackenbush
3Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
Joseph V. Bonventre
4Renal Division, Brigham and Women’s Hospital, Boston, Massachusetts
Barbara Murphy
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
Chirag R. Parikh
5Department of Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD
Michael Donovan
6Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY
Steven G. Coca
1Department of Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY

ABSTRACT

Introduction Individuals with type 2 diabetes (T2DM) or the APOL1 high-risk genotype (APOL1) are at increased risk of rapid kidney function decline (RKFD) compared with the general population. Plasma biomarkers representing inflammatory and kidney injury pathways have been validated as predictive of kidney disease progression in several studies. In addition, routine clinical data in the electronic health record (EHR) may be used for prediction. Applying machine learning to integrate biomarkers with clinical data may improve identification of RKFD.

Methods We selected two subpopulations of high-risk individuals from the Mount Sinai BioMe Biobank: individuals with T2DM (n=871) and individuals of African Ancestry with the APOL1 high-risk genotype (n=498), all with a baseline eGFR ≥ 45 ml/min/1.73 m². Plasma levels of tumor necrosis factor receptor 1/2 (TNFR1/2) and kidney injury molecule-1 (KIM-1) were measured, and a series of supervised machine learning approaches, including random forest (RF), were employed to combine the biomarker data with longitudinal clinical variables. The primary objective was to accurately predict RKFD (eGFR decline of ≥ 5 ml/min/1.73 m²/year) based on an algorithm-produced score and probability cutoffs, with results compared to standard of care.
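The RKFD definition above (eGFR decline of ≥ 5 ml/min/1.73 m²/year) can be illustrated with a minimal Python sketch. The abstract does not state how the annual rate of decline was estimated; the ordinary least-squares slope used here is an assumption, and the function names `egfr_slope` and `has_rkfd` are hypothetical.

```python
from typing import Sequence

# Threshold taken from the Methods: decline of >= 5 ml/min/1.73 m^2 per year.
RKFD_THRESHOLD = 5.0


def egfr_slope(years: Sequence[float], egfr: Sequence[float]) -> float:
    """Least-squares slope of eGFR against time, in ml/min/1.73 m^2 per year.

    Assumption: a simple linear fit over all available measurements.
    The abstract does not specify the slope estimator actually used.
    """
    n = len(years)
    if n != len(egfr) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(years) / n
    mean_y = sum(egfr) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, egfr))
    return sxy / sxx


def has_rkfd(years: Sequence[float], egfr: Sequence[float],
             threshold: float = RKFD_THRESHOLD) -> bool:
    """True when the estimated annual decline meets or exceeds the threshold."""
    return -egfr_slope(years, egfr) >= threshold
```

For example, a participant whose eGFR falls from 80 to 62 over three years of annual measurements has a fitted slope of -6 per year and would be labeled as having RKFD under this sketch, whereas a fall from 80 to 74 over the same period would not.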

Results In 871 participants with T2DM, the mean age was 61 years, baseline estimated glomerular filtration rate (eGFR) was 74 ml/min/1.73 m², and median UACR was 13 mg/g. The median follow-up was 4.7 years from the baseline specimen collection, with additional retrospective data available for a median of 2.3 years prior to plasma collection. In the 498 patients of African Ancestry with the high-risk APOL1 genotype, the median age was 56 years, median baseline eGFR was 83 ml/min/1.73 m², and median UACR was 11 mg/g. The median follow-up was 4.7 years, and additional retrospective data were available for 3.1 years prior to plasma collection. Overall, 19% of participants with T2DM and 9% of those with the APOL1 high-risk genotype experienced RKFD. After evaluation of three supervised algorithms, namely random forest (RF), support vector machine (SVM), and Cox survival, the RF model was selected. In the training and test sets respectively, the RF model had an AUC of 0.82 (95% CI, 0.81-0.83) and 0.80 (95% CI, 0.78-0.82) in T2DM, and an AUC of 0.85 (95% CI, 0.84-0.87) and 0.80 (95% CI, 0.73-0.86) in the APOL1 high-risk group. The combined RF model outperformed standard clinical variables in both patient populations. Discrimination was comparable in two sensitivity analyses: 1) using only data from ≤ 1 year prior to baseline biomarker measurement and 2) in individuals with eGFR ≤ 60 and/or albuminuria at baseline. The distribution of RKFD probability varied between the two populations. In patients with T2DM, the RKFD score stratified 18%, 49%, and 33% of patients to high-, intermediate-, and low-probability strata, respectively, with a PPV of 53% in the high-probability group and an NPV of 97% in the low-probability group. By comparison, in the APOL1 high-risk genotype group, the RKFD score stratified 7%, 23%, and 70% of patients to high-, intermediate-, and low-probability strata, respectively, with a PPV of 46% in the high-probability group and an NPV of 98% in the low-probability group.
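The stratum-level metrics reported above can be illustrated with a short Python sketch: predicted probabilities are split into three strata by two cutoffs, PPV is the observed event rate in the high-probability stratum, and NPV is the observed non-event rate in the low-probability stratum. The abstract does not report the cutoff values, so `low_cut` and `high_cut` are hypothetical parameters, and `stratify` and `stratum_summary` are illustrative names rather than the authors' code.

```python
from typing import Dict, Optional, Sequence


def stratify(prob: float, low_cut: float, high_cut: float) -> str:
    """Assign a predicted probability to a stratum (cutoffs are hypothetical)."""
    if prob >= high_cut:
        return "high"
    if prob < low_cut:
        return "low"
    return "intermediate"


def stratum_summary(probs: Sequence[float], outcomes: Sequence[int],
                    low_cut: float, high_cut: float) -> Dict[str, Optional[float]]:
    """Fraction of patients per stratum, PPV in 'high', and NPV in 'low'.

    outcomes: 1 if the patient experienced RKFD, 0 otherwise.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    counts = {"high": 0, "intermediate": 0, "low": 0}
    events = {"high": 0, "intermediate": 0, "low": 0}
    for p, y in zip(probs, outcomes):
        s = stratify(p, low_cut, high_cut)
        counts[s] += 1
        events[s] += y
    n = len(probs)
    summary: Dict[str, Optional[float]] = {
        f"frac_{s}": counts[s] / n for s in counts
    }
    # PPV: share of high-stratum patients who had the event.
    summary["ppv_high"] = events["high"] / counts["high"] if counts["high"] else None
    # NPV: share of low-stratum patients who did not have the event.
    summary["npv_low"] = (
        (counts["low"] - events["low"]) / counts["low"] if counts["low"] else None
    )
    return summary
```

Applied to held-out predictions, a summary of this kind yields the same type of figures as those reported for the two cohorts, although the actual values depend on the cutoffs chosen by the authors.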

Conclusions In patients with T2DM or of African Ancestry with the high-risk APOL1 genotype, an RF model derived from plasma biomarkers and longitudinal EHR data significantly improved prediction of rapid kidney function decline over standard clinical models. With further validation, this approach may help clinicians identify patients who would benefit most from early and more aggressive follow-up to mitigate kidney disease progression.

Footnotes

  • * co-senior authors

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted March 28, 2019.

Subject Area

  • Epidemiology