bioRxiv
New Results

Multi-Modality Machine Learning Predicting Parkinson’s Disease

Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, Lana Sargent, Anant Dadu, Ivo Violich, Elizabeth Hutchins, David Saffo, Sara Bandres-Ciga, Jonggeol Jeff Kim, Yeajin Song, Matt Bookman, Willy Nojopranoto, Roy H. Campbell, Sayed Hadi Hashemi, Juan A. Botia, John F. Carter, Melina Maleknia, David W. Craig, Kendall Van Keuren-Jensen, Huw R. Morris, John A. Hardy, Cornelis Blauwendraat, Andrew B. Singleton, Faraz Faghri, Mike A. Nalls, on behalf of the Accelerating Medicines Partnership - Parkinson’s Disease (AMP PD) and the Global Parkinson’s Genetics Program (GP2).
doi: https://doi.org/10.1101/2021.03.05.434104
Mary B. Makarious
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(2) Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
(3) UCL Movement Disorders Centre, University College London, London, UK
Hampton L. Leonard
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
(5) Data Tecnica International LLC, Glen Echo, MD, USA
(6) German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
Dan Vitale
(5) Data Tecnica International LLC, Glen Echo, MD, USA
Hirotaka Iwaki
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
(5) Data Tecnica International LLC, Glen Echo, MD, USA
Lana Sargent
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
(7) School of Nursing, Virginia Commonwealth University, Richmond, VA, USA
(8) Geriatric Pharmacotherapy Program, School of Pharmacy, Virginia Commonwealth University, VA, USA
Anant Dadu
(9) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Ivo Violich
(10) Institute of Translational Genomics, University of Southern California, Los Angeles, CA, USA
Elizabeth Hutchins
(11) Neurogenomics Division, Translational Genomics Research Institute (TGen), Phoenix, AZ, USA
David Saffo
(12) Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
Sara Bandres-Ciga
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
Jonggeol Jeff Kim
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(13) Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London
Yeajin Song
(5) Data Tecnica International LLC, Glen Echo, MD, USA
Matt Bookman
(14) Verily Life Sciences, South San Francisco, CA, USA
Willy Nojopranoto
(14) Verily Life Sciences, South San Francisco, CA, USA
Roy H. Campbell
(9) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sayed Hadi Hashemi
(9) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Juan A. Botia
(15) Department of Molecular Neuroscience, UCL Queen Square Institute of Neurology, London, UK
(16) Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Spain
John F. Carter
(17) ModelOp, Chicago, IL, USA
Melina Maleknia
(18) Georgia Institute of Technology, Atlanta, GA, USA
David W. Craig
(10) Institute of Translational Genomics, University of Southern California, Los Angeles, CA, USA
Kendall Van Keuren-Jensen
(11) Neurogenomics Division, Translational Genomics Research Institute (TGen), Phoenix, AZ, USA
Huw R. Morris
(2) Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
(3) UCL Movement Disorders Centre, University College London, London, UK
John A. Hardy
(2) Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
(3) UCL Movement Disorders Centre, University College London, London, UK
(19) UK Dementia Research Institute and Department of Neurodegenerative Disease and Reta Lila Weston Institute, London, UK
(20) Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Cornelis Blauwendraat
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
Andrew B. Singleton
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
Faraz Faghri
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
(5) Data Tecnica International LLC, Glen Echo, MD, USA
Mike A. Nalls
(1) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
(4) Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, USA
(5) Data Tecnica International LLC, Glen Echo, MD, USA
For correspondence: nallsm@mail.nih.gov

SUMMARY

Background Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multi-modal data is key moving forward. We build upon previous work to deliver multi-modal predictions of Parkinson’s Disease (PD).

Methods We performed automated ML on multi-modal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best-performing algorithm, we used all PPMI data to tune the selected model, which was then validated in the Parkinson’s Disease Biomarkers Program (PDBP) dataset. Finally, we built networks to identify gene communities specific to PD.
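The summary does not say how the gene networks were constructed. As a rough illustration only, the Python sketch below builds a co-expression network under a deliberately simple assumption: transcripts are nodes, two transcripts are linked when the absolute Pearson correlation of their expression across PD cases reaches a cut-off, and each connected component is read as a "community". This is a swapped-in toy technique, not the authors' pipeline; the function names, the cut-off values, and the transcripts GENE_A to GENE_D are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant expression carries no correlation signal
    return cov / (sx * sy)


def coexpression_graph(expr: Dict[str, List[float]], cutoff: float) -> Dict[str, Set[str]]:
    """Adjacency sets linking transcripts whose |r| across samples is >= cutoff (hypothetical rule)."""
    adj: Dict[str, Set[str]] = {g: set() for g in expr}
    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def communities(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components of the graph, found by iterative depth-first search."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Hypothetical expression values for four transcripts across five PD cases.
expr = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],  # perfectly correlated with GENE_A
    "GENE_C": [5, 3, 4, 1, 2],
    "GENE_D": [10, 6, 8, 2, 4],  # perfectly correlated with GENE_C
}
strict = communities(coexpression_graph(expr, cutoff=0.9))  # two communities: A+B and C+D
loose = communities(coexpression_graph(expr, cutoff=0.7))   # cross-pair |r| = 0.8, so one community
print(strict)
print(loose)
```

The cut-off controls granularity: the two transcript pairs are negatively correlated with each other (r = -0.8), so a strict cut-off keeps them as separate communities while a looser one merges them. Real co-expression studies typically use more robust clustering than raw connected components.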

Findings Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing classification thresholds increased diagnostic prediction accuracy (balanced accuracy) and other metrics. Combining data modalities outperformed the single-biomarker paradigm. The University of Pennsylvania Smell Identification Test (UPSIT) was the largest contributing predictor for the classification of PD. The transcriptomic data were used to construct a network of disease-relevant transcripts.
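To make the AUC and threshold-optimization figures concrete, here is a minimal Python sketch, assuming a model that outputs one score per individual (cases labelled 1, controls 0). It computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, then sweeps candidate cut-offs to maximize balanced accuracy (the mean of sensitivity and specificity, which is equivalent to maximizing Youden's J). It is illustrative only, not the GenoML implementation; the function names and toy scores are hypothetical.

```python
from typing import List, Optional, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as P(random case outscores random control); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels: List[int], scores: List[float], threshold: float) -> float:
    """Mean of sensitivity and specificity when predicting 'case' for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def best_threshold(labels: List[int], scores: List[float]) -> Tuple[Optional[float], float]:
    """Try every observed score as a cut-off; return (threshold, balanced accuracy)."""
    best_t, best_ba = None, -1.0
    for t in sorted(set(scores)):
        ba = balanced_accuracy(labels, scores, t)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Hypothetical predicted probabilities for four controls (0) and four cases (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.45, 0.35, 0.40, 0.60, 0.90]
print(roc_auc(labels, scores))                 # 0.875
print(balanced_accuracy(labels, scores, 0.5))  # 0.75
print(best_threshold(labels, scores))          # (0.35, 0.875)
```

On these toy scores, moving the cut-off from the conventional 0.5 to 0.35 raises balanced accuracy from 0.75 to 0.875, while the AUC, which does not depend on any threshold, stays at 0.875. A cut-off chosen this way would typically be fixed on training data and then applied unchanged to an external set such as PDBP.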

Interpretation We have built a model using an automated ML pipeline to make improved multi-omic predictions of PD. The model developed improves disease risk prediction, a critical step for better assessment of PD risk. We constructed gene expression networks for the next generation of genomics-derived interventions. Our automated ML approach allows complex predictive models to be reproducible and accessible to the community.

Funding National Institute on Aging, National Institute of Neurological Disorders and Stroke, the Michael J. Fox Foundation, and the Global Parkinson’s Genetics Program.

Evidence before this study Prior research into predictors of Parkinson’s disease (PD) has either used basic statistical methods to make predictions across data modalities or focused on a single data type or biomarker model. Here we apply an open-source automated machine learning (ML) framework to extensive multi-modal data, which we believe yields robust and reproducible results. We consider this the first true multi-modality ML study of PD risk classification.

Added value of this study We used a variety of linear, non-linear, kernel, neural network, and ensemble ML algorithms to accurately classify cases and controls in independent datasets, using data that are not involved in the PD diagnosis itself at study recruitment. The model built in this paper significantly improves upon our previous models, which used the entire training dataset [1]. Building on this earlier work, we show that PD diagnosis can be refined using improved algorithmic classification tools that may yield biological insights. We have taken care to develop and validate this model using public controlled-access datasets and an open-source ML framework to allow reproducible and transparent results.

Implications of all available evidence Training, validating, and tuning a diagnostic algorithm for PD will allow us to augment clinical diagnoses or risk assessments with less need for complex and expensive exams. Going forward, these models can be built on remote or asynchronously collected data which may be important in a growing telemedicine paradigm. More refined diagnostics will also increase clinical trial efficiency by potentially refining phenotyping and predicting onset, allowing providers to identify potential cases earlier. Early detection could lead to improved treatment response and higher efficacy. Finally, as part of our workflow, we built new networks representing communities of genes correlated in PD cases in a hypothesis-free manner, showing how new and existing genes may be connected and highlighting therapeutic opportunities.

Competing Interest Statement

HL, HI, FF, DV, YS, and MAN declare that they are consultants employed by Data Tecnica International, whose participation in this study is part of a consulting agreement between the US National Institutes of Health and said company. HRM is employed by UCL. In the last 24 months he reports paid consultancy from Biogen, Biohaven, and Lundbeck; lecture fees/honoraria from the Wellcome Trust and the Movement Disorders Society; and research grants from Parkinson’s UK, Cure Parkinson’s Trust, PSP Association, CBD Solutions, Drake Foundation, Medical Research Council, and the Michael J. Fox Foundation. HRM is also a co-applicant on a patent application related to C9ORF72, “Method for diagnosing a neurodegenerative disease” (PCT/GB2012/052140).

Footnotes

  • Integrating clinico-demographic, genetic, and transcriptomic data within an automated machine learning open science framework to predict Parkinson’s disease and identify potential novel therapeutic targets for drug development.

  • https://github.com/GenoML/GenoML_multimodal_PD

  • https://amp-pd.org/

  • https://genoml.com/

  • https://share.streamlit.io/anant-dadu/shapleypdpredictiongenetics/main

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
Posted March 07, 2021.
Subject Area

  • Genomics