bioRxiv
New Results

Tree-weighting for multi-study ensemble learners

Maya Ramchandran, Prasad Patil, Giovanni Parmigiani
doi: https://doi.org/10.1101/698779
Maya Ramchandran
1Department of Biostatistics, Harvard T.H. Chan School of Public Health
  • For correspondence: maya_ramchandran@g.harvard.edu
Prasad Patil
1Department of Biostatistics, Harvard T.H. Chan School of Public Health
2Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute
Giovanni Parmigiani
1Department of Biostatistics, Harvard T.H. Chan School of Public Health
2Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute

Abstract

Multi-study learning uses multiple training studies: classifiers are trained separately on each study and then combined into an ensemble, with weights that reward members showing better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as the single-study learner, we compare two strategies: weighting each forest to form the ensemble, or extracting the individual trees trained by each Random Forest and weighting them directly. We consider weighting approaches that reward cross-study replicability within the training set. We find that incorporating multiple layers of ensembling in the training process increases the robustness of the resulting predictor. Furthermore, we explore how the ensembling weights correspond to the internal structure of trees, to shed light on the features that matter in determining the relationship between the Random Forests algorithm and the true outcome model. Finally, we apply our approach to genomic datasets and show that it improves upon the basic multi-study learning paradigm.
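The contrast between weighting whole forests and weighting individual trees can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which weighting rules were compared, so inverse cross-study error is used as a stand-in; the "trees" are hand-written functions rather than fitted Random Forest members; and numeric predictions with squared error replace the classifiers discussed in the abstract. All function and variable names are hypothetical.

```python
from statistics import fmean

def mse(predict, study):
    """Mean squared error of one predictor on one study, given as (X, y)."""
    X, y = study
    return fmean((predict(x) - yi) ** 2 for x, yi in zip(X, y))

def cross_study_weights(members, origins, studies, eps=1e-12):
    """Weight members by inverse mean error on studies they were NOT trained on.

    Assumption: inverse-error weighting is an illustrative rule only,
    not necessarily a scheme used in the preprint.
    """
    raw = []
    for predict, origin in zip(members, origins):
        errors = [mse(predict, s) for k, s in enumerate(studies) if k != origin]
        raw.append(1.0 / (fmean(errors) + eps))
    total = sum(raw)
    return [r / total for r in raw]

def forest_as_member(trees):
    """A forest predicts with the unweighted average of its own trees."""
    return lambda x: fmean(t(x) for t in trees)

def ensemble_predict(members, weights, x):
    """Weighted combination of member predictions."""
    return sum(w * predict(x) for predict, w in zip(members, weights))

# Three toy studies sharing the outcome model y = 2x.
studies = [
    ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0]),
    ([1.0, 3.0], [2.0, 6.0]),
    ([2.0, 4.0], [4.0, 8.0]),
]

# Hypothetical stand-ins for the trees of one forest per study.
forests = [
    [lambda x: 2.0 * x + 0.1, lambda x: 2.0 * x + 1.0],  # "trained" on study 0
    [lambda x: 2.0 * x + 0.5, lambda x: 0.0],            # "trained" on study 1
    [lambda x: 2.0 * x - 0.5, lambda x: x],              # "trained" on study 2
]

# Strategy 1: one weight per forest.
forest_members = [forest_as_member(f) for f in forests]
forest_weights = cross_study_weights(forest_members, [0, 1, 2], studies)

# Strategy 2: pool all trees and give each its own weight.
tree_members = [t for f in forests for t in f]
tree_origins = [k for k, f in enumerate(forests) for _ in f]
tree_weights = cross_study_weights(tree_members, tree_origins, studies)

forest_prediction = ensemble_predict(forest_members, forest_weights, 3.0)
tree_prediction = ensemble_predict(tree_members, tree_weights, 3.0)
```

In this toy setup, tree-level weighting can down-weight a poor tree even when it sits inside an otherwise useful forest, whereas forest-level weighting can only reward or penalize the forest as a whole.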

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 11, 2019.
Tree-weighting for multi-study ensemble learners
Maya Ramchandran, Prasad Patil, Giovanni Parmigiani
bioRxiv 698779; doi: https://doi.org/10.1101/698779

Subject Area

  • Bioinformatics