bioRxiv
New Results

Unsupervised multiple kernel learning for heterogeneous data integration

Jérôme Mariette, Nathalie Villa-Vialaneix
doi: https://doi.org/10.1101/139287
Jérôme Mariette
MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
Nathalie Villa-Vialaneix
MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France

Abstract

Recent advances in high-throughput sequencing have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has provided important insights in a wide range of applications. However, integrating various sources of information remains a challenge for systems biology, because the datasets produced are often of heterogeneous types, which calls for generic methods that can take their different specificities into account.

We propose a multiple kernel framework that makes it possible to integrate multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA (KPCA), as well as to provide a new image of the sample structure when a larger number of datasets is included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas were analysed using a kernel Self-Organizing Map with both single- and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method for improving the representation of the studied biological system.
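
The general idea of merging several kernels into one meta-kernel and then running a kernel PCA on it can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each dataset is turned into a linear kernel, each kernel is cosine-normalised, and the meta-kernel is taken to be a convex combination of the input kernels. The weights are fixed to uniform values here as a placeholder; the framework described above learns them (consensus or topology-preserving), which this sketch does not attempt. All function names and the random datasets are hypothetical, and this is not the mixKernel API.

```python
import numpy as np

def linear_kernel(X):
    """Linear kernel (Gram matrix) between the rows (samples) of X."""
    return X @ X.T

def cosine_normalize(K):
    """Rescale K so that every sample has unit self-similarity: K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def meta_kernel(kernels, weights=None):
    """Convex combination sum_m beta_m * K_m of kernels computed on the same samples.

    Assumption: uniform weights by default, standing in for learned weights."""
    M = len(kernels)
    if weights is None:
        weights = np.full(M, 1.0 / M)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_pca(K, n_components=2):
    """Kernel PCA: eigendecomposition of the doubly-centred kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Sample coordinates on each axis: eigenvector scaled by sqrt(eigenvalue)
    scores = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return scores, eigvals

# Usage on two hypothetical datasets measured on the same 20 samples
rng = np.random.default_rng(0)
omics_a = rng.normal(size=(20, 50))
omics_b = rng.normal(size=(20, 10))
kernels = [cosine_normalize(linear_kernel(X)) for X in (omics_a, omics_b)]
K = meta_kernel(kernels)
scores, eigvals = kernel_pca(K, n_components=2)
print(scores.shape)  # (20, 2)
```

Because each normalised kernel has a unit diagonal and the weights sum to one, the meta-kernel keeps a unit diagonal, so no single dataset dominates simply through its scale.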

The proposed methods are available in the R package mixKernel, released on CRAN. The package is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics website: http://mixomics.org/mixkernel/.

Footnotes

  • * jerome.mariette{at}inra.fr

  • ** nathalie.villa-vialaneix{at}inra.fr

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted October 10, 2017.
bioRxiv 139287; doi: https://doi.org/10.1101/139287