bioRxiv
New Results

STATegra: Multi-omics data integration - A conceptual scheme and a bioinformatics pipeline

Nuria Planell, Vincenzo Lagani, Patricia Sebastian-Leon, Frans van der Kloet, Ewoud Ewing, Nestoras Karathanasis, Arantxa Urdangarin, Imanol Arozarena, Maja Jagodic, Ioannis Tsamardinos, Sonia Tarazona, Ana Conesa, Jesper Tegner, David Gomez-Cabrero
doi: https://doi.org/10.1101/2020.11.20.391045
Nuria Planell
1Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
Vincenzo Lagani
2Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
3Gnosis Data Analysis P.C., Heraklion, Greece
Patricia Sebastian-Leon
4Department of Genomic and Systems Reproductive Medicine, IVI-RMA (Instituto Valenciano de Infertilidad – Reproductive Medicine Associates) IVI Foundation. Valencia, Spain
Frans van der Kloet
5Swammerdam Institute for Life Sciences, University of Amsterdam, The Netherlands
Ewoud Ewing
6Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, 171 76 Stockholm, Sweden
Nestoras Karathanasis
7Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece (past)
8Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, USA (current)
Arantxa Urdangarin
1Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
Imanol Arozarena
9Cancer Signalling Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), Health Research Institute of Navarre (IdiSNA), Pamplona, Spain
Maja Jagodic
6Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, 171 76 Stockholm, Sweden
Ioannis Tsamardinos
3Gnosis Data Analysis P.C., Heraklion, Greece
10Computer Science Department, University of Crete
Sonia Tarazona
11Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, València, Spain
Ana Conesa
12Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL-32611, United States
13Genetics Institute, University of Florida, Gainesville, FL-32608, United States
Jesper Tegner
14Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
15Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, L8:05, SE-171 76, Stockholm, Sweden
16Science for Life Laboratory, Tomtebodavagen 23A, SE-17165, Solna, Sweden
David Gomez-Cabrero
1Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
14Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
15Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, L8:05, SE-171 76, Stockholm, Sweden
17Mucosal & Salivary Biology Division, King’s College London Dental Institute, London, UK
  • For correspondence: lunacab@gmail.com

Abstract

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch, with an arbitrary decision over which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, that combines machine learning component analysis, non-parametric data combination, and multi-omics exploratory analysis in a step-wise manner. While we have previously combined these integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two TCGA case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma cases, we demonstrate an enhanced capacity to identify features in comparison to single-omics analysis. Such an integrative multi-omics analysis framework for the identification of features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled, and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as open source in the STATegRa Bioconductor package: https://bioconductor.org/packages/release/bioc/html/STATegra.html.
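
To make the "non-parametric data combination" step more concrete, the following is a minimal sketch of one way per-omic evidence can be combined by permutation, assuming a two-group comparison and Fisher's combining function. It is not the STATegRa package API (which is written in R) and is not claimed to be the authors' implementation; the function name `npc_fisher`, its parameters, the choice of test statistic, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def npc_fisher(omics, labels, n_perm=999, seed=0):
    """Permutation-based combination of per-omic evidence for ONE feature.

    omics  : list of 1-D arrays (one per omic), all in the same sample order
    labels : 0/1 group label per sample
    Returns (partial p-value per omic, combined p-value).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    data = [np.asarray(x, dtype=float) for x in omics]

    def stat(x, lab):
        # Illustrative test statistic (an assumption of this sketch):
        # absolute difference in group means.
        return abs(x[lab == 1].mean() - x[lab == 0].mean())

    # Row 0 holds the observed labels; the remaining rows are permutations.
    # The same permutation is applied to every omic, which preserves the
    # dependence between omics measured on the same samples.
    labelings = [labels] + [rng.permutation(labels) for _ in range(n_perm)]
    stats = np.array([[stat(x, lab) for x in data] for lab in labelings])

    # Partial p-values for every row: the fraction of rows whose statistic
    # is at least as large (each row counts itself, so p is never zero).
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)

    # Fisher's combining function, applied to observed and permuted rows.
    combined = -2.0 * np.log(pvals).sum(axis=1)
    global_p = float((combined >= combined[0]).mean())
    return pvals[0], global_p


if __name__ == "__main__":
    # Simulated data: one gene measured in two omics on 20 samples.
    rng = np.random.default_rng(1)
    labels = np.array([0] * 10 + [1] * 10)
    expression = rng.normal(size=20) + 3 * labels
    methylation = rng.normal(size=20) + 3 * labels
    partial, global_p = npc_fisher([expression, methylation], labels)
    print(partial, global_p)
```

Because the combined statistic is calibrated against the same permutations used for the partial p-values, this kind of scheme needs no distributional assumption about the individual omics, which is the usual motivation for a non-parametric combination.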

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://portal.gdc.cancer.gov

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 20, 2020.
Subject Area

  • Bioinformatics