New Results

So you think you can PLS-DA?

Daniel Ruiz-Perez, Haibin Guan, Purnima Madhivanan, Kalai Mathee, Giri Narasimhan
doi: https://doi.org/10.1101/207225
Daniel Ruiz-Perez
1 Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, 33199, Miami, FL, USA
  • For correspondence: druiz072@cs.fiu.edu hguan003@cs.fiu.edu giri@cs.fiu.edu
Haibin Guan
1 Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, 33199, Miami, FL, USA
  • For correspondence: druiz072@cs.fiu.edu hguan003@cs.fiu.edu giri@cs.fiu.edu
Purnima Madhivanan
2 Department of Epidemiology, Florida International University, 11200 SW 8th St, 24105, Miami, FL, USA
  • For correspondence: pmadhiva@fiu.edu
Kalai Mathee
3 Herbert Wertheim College of Medicine, Florida International University, 11200 SW 8th St, 24105, Miami, FL, USA
  • For correspondence: matheek@fiu.edu
Giri Narasimhan
1 Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, 33199, Miami, FL, USA
  • For correspondence: giri@cs.fiu.edu druiz072@cs.fiu.edu hguan003@cs.fiu.edu

Abstract

Background Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining attention as a feature selector and classifier. To understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance with that of its close relative, Principal Component Analysis (PCA), from which it was originally derived.

Results We demonstrate that even though PCA ignores the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases it outperforms PLS-DA, which receives the class labels as part of its input. Our experiments range from examining the effect of the signal-to-noise ratio on the feature selection task to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed a data set of 396 vaginal microbiome samples for which the ground truth for the feature selection was available. All the 3D figures shown in this paper, as well as the supplementary ones, can be viewed interactively at http://biorg.cs.fiu.edu/plsda

Conclusions Our results highlight the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.
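
The kind of comparison described in the Results can be illustrated with a small synthetic experiment. The sketch below is not the authors' experimental setup: the data model (two balanced classes, Gaussian noise, a mean shift on the first three features), all parameter values, and every function name are assumptions chosen for illustration. It relies on two standard facts: for a single 0/1 response, the first PLS weight vector is proportional to the product of the centered data matrix (transposed) and the centered labels, and the first PCA loading vector is the leading eigenvector of the centered data's covariance, found here by power iteration. Features are ranked by absolute weight, and the selection is scored against the known signal features using tp, fp, fn and tn.

```python
import math
import random


def make_data(n=200, p=20, n_signal=3, shift=3.0, seed=0):
    """Hypothetical data model: only the first n_signal features differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced 0/1 class labels
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(n_signal):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def center_columns(X):
    """Subtract each column's mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def pls_first_weights(X, y):
    """First PLS weight vector for one 0/1 response: normalized Xc^T yc."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    p = len(Xc[0])
    w = [sum(row[j] * (yi - y_mean) for row, yi in zip(Xc, y)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def pca_first_loadings(X, n_iter=100):
    """First principal axis by power iteration on Xc^T Xc; labels are never used."""
    Xc = center_columns(X)
    p = len(Xc[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]  # Xc v
        v = [sum(row[j] * s for row, s in zip(Xc, scores)) for j in range(p)]  # Xc^T (Xc v)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v


def top_k(weights, k):
    """Indices of the k features with the largest absolute weight."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return set(order[:k])


def confusion(selected, truth, p):
    """Return (tp, fp, fn, tn) for a selected feature set against the ground truth."""
    tp = len(selected & truth)
    fp = len(selected - truth)
    fn = len(truth - selected)
    tn = p - tp - fp - fn
    return tp, fp, fn, tn


X, y = make_data()
truth = {0, 1, 2}  # indices of the signal features in make_data
pls_sel = top_k(pls_first_weights(X, y), k=3)
pca_sel = top_k(pca_first_loadings(X), k=3)
print("PLS-DA (tp, fp, fn, tn):", confusion(pls_sel, truth, p=20))
print("PCA    (tp, fp, fn, tn):", confusion(pca_sel, truth, p=20))
```

In this particular model the class difference is also the dominant source of variance, so the unsupervised first principal axis points at the same features as the supervised PLS weights. Because PCA follows the directions of largest variance, reducing `shift` or giving the noise features a larger variance than the signal features would be expected to change that picture.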

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Added some experiments and corrected some typos. Changes were based on reviewer comments.

  • List of abbreviations

    PLS-DA: Partial Least-Squares Discriminant Analysis
    PCA: Principal Component Analysis
    CV: Cross-Validation
    PC: Principal Components
    sPLS-DA: Sparse Partial Least-Squares Discriminant Analysis
    tp: true positives
    tn: true negatives
    fp: false positives
    fn: false negatives
    SPCA: Sparse Principal Component Analysis
    ICA: Independent Component Analysis
    RLDA: Regularized Linear Discriminant Analysis
    SVD: Singular Value Decomposition
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted May 23, 2020.
    Subject Area

    • Bioinformatics