| Literature DB >> 33297937 |
Daniel Ruiz-Perez, Haibin Guan, Purnima Madhivanan, Kalai Mathee, Giri Narasimhan.
Abstract
BACKGROUND: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of its close relative, Principal Component Analysis (PCA), from which it was originally derived.
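The core contrast between the two methods can be sketched numerically. The following is a minimal NumPy illustration (not the authors' code): PCA's first component is the direction of maximum variance, while the first PLS-DA direction for a single binary response is proportional to the covariance between the centered features and labels, so the two can point in very different directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: variance is largest along feature 0, but the class
# labels are separated along feature 1.
n = 200
X = np.column_stack([
    rng.normal(0.0, 5.0, n),   # high variance, no label signal
    rng.normal(0.0, 1.0, n),   # low variance ...
])
y = (rng.random(n) < 0.5).astype(float)
X[:, 1] += 4.0 * y             # ... but this feature carries the labels

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA first component: leading right singular vector of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_dir = Vt[0]

# First PLS-DA weight vector (single response): normalized X^T y,
# i.e. the direction of maximum covariance with the labels.
w = Xc.T @ yc
pls_dir = w / np.linalg.norm(w)

print("PCA direction   :", np.round(pca_dir, 2))   # dominated by feature 0
print("PLS-DA direction:", np.round(pls_dir, 2))   # dominated by feature 1
```

On this data PCA latches onto the high-variance but uninformative feature, while PLS-DA picks the label-separating one, which is the behavior the paper's first figure illustrates.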
Keywords: Bioinformatics; Dimensionality reduction; Feature selection; PCA; PLS-DA
Year: 2020 PMID: 33297937 PMCID: PMC7724830 DOI: 10.1186/s12859-019-3310-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1: Comparing the first principal component computed by PCA (pink) with that computed by PLS-DA (orange) on a data set where PLS-DA picks the direction that best separates the labels, while PCA picks the direction that least separates them
Fig. 2: Separability of random points as the ratio of the number of samples to the number of features decreases
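The separability effect behind this figure can be demonstrated with a short sketch (hypothetical code, not the authors' experiment): when features greatly outnumber samples, scores on the first PLS-DA component appear to separate even purely random labels, an overfitting artifact that vanishes when samples outnumber features.

```python
import numpy as np

rng = np.random.default_rng(1)

def apparent_accuracy(n, p):
    """Apparent (training) accuracy of a one-component PLS-DA fit on pure noise."""
    X = rng.normal(size=(n, p))              # features independent of labels
    y = rng.integers(0, 2, n).astype(float)  # random binary labels
    Xc, yc = X - X.mean(0), y - y.mean()
    t = Xc @ (Xc.T @ yc)                     # scores on the first PLS component
    pred = (t > 0).astype(float)             # threshold the scores at zero
    # Report the better of the two label orientations.
    return max(np.mean(pred == y), np.mean(pred != y))

print(apparent_accuracy(20, 1000))   # few samples, many features: near-perfect
print(apparent_accuracy(1000, 20))   # many samples, few features: near chance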
Fig. 3: When signal features were connected by a linear relationship, PCA outperformed PLS-DA as the number of samples increased
Fig. 4: When data points came from a clustered distribution, PLS-DA outperformed PCA as the number of samples increased
Fig. 5: Five methods compared under the interval model. PCA-based algorithms behave comparably to each other, as do LDA-based algorithms
Fig. 6: Performance of the features selected by PLS-DA and PCA for different Community State Types