Stéphane Chrétien, Christophe Guyeux, Bastien Conesa, Régis Delage-Mouroux, Michèle Jouvenot, Philippe Huetz, Françoise Descôtes.
Abstract
BACKGROUND: Non-negative matrix factorization (NMF) has become an essential tool for feature extraction in a wide spectrum of applications. In the present work, our objective is to extend the applicability of the method to data that are missing and/or corrupted by outliers.
Keywords: Feature extraction; Gene expression analysis; Non-negative matrix factorization; Outliers and missing data
Year: 2016 PMID: 27585655 PMCID: PMC5009666 DOI: 10.1186/s12859-016-1120-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Evolution of the error as k goes from 1 to 100 on a random example
Fig. 2. Sparse PCA fails to find the relevant number of features
Fig. 3. The average squared prediction error on the entries artificially declared missing, as a function of −1 − 2/5 ∗ log(λ/50)
Fig. 4. The factorization and convergence curves. The first subplot is S after convergence. The second subplot is V. The third subplot shows the distance between two successive iterates of Λ. The fourth subplot shows the relative error between M and its NMF as a function of the iteration number
Fig. 5. Cluster index for each group of patients. Subplot 1 corresponds to pTa, subplot 2 to pT1a, subplot 3 to pT1b, and subplot 4 to > pT1
Fig. 6. Determination of the optimal number of clusters in the denoised data: the number of clusters is on the horizontal axis and the (log of the) Bayesian Information Criterion (BIC) on the vertical axis
Fig. 7. PCA on the raw data, with points colored according to the cluster assigned by the GMM