Abstract
BACKGROUND: Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages exist, they either provide only a few optimization algorithms or focus on a specific application field. No complete NMF package is available to the bioinformatics community for performing various data mining tasks on biological data.
Year: 2013 PMID: 23591137 PMCID: PMC3736608 DOI: 10.1186/1751-0473-8-10
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Algorithms of NMF variants
| Function | Description |
| nmfrule | The standard NMF optimized by gradient-descent-based multiplicative rules. |
| nmfnnls | The standard NMF optimized by NNLS active-set algorithm. |
| seminmfrule | Semi-NMF optimized by multiplicative rules. |
| seminmfnnls | Semi-NMF optimized by NNLS. |
| sparsenmfnnls | Sparse-NMF optimized by NNLS. |
| sparsenmfNNQP | Sparse-NMF optimized by NNQP. |
| sparseseminmfnnls | Sparse semi-NMF optimized by NNLS. |
| kernelnmfdecom | Kernel NMF through decomposing the kernel matrix of input data. |
| kernelseminmfrule | Kernel semi-NMF optimized by multiplicative rule. |
| kernelseminmfnnls | Kernel semi-NMF optimized by NNLS. |
| kernelsparseseminmfnnls | Kernel sparse semi-NMF optimized by NNLS. |
| kernelSparseNMFNNQP | Kernel sparse semi-NMF optimized by NNQP. |
| convexnmfrule | Convex-NMF optimized by multiplicative rules. |
| kernelconvexnmf | Kernel convex-NMF optimized by multiplicative rules. |
| orthnmfrule | Orth-NMF optimized by multiplicative rules. |
| wnmfrule | Weighted-NMF optimized by multiplicative rules. |
| sparsenmf2rule | Sparse-NMF on both factors optimized by multiplicative rules. |
| sparsenmf2nnqp | Sparse-NMF on both factors optimized by NNQP. |
| vsmf | Versatile sparse matrix factorization optimized by NNQP and |
| nmf | The omnibus of the above algorithms. |
| computeKernelMatrix | Compute the kernel matrix k(A,B) given a kernel function. |
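To make the first row of the table concrete, here is a minimal sketch of standard NMF fitted with multiplicative update rules, written in plain Python rather than MATLAB. It is not the toolbox's `nmfrule` code: the function names (`nmf_multiplicative`, `frobenius_error`), the random initialization, the fixed iteration count, and the small constant `eps` are all assumptions made for illustration. The updates are the widely used Lee and Seung rules for the squared-error objective, factorizing X into a basis matrix A and a coefficient matrix Y.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    Qt = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

def nmf_multiplicative(X, k, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative m x n matrix X by A (m x k) times Y (k x n).

    Hypothetical helper for illustration; not the toolbox's nmfrule.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random starting point (an assumption of this sketch).
    A = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    Y = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # Y <- Y * (A'X) / (A'AY), element-wise
        At = transpose(A)
        num = matmul(At, X)
        den = matmul(matmul(At, A), Y)
        Y = [[Y[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # A <- A * (XY') / (AYY'), element-wise
        Yt = transpose(Y)
        num = matmul(X, Yt)
        den = matmul(A, matmul(Y, Yt))
        A = [[A[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return A, Y

def frobenius_error(X, A, Y):
    """Frobenius norm of the residual X - AY."""
    R = matmul(A, Y)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)) ** 0.5

# Toy usage: factorize a small non-negative matrix with k = 2.
X = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
A, Y = nmf_multiplicative(X, k=2)
print(frobenius_error(X, A, Y))
```

Because each update multiplies the current factor by a non-negative ratio, A and Y stay non-negative without any explicit projection step, which is the main appeal of the multiplicative rules over NNLS-based solvers.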
NMF-based data mining approaches
| Function | Description |
| NMFCluster | Take the coefficient matrix produced by an NMF algorithm, and output the clustering result. |
| chooseBestk | Search for the best number of clusters based on dispersion coefficients. |
| biCluster | The biclustering method using one of the NMF algorithms. |
| featureExtractionTrain | General interface. Using training data, generate the bases of the NMF feature space. |
| featureExtractionTest | General interface. Map the test/unknown data into the feature space. |
| featureFilterNMF | On training data, select features by various NMFs. |
| featSel | Feature selection methods. |
| nnlsClassifier | The NNLS classifier. |
| perform | Evaluate the classifier performance. |
| changeClassLabels01 | Change the class labels to be in {0,1,2,⋯, |
| gridSearchUniverse | A framework to do line or grid search. |
| classificationTrain | Train a classifier; many classifiers are included. |
| classificationPredict | Predict the class labels of unknown samples via the model learned by classificationTrain. |
| multiClassifiers | Run multiple classifiers on the same training data. |
| cvExperiment | Conduct a k-fold cross-validation experiment on a data set. |
| significantAcc | Check if the given data size can obtain significant accuracy. |
| learnCurve | Fit the learning curve. |
| FriedmanTest | Friedman test with post-hoc Nemenyi test to compare multiple classifiers on multiple data sets. |
| plotNemenyiTest | Plot the CD diagram of Nemenyi test. |
| NMFHeatMap | Draw and save the heat maps of NMF clustering. |
| NMFBicHeatMap | Draw and save the heat maps of NMF biclustering. |
| plotBarError | Plot bars with STD. |
| writeGeneList | Write the gene list into a .txt file. |
| normmean0std1 | Normalization to have mean 0 and STD 1. |
| sparsity | Calculate the sparsity of a matrix. |
| MAT2DAT | Write a data set from MATLAB into .dat format so that it can be read by other languages. |
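As an illustration of two entries in the table above, the following sketch shows one common way to turn an NMF coefficient matrix into cluster labels, and one simple definition of matrix sparsity. Both are assumptions rather than the toolbox's actual behavior: `cluster_from_coefficients` uses the rule of assigning each sample (column) to the factor with the largest coefficient, and `sparsity_fraction` reports the fraction of entries whose magnitude is at most a tolerance. The toolbox functions `NMFCluster` and `sparsity` may use different rules.

```python
def cluster_from_coefficients(Y):
    """Assign each sample (column of Y) to the factor (row) with the largest coefficient.

    Hypothetical helper; ties go to the lowest row index.
    """
    k, n = len(Y), len(Y[0])
    return [max(range(k), key=lambda i: Y[i][j]) for j in range(n)]

def sparsity_fraction(M, tol=0.0):
    """Fraction of entries of M whose absolute value is at most tol (assumed definition)."""
    entries = [v for row in M for v in row]
    return sum(1 for v in entries if abs(v) <= tol) / len(entries)

# Toy coefficient matrix: 2 factors (rows) by 3 samples (columns).
Y = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.7]]
labels = cluster_from_coefficients(Y)   # [0, 1, 1]
zero_share = sparsity_fraction(Y)       # one zero entry out of six
```

Labels here are zero-based factor indices; a MATLAB implementation would naturally use one-based indices instead.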
Figure 1. Heat map of NMF biclustering result. Left: the gene expression data where each column corresponds to a sample. Center: the basis matrix. Right: the coefficient matrix.
Gene set enrichment analysis using Onto-Express for the factor-specific genes identified by NMF
| reproduction (5) | 0 | response to stimulus (15) | 0.035 | regulation of bio. proc. (226) | 0.009 |
| metabolic process (41) | 0 | biological regulation (14) | 0.048 | multi-organism proc. (39) | 0.005 |
| cellular process (58) | 0 | | | biological regulation (237) | 0.026 |
| death (5) | 0 | | | | |
| developmental process (19) | 0 | | | | |
| regulation of biological process (19) | 0 | | | | |
Figure 2. Heat map of NMF clustering result on yeast metabolic cycle time-series data. Left: the gene expression data where each column corresponds to a sample. Center: the basis matrix. Right: the coefficient matrix.
Figure 3. Biological processes discovered by NMF on yeast metabolic cycle time-series data.
Figure 4. Biological processes discovered by NMF on breast cancer time-series data.
Figure 5. Mean accuracy and standard deviation results of NMF-based feature extraction on SRBCT data.
Figure 6. The mean accuracy results of NNLS classifier for different amounts of noise on SRBCT data.
Figure 7. The mean accuracy results of NNLS classifier for different missing value rates on SRBCT data.
Figure 8. The fitted learning curves of NNLS and SVM classifiers on SRBCT data.
Figure 9. Nemenyi test comparing 8 classifiers over 13 high-dimensional biological data sets.