Chun-Mei Feng, Ying-Lian Gao, Jin-Xing Liu, Juan Wang, Dong-Qin Wang, Chang-Gang Wen.
Abstract
Principal Component Analysis (PCA) is widely used as a tool for dimensionality reduction. In bioinformatics, each variable corresponds to a specific gene. To improve the robustness of PCA-based methods, this paper proposes a novel graph-Laplacian PCA algorithm that imposes an L1/2 constraint on the error function (L1/2 gLPCA) for feature (gene) extraction. An error function based on the L1/2-norm helps to reduce the influence of outliers and noise. The Augmented Lagrange Multipliers (ALM) method is applied to solve the subproblem. Extensive experiments on simulation data and gene expression data sets show that the method achieves higher identification accuracies than other state-of-the-art PCA-based methods.
Year: 2017 PMID: 28470011 PMCID: PMC5392409 DOI: 10.1155/2017/5073427
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
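The abstract's claim that an L1/2-based error function reduces the influence of outliers can be illustrated with a small numerical sketch. It assumes the element-wise form of the measure, the sum of |e|^(1/2) over all residual entries; the paper's exact definition is not reproduced in this record, and the function names below are hypothetical.

```python
def l_half_error(residuals):
    """Sum of |e|^(1/2) over all residual entries (assumed element-wise L1/2 measure)."""
    return sum(abs(e) ** 0.5 for e in residuals)


def squared_l2_error(residuals):
    """Sum of e^2 over all residual entries (squared L2 / Frobenius measure)."""
    return sum(e * e for e in residuals)


# Toy residuals: three ordinary entries and one grossly corrupted entry.
with_outlier = [1.0, 1.0, 1.0, 100.0]

# Fraction of the total error that comes from the corrupted entry alone.
share_l2 = 100.0 ** 2 / squared_l2_error(with_outlier)
share_half = 100.0 ** 0.5 / l_half_error(with_outlier)

print(f"squared L2: outlier share = {share_l2:.4f}")
print(f"L1/2:       outlier share = {share_half:.4f}")
```

Under the squared L2 measure the single corrupted entry accounts for almost all of the error, so a fit driven by that measure is pulled toward the outlier; under the L1/2 measure its share is noticeably smaller.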
Algorithm 1: Procedure of L1/2 gLPCA.
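The steps of Algorithm 1 are not included in this record. As a point of reference only, the baseline gLPCA that appears in the comparison tables is commonly formulated as minimizing ||X - U V^T||_F^2 + alpha * tr(V^T L V) subject to V^T V = I, which has a closed-form solution through an eigendecomposition. The sketch below implements that baseline, not the paper's L1/2 variant or its ALM solver. The kNN graph construction, the function names, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def knn_laplacian(X, n_neighbors=3):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph.

    X has shape (n_features, n_samples); graph nodes are the samples (columns).
    The binary kNN weighting is an illustrative assumption.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(dist, np.inf)                       # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                               # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def glpca(X, L, k, alpha):
    """Baseline gLPCA: min ||X - U V^T||_F^2 + alpha * tr(V^T L V), V^T V = I."""
    G = -(X.T @ X) + alpha * L
    _, eigvecs = np.linalg.eigh(G)   # eigenvalues are returned in ascending order
    V = eigvecs[:, :k]               # eigenvectors of the k smallest eigenvalues
    U = X @ V                        # optimal U for a fixed orthonormal V
    return U, V


# Toy data: 20 features (genes) by 8 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
L = knn_laplacian(X, n_neighbors=3)
U, V = glpca(X, L, k=2, alpha=0.5)
print(U.shape, V.shape)
```

With `alpha=0` the graph term vanishes and `U @ V.T` reduces to the rank-k approximation of X given by uncentered PCA, which is one way to sanity-check the sketch.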
Figure 1: The accuracy of different methods on simulation data with different parameters.
Figure 2: The accuracy of different methods on simulation data with different numbers of samples.
The average accuracy and variance of different methods on simulation data with different parameters.
| Methods | | RgLPCA | gLPCA | | | PCA | LE |
|---|---|---|---|---|---|---|---|
| Average accuracy (%) | 66.12 | 65.47 | 63.53 | 44.43 | 48.43 | 59.00 | 65.10 |
| Variance | 1.48 | 1.62 | 1.76 | 23.60 | 20.30 | 1.61 | 1.97 |
The average accuracy and variance of different methods on simulation data with different numbers of samples.
| Methods | | RgLPCA | gLPCA | | | PCA | LE |
|---|---|---|---|---|---|---|---|
| Average accuracy (%) | 70.25 | 68.25 | 67.90 | 67.30 | 69.20 | 58.62 | 69.60 |
| Variance | 2.58 | 3.84 | 4.41 | 3.52 | 2.23 | 1.79 | 2.50 |
Enrichment analysis of the top 500 genes in the ALLAML data corresponding to different methods.
| ID | Name | | | RgLPCA | | gLPCA | | | | | | PCA | | LE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Hit | | Hit | | Hit | | Hit | | Hit | | Hit | | Hit |
| GO:0006955 | Immune response | 1.34 | 93 | 2.51 | 91 | 1.20 | 91 | 2.45 | 87 | 5.14 | 89 | 4.05 | 91 | 1.98 | 91 |
| GO:0002684 | Positive regulation of immune system process | 2.44 | 67 | 2.17 | 63 | 1.24 | 64 | 3.60 | 66 | 1.56 | 66 | 8.98 | 65 | 3.45 | 66 |
| GO:0098552 | Side of membrane | 3.80 | 46 | 5.19 | 45 | 2.70 | 43 | 2.23 | 41 | 7.25 | 42 | 2.01 | 44 | 4.24 | 47 |
| GO:0009897 | External side of plasma membrane | 1.83 | 30 | 6.34 | 29 | 9.51 | 26 | 1.14 | 26 | 1.41 | 25 | 1.31 | 29 | 1.83 | 26 |
| GO:0005615 | Extracellular space | 2.01 | 63 | 8.37 | 60 | 2.39 | 58 | 6.12 | 61 | 3.52 | 57 | 2.27 | 61 | 4.74 | 61 |
| GO:0005764 | Lysosome | 3.49 | 38 | 7.43 | 37 | 5.46 | 34 | 1.20 | 35 | 9.22 | 30 | 1.08 | 36 | 3.49 | 37 |
| GO:0009986 | Cell surface | 3.58 | 48 | 4.82 | 45 | 4.68 | 42 | 6.13 | 42 | 5.58 | 41 | 6.58 | 46 | 3.58 | 46 |
| GO:0042277 | Peptide binding | 5.03 | 25 | 5.92 | 24 | 2.85 | 22 | 3.33 | 22 | 7.54 | 18 | 3.09 | 23 | 1.80 | 21 |
| GO:0033218 | Amide binding | 7.37 | 26 | 4.34 | 24 | 3.44 | 23 | 4.04 | 23 | 7.36 | 19 | 3.95 | 24 | 2.04 | 22 |
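This record does not state how the top 500 genes are chosen from the extracted components. One common convention, assumed here, scores each gene by the sum of absolute values in its row of the principal-direction matrix U and keeps the highest-scoring genes. The function name, the toy matrix, and the gene labels below are hypothetical.

```python
import numpy as np


def top_genes(U, gene_names, n_top):
    """Rank genes by the row-wise sum of |U| and return the n_top best names.

    U has shape (n_genes, k): one row per gene, one column per component.
    The scoring rule is an assumed convention, not taken from the paper.
    """
    scores = np.abs(U).sum(axis=1)
    order = np.argsort(scores)[::-1][:n_top]   # indices sorted by descending score
    return [gene_names[i] for i in order]


# Toy example with three genes and two components.
U_toy = np.array([[0.1, -0.2],
                  [2.0, -1.0],
                  [0.0,  0.5]])
names = ["g0", "g1", "g2"]
selected = top_genes(U_toy, names, n_top=2)
print(selected)
```

For the enrichment analysis described here, the same call with `n_top=500` would produce the gene list that is passed to the enrichment tool.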
Figure 3: The pathway of hematopoietic cell lineage.
Enrichment analysis of the top 500 genes in the PAAD-GE data corresponding to different methods.
| ID | Name | | | RgLPCA | | gLPCA | | | | | | PCA | | LE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Hit | | Hit | | Hit | | Hit | | Hit | | Hit | | Hit |
| GO:0005615 | Extracellular space | 3.20 | 196 | 3.56 | 183 | 2.18 | 173 | 2.742 | 160 | 7.82 | 161 | 1.44 | 157 | 3.20 | 191 |
| GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 2.79 | 67 | 6.82 | 56 | 1.37 | 64 | 3.45 | 48 | 8.17 | 51 | 7.45 | 63 | 2.76 | 51 |
| GO:0070972 | Protein localization to endoplasmic reticulum | 1.01 | 73 | 2.42 | 69 | 6.37 | 68 | 5.31 | 51 | 4.63 | 54 | 2.88 | 71 | 4.70 | 53 |
| GO:0006613 | Cotranslational protein targeting to membrane | 1.86 | 67 | 3.48 | 65 | 7.58 | 64 | 3.27 | 48 | 1.19 | 51 | 2.04 | 66 | 4.04 | 51 |
| GO:0045047 | Protein targeting to ER | 5.01 | 67 | 5.19 | 65 | 2.00 | 64 | 6.04 | 48 | 2.33 | 51 | 5.80 | 67 | 7.90 | 51 |
| GO:0022626 | Cytosolic ribosome | 1.34 | 68 | 2.13 | 64 | 1.44 | 62 | 8.30 | 47 | 8.45 | 50 | 6.34 | 66 | 4.01 | 52 |
| GO:0072599 | Establishment of protein localization to endoplasmic reticulum | 2.77 | 67 | 8.15 | 66 | 4.44 | 64 | 6.44 | 48 | 3.09 | 51 | 3.20 | 67 | 1.05 | 51 |
| GO:0005198 | Structural molecule activity | 1.82 | 126 | 2.46 | 124 | 6.14 | 121 | 3.62 | 110 | 5.16 | 113 | 3.32 | 124 | 1.03 | 113 |
| GO:0044391 | Ribosomal subunit | 5.14 | 69 | 5.18 | 65 | 3.22 | 63 | 2.86 | 49 | 1.42 | 52 | 1.58 | 68 | 3.70 | 53 |
Figure 4: The pathway of focal adhesion.
Functions of the top 7 extracted genes.
| Gene ID | Gene name | Related GO annotations | Related diseases | Paralogous genes |
|---|---|---|---|---|
| 5644 | PRSS1 | Serine-type endopeptidase activity | Trypsinogen deficiency and PRSS1-related hereditary pancreatitis | KLK12 |
| 5406 | PNLIP | Carboxylic ester hydrolase activity and triglyceride lipase activity | Pancreatic colipase deficiency and pancreatic lipase deficiency | LPL |
| 1357 | CPA1 | Metallocarboxypeptidase activity and exopeptidase activity | Borna disease and pancreatitis, hereditary | CPA3 |
| 1360 | CPB1 | Metallocarboxypeptidase activity and carboxypeptidase activity | Acute pancreatitis and tricuspid valve insufficiency | CPA3 |
| 63036 | CELA2A | Serine-type endopeptidase activity and serine hydrolase activity | Pancreatitis, hereditary | CELA2B |
| 5967 | REG1A | Carbohydrate binding and growth factor activity | Acinar cell carcinoma and tropical calcific pancreatitis | REG3G |
| 1056 | CEL | Hydrolase activity and carboxylic ester hydrolase activity | Maturity-onset diabetes of the young, Type VIII and maturity-onset diabetes of the young | CES2 |
Accuracy (Acc) and highest relevance score of each method.
| Dataset | | | RgLPCA | | gLPCA | | | | | | PCA | | LE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Acc (%) | Relevance score | Acc (%) | Relevance score | Acc (%) | Relevance score | Acc (%) | Relevance score | Acc (%) | Relevance score | Acc (%) | Relevance score | Acc (%) | Relevance score |
| AMLALL | | | 49.88 | 46.11 | 48.67 | 46.11 | 40.00 | 38.15 | 52.00 | 46.11 | 49.00 | 46.11 | 49.60 | 46.11 |
| PAAD-GE | | | 60.51 | 61.01 | 59.40 | 61.01 | 43.80 | 54.77 | 47.20 | 54.77 | 57.20 | 82.20 | 61.40 | 82.20 |