Jin-Xing Liu, Yu-Tian Wang, Chun-Hou Zheng, Wen Sha, Jian-Xun Mi, Yong Xu.
Abstract
Identifying a set of genes relevant to a key biological process is an important problem in current molecular biology. In this paper, we propose a novel method for discovering differentially expressed genes based on robust principal component analysis (RPCA). In our method, the differentially and non-differentially expressed genes are treated as the perturbation signals S and the low-rank matrix A, respectively; the perturbation signals S can be recovered from the gene expression data by RPCA. To discover the differentially expressed genes associated with specific biological processes or functions, the scheme is as follows. First, the expression data matrix D is decomposed into the sum of two matrices, A and S, by RPCA. Second, the differentially expressed genes are identified from matrix S. Finally, the differentially expressed genes are evaluated with Gene Ontology (GO)-based tools. A large number of experiments on simulated and real gene expression data are also reported, and the results show that our method is efficient and effective.
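The decomposition-and-ranking scheme described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it solves the standard RPCA (principal component pursuit) problem with an inexact augmented Lagrange multiplier (IALM) solver, uses the common default weight λ = 1/√max(m, n), assumes that rows of D are genes and columns are samples, and ranks genes by the L1 norm of their row in S. The function names and the scoring rule are hypothetical.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca_ialm(D, lam=None, tol=1e-7, max_iter=1000):
    """Split D into low-rank A plus sparse S (D = A + S) with an inexact
    augmented Lagrange multiplier solver for principal component pursuit."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight (assumption)
    norm_two = np.linalg.norm(D, 2)  # largest singular value of D
    Y = D / max(norm_two, np.abs(D).max() / lam)  # dual variable init
    mu, mu_max, rho = 1.25 / norm_two, 1.25e7 / norm_two, 1.5
    A = np.zeros_like(D)
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        S = soft_threshold(D - A + Y / mu, lam / mu)  # sparse update
        A = singular_value_threshold(D - S + Y / mu, 1.0 / mu)  # low-rank update
        residual = D - A - S
        Y = Y + mu * residual  # dual ascent on the constraint D = A + S
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / d_norm < tol:
            break
    return A, S


def rank_genes_by_perturbation(S, top_k):
    """Hypothetical scoring rule: rank genes (rows) by the L1 norm of
    their row in S and return the indices of the top_k genes."""
    scores = np.abs(S).sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]


# Toy data: 200 genes x 20 samples with a rank-2 background and strong
# perturbations in genes 0-9 for samples 0-4.
rng = np.random.default_rng(0)
A_true = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 20))
S_true = np.zeros((200, 20))
S_true[:10, :5] = 10.0 * rng.choice([-1.0, 1.0], size=(10, 5))
D = A_true + S_true

A, S = rpca_ialm(D)
top_genes = rank_genes_by_perturbation(S, top_k=10)
print(sorted(top_genes.tolist()))
```

On this toy matrix the ten perturbed genes are expected to rank highest. On real data one would keep a fixed number of top-ranked genes (the GO tables below evaluate sets of 500 genes per data set) and pass them to a GO-based tool.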
Year: 2013 PMID: 23815087 PMCID: PMC3654929 DOI: 10.1186/1471-2105-14-S8-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The RPCA model of microarray data. White and yellow blocks denote zero and near-zero values, respectively; red and blue blocks denote the perturbation signals.
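The decomposition pictured in Figure 1 is usually posed as the following convex program (principal component pursuit), where ‖A‖_* is the nuclear norm (the sum of the singular values of A) and ‖S‖_1 is the entrywise L1 norm. This excerpt does not state the authors' exact formulation or their choice of λ; the default shown is the one suggested by Candès et al. for an m × n matrix and is an assumption here.

```latex
\min_{A,\,S}\ \|A\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad D = A + S,
\qquad \|S\|_{1} = \sum_{i,j} |S_{ij}|,
\qquad \lambda = \frac{1}{\sqrt{\max(m, n)}}
```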
The recognition accuracy with different
| n | 500 | 500 | 500 | 500 | 1000 | 1000 | 1000 | 1000 | 2000 | 2000 | 2000 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rank/n | 0.05 | 0.05 | 0.10 | 0.10 | 0.05 | 0.05 | 0.10 | 0.10 | 0.05 | 0.05 | 0.10 | 0.10 |
| | 0.05 | 0.10 | 0.05 | 0.10 | 0.05 | 0.10 | 0.05 | 0.10 | 0.05 | 0.10 | 0.05 | 0.10 |
| 0.1 | 1.00 | 0.30 | 0.96 | 0.02 | 1.00 | 0.64 | 1.00 | 0.07 | 1.00 | 0.71 | 1.00 | 0.08 |
| 0.2 | 1.00 | 1.00 | 1.00 | 0.92 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.6 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.7 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.9 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The recognition accuracy with and
| | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 | n = 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 1.00 | 0.30 | 0.96 | 0.02 | 1.00 | 0.64 | 1.00 | 0.07 | 1.00 | 0.71 |
| 0.2 | 1.00 | 1.00 | 1.00 | 0.92 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 |
| 0.3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.6 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.7 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.9 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The recognition accuracy with and
| | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 | n = 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.01 | 0.02 | 0.07 | 0.15 | 0.24 | 0.36 | 0.43 | 0.51 | 0.59 | 0.66 |
| 0.2 | 0.24 | 0.84 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.3 | 0.50 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.4 | 0.61 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.62 | 0.96 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.6 | 0.64 | 0.94 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.7 | 0.64 | 0.93 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.8 | 0.65 | 0.91 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.9 | 0.66 | 0.89 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 0.67 | 0.86 | 0.97 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The recognition accuracy with and
| | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 | n = 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.00 | 0.06 | 0.50 | 0.92 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.2 | 0.06 | 0.61 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.3 | 0.15 | 0.77 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.4 | 0.27 | 0.74 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.40 | 0.67 | 0.96 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.6 | 0.50 | 0.63 | 0.93 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.7 | 0.59 | 0.60 | 0.88 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.8 | 0.66 | 0.59 | 0.82 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.9 | 0.71 | 0.61 | 0.76 | 0.94 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 0.75 | 0.65 | 0.72 | 0.90 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The recognition accuracy with and
| | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 | n = 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.01 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.02 | 0.04 | 0.07 | 0.09 |
| 0.2 | 0.22 | 0.16 | 0.50 | 0.89 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.3 | 0.51 | 0.43 | 0.89 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.4 | 0.62 | 0.56 | 0.93 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.64 | 0.59 | 0.92 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.6 | 0.64 | 0.58 | 0.88 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.7 | 0.65 | 0.58 | 0.83 | 0.96 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.8 | 0.65 | 0.59 | 0.79 | 0.94 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.9 | 0.67 | 0.61 | 0.73 | 0.91 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1.0 | 0.68 | 0.65 | 0.70 | 0.86 | 0.96 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |
Figure 2. The recognition accuracy of matrix . s1 denotes the recognition accuracy series with and . s2 denotes the recognition accuracy series with and . s3 denotes the recognition accuracy series with and . s4 denotes the recognition accuracy series with and .
Number of samples for each stress type in the raw data
| Stress Type | cold | drought | salt | UV-B | heat | osmotic | control |
|---|---|---|---|---|---|---|---|
| Number of Samples | 6 | 7 | 6 | 7 | 8 | 6 | 8 |
The values of and on different data sets
| Data set | PMD | SPCA | PMD | SPCA |
|---|---|---|---|---|
| drought | 0.0928 | 0.4224 | 0.0999 | 0.4065 |
| salt | 0.0924 | 0.4920 | 0.1057 | 0.5261 |
| UV-B | 0.1036 | 0.4505 | 0.0966 | 0.4329 |
| cold | 0.1026 | 0.4660 | 0.0983 | 0.4726 |
| heat | 0.0765 | 0.3770 | 0.0931 | 0.3710 |
| osmotic | 0.1049 | 0.5139 | 0.0946 | 0.5338 |
Response to abiotic stimulus (GO:0009628)
| Stress type | Tissue | P-value | Sample frequency | P-value | Sample frequency | P-value | Sample frequency |
|---|---|---|---|---|---|---|---|
| drought | s | 3.91E-34 | 107/500 (21.4%) | 7.5E-21 | 87/500 (17.4%) | 1.09E-45 | |
| drought | r | 1.78E-10 | 68/500 (13.6%) | 4.14E-08 | 63/500 (12.6%) | 1.03E-27 | |
| salt | s | 9.93E-39 | 113/500 (22.6%) | 9.83E-33 | 105/500 (21.0%) | 1.35E-55 | |
| salt | r | 1.36E-15 | 78/500 (15.6%) | 6.18E-12 | 71/500 (14.2%) | 1.65E-22 | |
| UV-B | s | 1.76E-13 | 74/500 (14.8%) | 7.84E-23 | 90/500 (18.0%) | 5.9E-41 | |
| UV-B | r | 5.3E-10 | 67/500 (13.4%) | 8.00E-4 | 52/500 (10.4%) | 4.73E-29 | |
| cold | s | 5.82E-35 | 106/500 (21.6%) | 1.17E-19 | 85/500 (17.0%) | 2.13E-46 | |
| cold | r | 2.74E-23 | | 4.1E-19 | 84/500 (16.8%) | 4.02E-23 | |
| heat | s | 1.44E-24 | 93/500 (18.6%) | 4.64E-22 | 89/500 (17.8%) | 7.46E-55 | |
| heat | r | 1.41E-15 | 78/500 (15.6%) | 1.35E-08 | 64/500 (12.8%) | 1.07E-34 | |
| osmotic | s | 6.55E-38 | 112/500 (22.4%) | 2.02E-18 | 83/500 (16.6%) | 6.83E-54 | |
| osmotic | r | 1.4E-14 | 76/500 (15.2%) | 2.87E-17 | 81/500 (16.2%) | 9.98E-35 | |
In this table, 's' denotes the shoot samples; 'r' denotes the root samples.
Figure 3. The sample frequency of response to abiotic stimulus.
Characteristic GO terms selected by the algorithms
| Stress type | Tissue | GO term | Background frequency | Sample frequency | Sample frequency | Sample frequency |
|---|---|---|---|---|---|---|
| drought | s | GO:0009414 response to water deprivation | 207/29887 (0.7%) | 23/500 (4.6%) | 34/500 (6.8%) | |
| drought | r | GO:0009414 response to water deprivation | 207/29887 (0.7%) | 26/500 (5.2%) | 24/500 (4.8%) | |
| salt | s | GO:0009651 response to salt stress | 395/29887 (1.3%) | 41/500 (8.2%) | 28/500 (5.6%) | |
| salt | r | GO:0009651 response to salt stress | 395/29887 (1.3%) | 22/500 (4.4%) | 31/500 (6.2%) | |
| UV-B | s | GO:0009416 response to light stimulus | 557/29887 (1.9%) | 23/500 (4.6%) | 30/500 (6.0%) | |
| UV-B | r | GO:0009416 response to light stimulus | 557/29887 (1.9%) | 24/500 (4.8%) | none | |
| cold | s | GO:0009409 response to cold | 276/29887 (0.9%) | 44/500 (8.8%) | 34/500 (6.8%) | |
| cold | r | GO:0009409 response to cold | 276/29887 (0.9%) | 33/500 (6.6%) | 38/500 (7.6%) | |
| heat | s | GO:0009408 response to heat | 140/29887 (0.5%) | 45/500 (9.0%) | 30/500 (6.0%) | |
| heat | r | GO:0009408 response to heat | 140/29887 (0.5%) | 43/500 (8.6%) | 28/500 (5.6%) | |
| osmotic | s | GO:0006970 response to osmotic stress | 474/29887 (1.6%) | 29/500 (5.8%) | ||
| osmotic | r | GO:0006970 response to osmotic stress | 474/29887 (1.6%) | 39/500 (7.8%) | 27/500 (5.4%) | |
In this table, 's' denotes the shoot samples; 'r' denotes the root samples; 'none' denotes that the algorithm cannot give the GO terms.
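The Background frequency and Sample frequency columns are the inputs of a term-enrichment test. GO tools commonly score such counts with a one-sided hypergeometric test; the sketch below assumes that test (this excerpt does not name it) and computes the uncorrected tail probability for the drought/shoot row, where 23 of 500 selected genes carry a term that covers 207 of 29,887 background genes. The function names are hypothetical, and because tools typically apply a multiple-testing correction, the result is not expected to match any P-value reported in this record.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: background genes (e.g. 29887)
    annotated:  background genes carrying the GO term (e.g. 207)
    drawn:      selected genes (e.g. 500)
    k:          selected genes carrying the term (e.g. 23)
    """
    upper = min(annotated, drawn)
    # Exact integer arithmetic; math.comb returns 0 when the lower index
    # exceeds the upper one, so impossible outcomes contribute nothing.
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return hits / comb(population, drawn)


def fold_enrichment(k, population, annotated, drawn):
    """Sample frequency divided by background frequency."""
    return (k / drawn) / (annotated / population)


# drought, shoot samples, GO:0009414 (first sample-frequency column)
p = hypergeom_upper_tail(23, 29887, 207, 500)
fe = fold_enrichment(23, 29887, 207, 500)
print(f"P(X >= 23) = {p:.3g}, fold enrichment = {fe:.1f}")
```

Exact integer binomials avoid the overflow and cancellation that a naive floating-point product would suffer at these population sizes.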
Characteristic GO terms selected on the colon data
| Accession No. | GO:0050896 (response to stimulus) | GO:0002376 (immune system process) |
|---|---|---|
| Background frequency | 32294/155706 (20.7%) | 7011/155706 (4.5%) |
| P-value (RPCA) | 1.76E-10 | 5.74E-09 |
| Sample frequency (RPCA) | 38/57 (66.7%) | 19/57 (33.3%) |
| P-value (SPCA) | 8.71E-06 | 2.95E-04 |
| Sample frequency (SPCA) | 32/57 (56.1%) | 14/57 (24.6%) |
| P-value (PMD) | 7.93E-04 | 8.27E-01 |
| Sample frequency (PMD) | 27/51 (52.9%) | 9/51 (17.6%) |
The top 30 genes of the colon data selected by RPCA
| Accession No. | Type | Description |
|---|---|---|
| M27190 | gene | Homo sapiens secretory pancreatic stone protein (PSP-S) mRNA, complete cds. |
| R89823 | 3' UTR | INORGANIC PYROPHOSPHATASE (Bos taurus) |
| M87789 | gene | IG GAMMA-1 CHAIN C REGION (HUMAN). |
| T48904 | 3' UTR | HEAT SHOCK 27 KD PROTEIN (HUMAN). |
| M26383 | gene | Human monocyte-derived neutrophil-activating protein (MONAP) mRNA, complete cds. |
| J00231 | gene | Human Ig gamma3 heavy chain disease OMM protein mRNA. |
| X02761 | gene | Human mRNA for fibronectin (FN precursor). |
| R80612 | 3' UTR | PHOSPHOLIPASE A2, MEMBRANE ASSOCIATED PRECURSOR (HUMAN). |
| M31994 | gene | Human cytosolic aldehyde dehydrogenase (ALDH1) gene, exon 13. |
| T47377 | 3' UTR | S-100P PROTEIN (HUMAN). |
| X02492 | gene | INTERFERON-INDUCED PROTEIN 6-16 PRECURSOR (HUMAN); contains L1 repetitive element. |
| M94132 | gene | Human mucin 2 (MUC2) mRNA sequence. |
| X67325 | gene | H.sapiens p27 mRNA. |
| D28137 | gene | Human mRNA for BST-2, complete cds. |
| L05144 | gene | PHOSPHOENOLPYRUVATE CARBOXYKINASE, CYTOSOLIC (HUMAN); contains Alu repetitive element; contains element PTR5 repetitive element. |
| X02874 | gene | Human mRNA for (2'-5') oligo A synthetase E (1,6 kb RNA). |
| T55117 | 3' UTR | ALPHA-1-ANTITRYPSIN PRECURSOR (HUMAN). |
| M19045 | gene | Human lysozyme mRNA, complete cds. |
| Y00711 | gene | L-LACTATE DEHYDROGENASE H CHAIN (HUMAN). |
| X60489 | gene | Human mRNA for elongation factor-1-beta. |
| T57780 | 3' UTR | IG LAMBDA CHAIN C REGIONS (HUMAN). |
| T60778 | 3' UTR | MATRIX GLA-PROTEIN PRECURSOR (Rattus norvegicus). |
| H58397 | 3' UTR | TRANS-1,2-DIHYDROBENZENE-1,2-DIOL DEHYDROGENASE (HUMAN). |
| L08044 | gene | Human intestinal trefoil factor mRNA, complete cds. |
| M18216 | gene | Human nonspecific cross reacting antigen mRNA, complete cds. |
| K03474 | gene | Human Mullerian inhibiting substance gene, complete cds. |
| L33930 | gene | Homo sapiens CD24 signal transducer mRNA, complete cds and 3' region. |
| T48014 | 3' UTR | HEMOGLOBIN ALPHA CHAIN (HUMAN). |
| H73908 | 3' UTR | METALLOTHIONEIN-IA (Bos taurus) |
| R70030 | 3' UTR | IG MU CHAIN C REGION (HUMAN). |
Pathway analysis of the top 100 genes selected by RPCA on colon data
| Rank | GO annotation | Q-value | Genes in network | Genes in genome |
|---|---|---|---|---|
| 1 | cytokine-mediated signalling pathway | 2.27E-20 | 21 | 215 |
| 2 | cellular response to cytokine stimulus | 1.70E-19 | 21 | 244 |
| 3 | response to cytokine stimulus | 2.62E-18 | 21 | 283 |
| 4 | type I interferon-mediated signalling pathway | 1.61E-17 | 14 | 71 |
| 5 | cellular response to type I interferon | 1.61E-17 | 14 | 71 |
| 6 | response to type I interferon | 1.67E-17 | 14 | 72 |
| 7 | interferon-gamma-mediated signalling pathway | 2.60E-08 | 9 | 77 |
| 8 | cellular response to interferon-gamma | 3.64E-08 | 9 | 81 |
| 9 | response to interferon-gamma | 1.04E-07 | 9 | 92 |
| 10 | response to other organism | 3.69E-05 | 10 | 243 |
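The Q-value column above is a multiple-testing-adjusted significance score. The table does not say how it was computed; assuming a Benjamini-Hochberg false-discovery-rate adjustment (a common choice for enrichment tools, but an assumption here), raw P-values could be converted to Q-values as in this sketch. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P-values
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity:
    # q_(r) = min over ranks j >= r of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The backward running minimum guarantees that a smaller P-value never receives a larger Q-value than a bigger one.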