Jin-Xing Liu, Yong Xu, Chun-Hou Zheng, Yi Wang, Jing-Yu Yang.
Abstract
Conventional gene selection methods based on principal component analysis (PCA) use only the first principal component (PC) of PCA or sparse PCA to select characteristic genes. These methods in effect assume that the first PC plays a dominant role in gene selection. However, in a number of cases this assumption is not satisfied, so the conventional PCA-based methods usually provide poor selection results. To improve the performance of PCA-based gene selection, we put forward a gene selection method that weights PCs by singular values (WPCS). Because different PCs have different importance, the singular values are used as weights that represent the influence of each PC on gene selection. The ROC curves and AUC statistics on artificial data show that our method outperforms the state-of-the-art methods. Moreover, experimental results on real gene expression data sets show that our method can extract more characteristic genes in response to abiotic stresses than conventional gene selection methods.
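The weighting idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the expression matrix is laid out as samples by genes, an ordinary SVD stands in for sparse PCA, and each gene is scored as the sum of its absolute loadings on the first few PCs, each multiplied by the corresponding singular value. The function names `wpcs_scores` and `select_genes` and the synthetic data are hypothetical.

```python
import numpy as np


def wpcs_scores(X, n_components=3):
    """Score genes by singular-value-weighted absolute PC loadings.

    X is assumed to be a (samples x genes) expression matrix.
    """
    Xc = X - X.mean(axis=0)  # center each gene across samples
    # Assumption: plain SVD is used here in place of sparse PCA.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    # Rows of Vt are PCs over genes; weight |loading| by singular value.
    return np.abs(Vt[:k]).T @ s[:k]


def select_genes(X, n_genes, n_components=3):
    """Return indices of the n_genes highest-scoring genes."""
    scores = wpcs_scores(X, n_components)
    return np.argsort(scores)[::-1][:n_genes]


# Synthetic example: genes 0..9 share a strong common pattern.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
signal = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
X[:, :10] += 10.0 * signal
top = select_genes(X, n_genes=10)
```

Using only the first PC corresponds to `n_components=1`; raising it lets later PCs contribute in proportion to their singular values.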
Year: 2012 PMID: 22808018 PMCID: PMC3393749 DOI: 10.1371/journal.pone.0038873
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. ROC curves for artificial data (SNR denotes the signal-to-noise ratio).
AUC statistics for artificial data.
| Method | SNR = 0.5 | SNR = 1 | SNR = 1.5 | SNR = 2 | SNR = 2.5 | SNR = 3 |
| SPCA-1 | 0.6985000 | 0.8186722 | 0.8517611 | 0.8738500 | 0.8733944 | 0.8665222 |
| SPCA-2 | 0.7131000 | 0.8888778 | 0.9469222 | 0.9668333 | 0.9730667 | 0.9800500 |
| WPCS | 0.7615000 | 0.9579889 | 0.9918333 | 0.9971667 | 0.9988222 | 0.9989444 |
| PLS | 0.6282000 | 0.7269556 | 0.7812667 | 0.8061889 | 0.8261889 | 0.8506667 |
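For readers unfamiliar with the AUC statistic, the following is a minimal sketch of one standard way to compute it: the fraction of (characteristic, non-characteristic) gene pairs that the scores rank correctly, with ties counted as half. The source does not state how its AUC values were computed, so the function name `auc` and the toy scores are illustrative assumptions only.

```python
def auc(scores, labels):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: label 1 marks a truly characteristic gene.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)  # 8 of 9 pairs are ranked correctly
```

An AUC of 1.0 means every characteristic gene scores above every other gene, while 0.5 corresponds to random ranking.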
Figure 2. The response to stimulus (GO: 0050896).
Figure 3. The response to stress (GO: 0006950).
Response to stimulus (GO: 0050896) in shoot samples.
| Stress type | SPCA-1 SF | SPCA-1 PV | SPCA-2 SF | SPCA-2 PV | WPCS SF | WPCS PV | PLS SF | PLS PV |
| Cold | 120/300 40.0% | 2.39E-21 | 135/300 45.0% | 2.19E-30 | | 2.33E-32 | 136/300 45.3% | 5.19E-31 |
| Drought | 126/300 42.0% | 8.50E-25 | 151/300 50.3% | 1.62E-41 | | 1.17E-47 | 147/300 49.0% | 2.36E-38 |
| Salt | 125/300 41.7% | 3.27E-24 | 132/300 44.0% | 1.19E-28 | | 8.34E-39 | 115/300 38.5% | 8.57E-19 |
| UV-B | 145/300 48.3% | 6.44E-37 | 146/300 48.7% | 1.31E-37 | | 7.51E-47 | 146/300 48.7% | 1.20E-37 |
| Heat | 109/300 36.3% | 2.05E-15 | 121/300 40.3% | 7.12E-22 | | 2.89E-35 | 105/300 35.0% | 1.24E-13 |
| Osmotic | 110/300 36.7% | 6.18E-16 | 123/300 41.0% | 7.06E-23 | | 2.19E-38 | 116/300 38.7% | 4.96E-19 |
Note: The table shows how many of the selected characteristic genes are annotated with response to stimulus. The background frequency of this term in the TAIR set is 4570/29887 (15.3%), meaning that 4570 of the 29887 genes in the whole set respond to stimulus.
SF: sample frequency, PV: P-value.
In the table, a sample frequency such as 120/300 means that the method selected 300 genes, of which 120 respond to stimulus.
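To make the relation between sample frequency and background frequency concrete, here is a minimal sketch of an enrichment calculation using the hypergeometric upper tail. This is an assumption about the kind of test involved: the source does not state which test or multiple-testing correction produced the tabulated P-values, so the result of this sketch should not be expected to reproduce them. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated."""
    log_den = log_comb(N, n)
    lo = max(k, n - (N - K), 0)   # smallest feasible count in the tail
    hi = min(n, K)                # largest feasible count
    total = 0.0
    for i in range(lo, hi + 1):
        total += exp(log_comb(K, i) + log_comb(N - K, n - i) - log_den)
    return total


# Figures from the note: 120 of 300 selected genes, background 4570/29887.
fold = (120 / 300) / (4570 / 29887)          # enrichment over background
p_raw = hypergeom_upper_tail(120, 300, 4570, 29887)  # uncorrected tail
```

The fold value shows that the selected set carries the annotation about 2.6 times as often as the background; the tail probability quantifies how unlikely that excess is under random selection.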
Response to stress (GO: 0006950) in root samples.
| Stress type | SPCA-1 SF | SPCA-1 PV | SPCA-2 SF | SPCA-2 PV | WPCS SF | WPCS PV | PLS SF | PLS PV |
| Cold | 84/300 28.0% | 7.21E-23 | 81/300 27.0% | 9.01E-21 | | 6.51E-33 | 79/300 26.3% | 1.48E-19 |
| Drought | 100/300 33.3% | 1.68E-34 | 99/300 33.0% | 1.03E-33 | | 1.58E-34 | 79/300 26.3% | 1.34E-19 |
| Salt | 92/300 30.7% | 1.79E-28 | 99/300 33.0% | 9.23E-34 | | 6.50E-37 | 91/300 30.3% | 9.12E-28 |
| UV-B | 50/300 16.7% | 1.30E-04 | 66/300 22.0% | 5.55E-12 | | 1.33E-25 | 55/300 18.3% | 1.09E-06 |
| Heat | 71/300 23.7% | 1.11E-14 | 78/300 26.0% | 4.21E-19 | | 2.14E-26 | 75/300 25.0% | 3.20E-17 |
| Osmotic | 87/300 29.0% | 6.81E-25 | 96/300 32.0% | 1.89E-31 | | 2.52E-35 | 81/300 27.0% | 6.93E-21 |
Response to stimulus (GO: 0050896) in root samples.
| Stress type | SPCA-1 SF | SPCA-1 PV | SPCA-2 SF | SPCA-2 PV | WPCS SF | WPCS PV | PLS SF | PLS PV |
| Cold | 116/300 38.7% | 4.79E-19 | 121/300 40.3% | 1.12E-21 | | 1.78E-31 | 120/300 40.0% | 3.46E-21 |
| Drought | 138/300 46.0% | 3.62E-32 | | 1.48E-35 | 137/300 45.7% | 1.55E-31 | 119/300 39.7% | 1.12E-20 |
| Salt | 132/300 44.0% | 2.52E-28 | 136/300 45.3% | 6.80E-31 | | 2.06E-38 | 130/300 43.3% | 4.23E-27 |
| UV-B | 100/300 33.3% | 2.60E-11 | 116/300 38.7% | 4.79E-19 | | 6.67E-35 | 105/300 35.0% | 1.34E-13 |
| Heat | 87/300 29.0% | 3.30E-06 | 99/300 33.0% | 4.75E-11 | | 6.93E-22 | 93/300 31.0% | 1.41E-08 |
| Osmotic | 109/300 36.3% | 1.92E-15 | 114/300 38.0% | 5.57E-18 | | 4.43E-27 | 102/300 34.0% | 3.29E-12 |
Response to stress (GO: 0006950) in shoot samples.
| Stress type | SPCA-1 SF | SPCA-1 PV | SPCA-2 SF | SPCA-2 PV | WPCS SF | WPCS PV | PLS SF | PLS PV |
| Cold | 87/300 29.0% | 5.19E-25 | 96/300 32.0% | 1.52E-31 | | 2.61E-32 | 84/300 28.0% | 6.72E-23 |
| Drought | 76/300 25.3% | 9.02E-18 | 103/300 34.3% | 5.07E-37 | 110/300 36.7% | 7.81E-43 | | 1.68E-43 |
| Salt | 84/300 28.0% | 6.06E-23 | 94/300 31.3% | 3.57E-30 | | 1.15E-31 | 79/300 26.3% | 8.58E-20 |
| UV-B | 95/300 31.7% | 1.28E-30 | 93/300 31.0% | 4.13E-29 | | 8.09E-38 | 99/300 33.0% | 1.07E-33 |
| Heat | 88/300 29.3% | 1.46E-25 | 89/300 29.7% | 2.20E-26 | | 5.61E-38 | 85/300 28.3% | 1.31E-23 |
| Osmotic | 81/300 27.0% | 7.48E-21 | 90/300 30.0% | 5.38E-27 | | 4.37E-36 | 88/300 29.3% | 1.32E-25 |
Note: The table shows the results for response to stress (GO: 0006950) obtained by GO Term Enrichment Analysis. The background frequency of this term in the TAIR set is 2351/29887 (7.9%), meaning that 2351 of the 29887 genes in the whole set respond to stress.
Numbers of selected genes annotated with response to cold (GO: 0009410) in shoot samples.
| Method | SPCA-1 | SPCA-2 | WPCS | PLS |
| Number and percent | 38 genes, 12.7% | 45 genes, 15.0% | 48 genes, 16.0% | 33 genes, 11.0% |
| P-value | 1.45E-27 | 2.43E-36 | 2.73E-40 | 1.18E-21 |
Different genes of response to cold (GO: 0009410) in shoot samples.
| Gene No. | Function of Gene |
| At1g21910 | Participates in plant developmental processes as well as biotic and/or abiotic stress signaling. |
| At1g22770 | Regulates several developmental processes, such as circadian clock, carbohydrate metabolism, and cold stress response. |
| At1g29395 | Expression is induced by short-term cold-treatment, water deprivation, and abscisic acid (ABA) treatment. |
| At2g19450 | Role in senescence and seed development induced by cold-stress. |
| At2g25930 | Temperature stress reduced the pyk20 transcript level. |
| At2g28900 | Predominantly expressed in leaves and is also inducible by cold treatment. |
| At2g33380 | Plays a role as a peroxygenase involved in oxylipin metabolism during biotic and abiotic stress. |
| At2g38470 | Involved in response to various abiotic stresses. |
| At2g47180 | Increases tolerance to chilling stress. |
| At3g05880 | Induced by low temperatures, dehydration and salt stress. |
| At3g48360 | Mediates multiple responses to nutrients, stresses, and hormones. |
| At3g53990 | Low temperature and salt responsive protein family. |
| At4g30650 | Low temperature and salt expression protein homologous. |
| At4g30660 | Putative low temperature and salt responsive protein. |
| At4g37610 | Shows increased expression under cold stress. |
| At5g52300 | Induced by low temperature, exogenous abscisic acid (ABA) and drought. |
| At5g57560 | Controls tolerance to cold stress. |
Numbers of selected genes annotated with response to light stimulus (GO: 0009416) in root samples.
| Method | SPCA-1 | SPCA-2 | WPCS | PLS |
| Number and percent | 17 genes, 5.7% | 20 genes, 6.7% | 24 genes, 8.0% | 17 genes, 5.7% |
| P-value | 1.74E-02 | 2.90E-04 | 7.42E-07 | 1.55E-02 |
Different genes of response to light stimulus (GO:0009416) in root samples.
| Gene No. | Function of Gene |
| At2g29500 | HSP20-like chaperones superfamily protein. |
| At3g54890 | Encodes a component of the light harvesting complex associated with photosystem I. |
| At3g55120 | Catalyzes the conversion of chalcones into flavanones. |
| At5g02810 | Acts as transcriptional repressor of CCA1 and LHY. |
| At5g12030 | Encodes a cytosolic small heat shock protein with chaperone activity that is induced by heat and high light intensity stress. |
| At5g15960 | Stress-responsive protein (KIN1). |
| At5g24470 | Encodes a pseudo-response regulator whose mutation affects various circadian-associated biological events such as red light sensitivity of seedlings during early photomorphogenesis. |
| At5g45340 | Abscisic acid 8′-hydroxylase 3. |
Number of samples of each stress type in the raw data.
| Stress Type | cold | drought | salt | UV-B | heat | osmotic | control |
| Number | 6 | 7 | 6 | 7 | 8 | 6 | 8 |
Figure 4. The graphical depiction of SPCA of a matrix A with factor scores and PCs .
In this figure, with factor scores and PCs . is the row vector of PCs the j-th gene, which transforms the original data vector into factor scores . Correspondingly, is the column vector of PCs , which transforms the original data vector into factor scores .
Figure 5. Workflow diagram of WPCS.