Jian Liu, Jin-Xing Liu, Ying-Lian Gao, Xiang-Zhen Kong, Xue-Song Wang, Dong Wang.
Abstract
In current molecular biology, it is increasingly important to identify, from gene expression data, the differentially expressed genes that are closely correlated with a key biological process. In this paper, a novel p-norm robust feature extraction (PRFE) method based on the Schatten p-norm and the Lp-norm is proposed to identify differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix, and the Lp-norm is used as the error function to improve robustness to outliers in the gene expression data. Results on simulation data show that our method obtains higher identification accuracies than the competing methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the other methods. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.
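The two norms named in the abstract can be made concrete with a short sketch. The Python/NumPy code below is an illustration under stated assumptions, not the authors' PRFE solver: it evaluates the Schatten p-norm (the lp norm of a matrix's singular values) and the entry-wise Lp-norm, and combines them into a hypothetical objective of the form ||X - A||_p^p + lam * ||A||_{S_p}^p. The function names and the weight `lam` are our own; the exact objective and optimization procedure used in the paper are not given in this record.

```python
import numpy as np

def schatten_p_norm(A: np.ndarray, p: float) -> float:
    """Schatten p-norm: the l_p norm of the singular values of A.
    For 0 < p < 1 this is a quasi-norm rather than a true norm."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(sigma ** p) ** (1.0 / p))

def entrywise_lp_norm(E: np.ndarray, p: float) -> float:
    """Entry-wise L_p norm: (sum_ij |E_ij|^p)^(1/p)."""
    return float(np.sum(np.abs(E) ** p) ** (1.0 / p))

def illustrative_objective(X: np.ndarray, A: np.ndarray, p: float, lam: float) -> float:
    """Hypothetical PRFE-style loss (assumption, not the paper's exact form):
    L_p^p fitting error of X by A plus lam times the Schatten-p^p penalty on A.
    It only evaluates the loss; it does not solve for A."""
    return entrywise_lp_norm(X - A, p) ** p + lam * schatten_p_norm(A, p) ** p

# Toy usage: a genes-by-samples matrix and a rank-1 approximation of it.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = s[0] * np.outer(U[:, 0], Vt[0])  # rank-1 approximation of X
print(illustrative_objective(X, A, p=0.5, lam=0.1))
```

For p = 2 both norms reduce to the Frobenius norm. As p decreases toward 0, the sum of singular values raised to the p-th power approaches the rank, and the Lp error grows more slowly than the squared error for large residuals, which is the usual motivation for pairing these two terms.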
Year: 2015 PMID: 26201006 PMCID: PMC4511795 DOI: 10.1371/journal.pone.0133124
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The PRFE model of gene expression data used for gene identification.
Fig 2. Identification accuracies of the five methods on simulation data with different parameters, where p is taken as the parameter in the case of PRFE p = 1 to test the performance of different p values; α1, α2 and γ are the sparsity-control parameters of PMD, CIPMD and SPCA, respectively.
Fig 3. Identification accuracies of the five methods on simulation data with different numbers of samples.
Fig 4. ROC curves for the simulation data.
AUC statistics for the simulation data.
| Method | SPCA | PMD | CIPMD | SVM-RFE | PRFE |
|---|---|---|---|---|---|
| AUC | 0.909 | 0.911 | 0.959 | 0.933 | 0.990 |
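The record does not say how these AUC values were computed. One standard way, sketched here in plain Python, treats AUC as the probability that a randomly chosen positive (hypothetically, a gene that is truly differentially expressed in the simulation) receives a higher score than a randomly chosen negative; this equals the normalized Mann-Whitney U statistic. The function name `auc_rank` and the toy scores are assumptions for illustration only.

```python
from typing import Sequence

def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative),
    i.e. Mann-Whitney U / (n_pos * n_neg). Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy usage: label 1 = hypothetical truly differentially expressed gene.
print(auc_rank([0.9, 0.3, 0.6, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic and is meant only to make the definition explicit; a rank-based formula gives the same value in O(n log n).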
Number of samples of each stress type in the raw data.
| Stress Type | control | cold | drought | heat | osmotic | salt | UV-B |
|---|---|---|---|---|---|---|---|
| Sample Number | 8 | 6 | 7 | 8 | 6 | 6 | 7 |
Response to stress (GO:0006950).
This table shows how many of the identified differentially expressed genes are annotated with "response to stress". The background frequency of this term in TAIR is 4044/30322 (13.3%), i.e., 4044 of all 30322 genes respond to stress. SF and PV denote the sample frequency and P-value, respectively. A sample frequency of, e.g., 223 means that of the 500 genes identified by a method, 223 respond to stress. Root and shoot denote the root samples and shoot samples, respectively.
| Stress type | Tissue | SPCA SF | SPCA PV | PMD SF | PMD PV | CIPMD SF | CIPMD PV | SVM-RFE SF | SVM-RFE PV | PRFE SF | PRFE PV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cold | root | 223 (44.8%) | 1.66E-64 | 233 (46.6%) | 9.92E-72 | 264 (52.9%) | 7.64E-98 | 218 (43.9%) | 7.14E-61 | 245 (49.0%) | 6.05E-81 |
| Cold | shoot | 219 (44.0%) | 1.47E-61 | 213 (42.7%) | 4.44E-57 | 243 (48.7%) | 6.84E-80 | 204 (40.9%) | 1.07E-50 | 221 (44.5%) | 6.36E-63 |
| Drought | root | 231 (46.2%) | 3.60E-70 | 222 (44.4%) | 2.27E-63 | 279 (55.8%) | 2.50E-111 | 225 (45.2%) | 7.69E-66 | 232 (46.4%) | 1.36E-70 |
| Drought | shoot | 198 (39.8%) | 5.05E-47 | 246 (49.3%) | 2.47E-82 | 255 (51.1%) | 5.89E-90 | 201 (40.3%) | 1.02E-48 | 277 (55.4%) | 5.61E-109 |
| Heat | root | 152 (30.5%) | 5.73E-21 | 169 (33.9%) | 1.39E-29 | 277 (55.5%) | 1.03E-109 | 242 (48.4%) | 1.11E-78 | 180 (36.2%) | 8.81E-36 |
| Heat | shoot | 187 (37.6%) | 4.49E-40 | 174 (34.8%) | 3.51E-32 | 264 (52.8%) | 1.51E-97 | 225 (45.1%) | 1.21E-65 | 213 (42.8%) | 6.55E-57 |
| Osmotic | root | 172 (34.4%) | 4.39E-31 | 160 (32.0%) | 8.07E-25 | 234 (46.8%) | 1.78E-72 | 227 (45.4%) | 6.15E-67 | 176 (35.2%) | 4.04E-33 |
| Osmotic | shoot | 192 (38.5%) | 4.96E-43 | 227 (45.4%) | 4.12E-67 | 246 (49.3%) | 2.30E-82 | 183 (36.6%) | 2.88E-37 | 226 (45.2%) | 5.21E-66 |
| Salt | root | 178 (35.6%) | 1.79E-34 | 246 (49.2%) | 3.88E-82 | 232 (46.4%) | 5.58E-71 | 218 (43.7%) | 1.76E-60 | 243 (48.6%) | 2.57E-79 |
| Salt | shoot | 169 (33.8%) | 1.85E-29 | 176 (35.3%) | 1.34E-33 | 236 (47.3%) | 2.90E-74 | 202 (40.4%) | 3.32E-49 | 181 (36.4%) | 2.16E-36 |
| UV-B | root | 153 (30.6%) | 2.26E-21 | 165 (33.0%) | 2.34E-27 | 262 (52.4%) | 9.89E-96 | 222 (44.5%) | 2.04E-63 | 178 (35.7%) | 2.35E-34 |
| UV-B | shoot | 249 (50.0%) | 4.18E-85 | 295 (59.1%) | 3.30E-127 | 277 (55.5%) | 1.06E-109 | 186 (37.2%) | 4.81E-39 | 300 (60.2%) | 4.38E-132 |
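The captions give a background frequency (K of N genes carry the term) and a sample frequency (k of n identified genes carry it), but not which test produced the P-values. A one-sided hypergeometric test is a common choice for this kind of GO term analysis; the sketch below assumes it and uses only Python's standard library. Because the tool and any multiple-testing correction are not stated, the printed value need not match the PV column exactly. The function name `hypergeom_upper_tail` is our own.

```python
from math import comb

def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n draws).

    Uses exact integer arithmetic; Python's int / int true division returns a
    correctly rounded float even when both operands are very large."""
    upper = min(K, n)
    if k <= 0:
        return 1.0
    if k > upper:
        return 0.0
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Numbers from the caption: 4044 of 30322 TAIR genes respond to stress,
# and a method identifies 500 genes, 223 of which carry the term.
p = hypergeom_upper_tail(k=223, N=30322, K=4044, n=500)
print(f"{p:.2e}")
```

Working with exact integers avoids the underflow that a naive product of floating-point probabilities would hit at P-values this small.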
Response to abiotic stimulus (GO:0009628).
This table shows how many of the identified differentially expressed genes are annotated with "response to abiotic stimulus". The background frequency of this term in TAIR is 2842/30322 (9.4%), i.e., 2842 of all 30322 genes respond to abiotic stimulus. SF and PV denote the sample frequency and P-value, respectively. The sample frequency reflects the identification accuracy of the different methods: a sample frequency of, e.g., 155 means that of the 500 genes identified by a method, 155 respond to abiotic stimulus. Root and shoot denote the root samples and shoot samples, respectively.
| Stress type | Tissue | SPCA SF | SPCA PV | PMD SF | PMD PV | CIPMD SF | CIPMD PV | SVM-RFE SF | SVM-RFE PV | PRFE SF | PRFE PV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cold | root | 155 (31.1%) | 3.31E-40 | 168 (33.6%) | 6.34E-49 | 172 (34.4%) | 8.99E-52 | 180 (36.2%) | 4.29E-58 | 178 (35.6%) | 5.02E-56 |
| Cold | shoot | 148 (29.7%) | 1.13E-35 | 179 (35.9%) | 3.57E-57 | 180 (36.1%) | 6.31E-58 | 178 (35.7%) | 2.93E-56 | 184 (37.0%) | 4.24E-61 |
| Drought | root | 134 (26.8%) | 4.66E-27 | 118 (23.6%) | 1.51E-18 | 170 (34.0%) | 2.28E-50 | 185 (37.1%) | 8.52E-62 | 136 (27.2%) | 4.85E-28 |
| Drought | shoot | 126 (25.3%) | 8.27E-23 | 164 (32.9%) | 3.49E-46 | 177 (35.5%) | 1.21E-55 | 177 (35.5%) | 1.58E-55 | 183 (36.6%) | 8.05E-60 |
| Heat | root | 108 (21.6%) | 6.69E-14 | 141 (28.3%) | 3.11E-31 | 173 (34.7%) | 1.13E-52 | 198 (39.6%) | 5.99E-72 | 148 (29.8%) | 1.37E-35 |
| Heat | shoot | 142 (28.5%) | 6.07E-32 | 148 (29.6%) | 2.04E-35 | 173 (34.6%) | 1.64E-52 | 192 (38.5%) | 3.28E-67 | 169 (33.9%) | 1.18E-49 |
| Osmotic | root | 132 (26.4%) | 6.69E-26 | 120 (24.0%) | 1.42E-19 | 165 (33.1%) | 4.88E-47 | 193 (38.6%) | 7.66E-68 | 136 (27.2%) | 4.76E-28 |
| Osmotic | shoot | 146 (29.3%) | 2.65E-34 | 171 (34.2%) | 4.55E-51 | 166 (33.3%) | 1.28E-47 | 186 (37.2%) | 2.77E-62 | 176 (35.2%) | 1.67E-54 |
| Salt | root | 119 (23.8%) | 4.82E-19 | 152 (30.4%) | 5.13E-38 | 161 (32.2%) | 5.65E-44 | 183 (36.7%) | 4.41E-60 | 114 (22.8%) | 1.00E-39 |
| Salt | shoot | 145 (29.0%) | 1.45E-33 | 148 (29.7%) | 1.12E-35 | 179 (35.8%) | 5.52E-57 | 183 (36.6%) | 6.12E-60 | 153 (30.8%) | 7.9E-39 |
| UV-B | root | 101 (20.2%) | 6.70E-11 | 120 (24.0%) | 1.49E-19 | 176 (35.3%) | 7.04E-55 | 184 (36.9%) | 7.27E-61 | 135 (27.1%) | 1.53E-27 |
| UV-B | shoot | 154 (30.9%) | 1.49E-39 | 153 (30.7%) | 8.81E-39 | 184 (36.9%) | 5.20E-61 | 179 (35.8%) | 7.26E-57 | 171 (34.3%) | 4.3E-51 |
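A quick way to read a sample frequency against the background frequency is to compare it with the count expected by chance. As a worked example using the caption's numbers (2842 of 30322 genes annotated to the term, 500 genes identified) and SPCA's cold/root sample frequency of 155, the expected count and the fold enrichment are computed below. Fold enrichment is not reported in the table; it is derived here only for illustration, with the 500-gene denominator taken from the caption.

```latex
\mathbb{E}[k] = n \cdot \frac{K}{N} = 500 \times \frac{2842}{30322} \approx 46.9,
\qquad
\text{fold enrichment} = \frac{k/n}{K/N} = \frac{155/500}{2842/30322} \approx 3.31
```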
Terms associated with the genes identified by the different methods.
In this table, 'Term in Genome' denotes the number of genes in the whole genome associated with the term; 'Input' denotes the number of input genes associated with the term; PV denotes the P-value.
| Rank | Term | SPCA Input | SPCA PV | PMD Input | PMD PV | CIPMD Input | CIPMD PV | SVM-RFE Input | SVM-RFE PV | PRFE Input | PRFE PV | Term in Genome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | immune response | 29 | 5.39E-14 | 27 | 2.04E-12 | 27 | 2.92E-12 | 36 | 4.10E-20 | 33 | 3.31E-18 | 1416 |
| 2 | defense response | 30 | 4.02E-14 | 26 | 6.40E-11 | 24 | 3.04E-9 | 34 | 3.43E-17 | 30 | 1.69E-14 | 1515 |
| 3 | response to biotic stimulus | 19 | 1.22E-10 | 15 | 2.69E-7 | 15 | 3.24E-7 | 24 | 3.70E-15 | 22 | 8.46E-14 | 760 |
| 4 | response to other organism | 19 | 5.60E-11 | 14 | 9.42E-7 | 15 | 1.80E-7 | 24 | 1.34E-15 | 21 | 3.58E-13 | 726 |
| 5 | response to external biotic stimulus | 19 | 5.60E-11 | 14 | 9.42E-7 | 15 | 1.80E-7 | 24 | 1.34E-15 | 21 | 3.58E-13 | 726 |
| 6 | response to reactive oxygen species | None | None | 8 | 3.91E-7 | None | None | None | None | 11 | 6.26E-11 | 170 |
| 7 | regulation of immune system process | 23 | 2.19E-10 | 14 | 9.56E-7 | 14 | 3.53E-6 | 25 | 1.21E-11 | 23 | 1.19E-10 | 1212 |
| 8 | leukocyte activation | 18 | 2.33E-10 | 17 | 1.53E-9 | 17 | 1.91E-9 | 22 | 6.36E-14 | 18 | 1.44E-10 | 695 |
| 9 | hematopoietic or lymphoid organ development | 18 | 2.00E-9 | 14 | 2.74E-6 | None | None | 19 | 5.41E-10 | 19 | 1.57E-10 | 795 |
| 10 | cell activation | 22 | 6.47E-12 | 19 | 2.16E-9 | 19 | 2.76E-9 | 26 | 2.48E-15 | 20 | 2.31E-10 | 916 |
Fig 5. Venn diagram of the genes identified by the five methods on the leukemia data.
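The 'unique' genes in a Venn diagram like this are those selected by one method and by none of the others, i.e., a set difference against the union of the other methods' selections. The sketch below shows this with hypothetical gene identifiers (g1 to g7); the actual gene lists of the five methods are not reproduced in this record, and the helper name `unique_to` is our own.

```python
def unique_to(method: str, selections: dict[str, set[str]]) -> set[str]:
    """Genes selected by `method` and by none of the other methods."""
    others = set().union(*(genes for name, genes in selections.items() if name != method))
    return selections[method] - others

# Hypothetical selections; real gene lists would replace these toy IDs.
selections = {
    "SPCA": {"g1", "g2", "g3"},
    "PMD": {"g2", "g3", "g4"},
    "CIPMD": {"g3", "g4"},
    "SVM-RFE": {"g1", "g3", "g5"},
    "PRFE": {"g3", "g5", "g6", "g7"},
}
print(sorted(unique_to("PRFE", selections)))       # genes only PRFE selected
print(set.intersection(*selections.values()))     # genes shared by all five
```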
Detailed information on the five 'unique' genes identified by PRFE.
| NO. | Affymetrix ID | Gene Symbol | Function of Genes |
|---|---|---|---|
| 1 | S53911_at | CD34 | The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. |
| 2 | AFFX-M27830_5_at | – | GB virus C effect on hepatitis C virus (HCV)/human immunodeficiency virus (HIV) co-infected patients: liver. |
| 3 | M21624_at | TRAJ17 | T cell receptor alpha joining 17. |
| 4 | X60486_at | HIST1H4C | Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. |
| 5 | M57466_s_at | HLA-DPB1 | HLA-DPB belongs to the HLA class II beta chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta chain (DPB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. |
Detailed information on the 30 genes identified by PRFE.
| NO. | Affymetrix ID | Gene Symbol | Function of Genes |
|---|---|---|---|
| 1 | M25079_s_at | HBD | The delta (HBD) and beta (HBB) genes are normally expressed in the adult: two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin. |
| 2 | X57351_s_at | IFITM2 | Interferon induced transmembrane protein 2. |
| 3 | X00274_at | HLA-DRA | HLA-DRA is one of the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha and a beta chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. |
| 4 | Z84721_cds2_at | HBA2 | The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5'- zeta-pseudozeta-mu-pseudoalpha-1-alpha-2 (HBA2)- alpha-1-theta-3'. |
| 5 | X00437_s_at | TRBC1 | T cell receptor beta constant 1. |
| 6 | D64142_at | H1FX | H1 histone family, member X. Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. |
| 7 | M11147_at | FTL | This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. |
| 8 | M13560_s_at | CD74 | The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. |
| 9 | Y00433_at | GPX1 | This gene encodes a member of the glutathione peroxidase family. Glutathione peroxidase functions in the detoxification of hydrogen peroxide, and is one of the most important antioxidant enzymes in humans. |
| 10 | V00594_s_at | MT2A | Metallothionein 2A. |
| 11 | L19779_at | HIST2H2AA4 | Histone cluster 2, H2aa4. Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. |
| 12 | AFFX-HUMRGE/M10098_5_at | SRP68 | This gene encodes a subunit of the signal recognition particle (SRP). The SRP is a ribonucleoprotein complex that transports secreted and membrane proteins to the endoplasmic reticulum for processing. |
| 13 | AFFX-HUMRGE/M10098_3_at | SRP68 | This gene encodes a subunit of the signal recognition particle (SRP). The SRP is a ribonucleoprotein complex that transports secreted and membrane proteins to the endoplasmic reticulum for processing. |
| 14 | M91036_rna1_at | HBG2 | The gamma globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone marrow. |
| 15 | M12886_at | IL23A | This gene encodes a subunit of the heterodimeric cytokine interleukin 23 (IL23). IL23 is composed of this protein and the p40 subunit of interleukin 12 (IL12B). |
| 16 | X82240_rna1_at | TCL1A | Overexpression of the TCL1 gene in humans has been implicated in the development of mature T cell leukemia. |
| 17 | M16279_at | CD99 | The protein encoded by this gene is a cell surface glycoprotein involved in leukocyte migration, T-cell adhesion, ganglioside GM1 and transmembrane protein transport, and T-cell death by a caspase-independent pathway. |
| 18 | M13792_at | ADA | This gene encodes an enzyme that catalyzes the hydrolysis of adenosine to inosine. Various mutations have been described for this gene and have been linked to human diseases. |
| 19 | M33600_f_at | HLA-DRB1 | HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. |
| 20 | M21186_at | CYBA | Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. |
| 21 | L06797_s_at | CXCR4 | This gene encodes a CXC chemokine receptor specific for stromal cell-derived factor-1. The protein has 7 transmembrane regions and is located on the cell surface. |
| 22 | X68277_at | DUSP1 | The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. |
| 23 | M69043_at | NFKBIA | This gene encodes a member of the NF-kappa-B inhibitor family, whose members contain multiple ankyrin repeat domains. The encoded protein interacts with REL dimers to inhibit NF-kappa-B/REL complexes, which are involved in inflammatory responses. |
| 24 | X58529_at | IGHM | Immunoglobulin heavy constant mu. Immunoglobulins (Ig) are the antigen recognition molecules of B cells. |
| 25 | J04456_at | LGALS1 | The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. |
| 26 | X78992_at | ZFP36L2 | This gene is a member of the TIS11 family of early response genes. Family members are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. |
| 27 | X12671_rna1_at | HNRNPA1 | The protein encoded by this gene has two repeats of quasi-RRM domains that bind to RNAs. It is one of the most abundant core proteins of hnRNP complexes and it is localized to the nucleoplasm. |
| 28 | M33680_at | CD81 | The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. |
| 29 | Y00787_s_at | IL8 | Gene expression profiling study of contribution of GM-CSF and IL-8 to the CD44-induced differentiation of acute monoblastic leukemia. |
| 30 | S73591_at | TXNIP | Thioredoxin interacting protein. |