Tzu-Hao Chang, Shih-Lin Wu, Wei-Jen Wang, Jorng-Tzong Horng, Cheng-Wei Chang.
Abstract
Microarrays are widely used to assess gene expression. Most microarray studies focus primarily on identifying genes that are differentially expressed between conditions (e.g., cancer versus normal cells) in order to discover the major factors that cause disease. Because previous studies have not examined how correlations between gene expressions differ between conditions, crucial abnormal regulatory relationships that cause disease might have been overlooked. This paper proposes an approach for discovering condition-specific correlations of gene expression within biological pathways. Because analyzing gene expression correlations is time-consuming, the analysis was implemented on an Apache Hadoop cloud computing platform. Three breast cancer microarray data sets were collected from the Gene Expression Omnibus (GEO), and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to identify biologically meaningful correlations. Adopting the Hadoop platform considerably reduced the computation time. Several differential correlations of gene expression were discovered between relapse and nonrelapse breast cancer samples, and most of these were involved in cancer regulation and cancer-related pathways. The results showed that breast cancer recurrence might be strongly associated with abnormal regulation between these gene pairs, rather than with the genes' individual expression levels. The proposed method was computationally efficient and reliable, and it produced stable results across different data sets. It is effective in identifying biologically meaningful regulation patterns between conditions.
Year: 2014 PMID: 24579087 PMCID: PMC3919110 DOI: 10.1155/2014/763237
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
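
The abstract describes analyzing gene expression correlations within KEGG pathways. Assuming this means that only gene pairs connected by a pathway relation are tested, the following Python sketch shows one plausible way to build that candidate list. `PATHWAY_EDGES` is a hypothetical placeholder rather than parsed KEGG data, and `candidate_pairs` and `all_pairs_count` are illustrative names, not functions from the study.

```python
"""Restrict correlation analysis to gene pairs linked in a pathway.

Assumption: PATHWAY_EDGES is a hypothetical stand-in for relations parsed
from KEGG pathway maps; it is not real KEGG data.
"""

# Hypothetical directed relations (source gene, target gene) per pathway.
PATHWAY_EDGES = {
    "pathway_1": [("JUN", "MMP1"), ("NFKB2", "PTGS2")],
    "pathway_2": [("IRS1", "PIK3CB"), ("JUN", "MMP1")],
}


def candidate_pairs(pathway_edges, measured_genes):
    """Unique pathway-linked pairs whose two genes are both measured."""
    pairs = set()
    for edges in pathway_edges.values():
        for source, target in edges:
            if source in measured_genes and target in measured_genes:
                pairs.add((source, target))
    return sorted(pairs)


def all_pairs_count(n_genes):
    """Number of unordered pairs an unfiltered all-against-all scan would test."""
    return n_genes * (n_genes - 1) // 2


if __name__ == "__main__":
    measured = {"JUN", "MMP1", "NFKB2", "PTGS2", "IRS1"}  # PIK3CB not measured
    print(candidate_pairs(PATHWAY_EDGES, measured))
    # [('JUN', 'MMP1'), ('NFKB2', 'PTGS2')]
    print(all_pairs_count(22283))  # 248254903 pairs among all GPL96 probes
```

Without a pathway filter, every pair among roughly 22,000 probes would need a correlation, which helps explain why the computation is expensive; a pathway filter limits the work to pairs that appear in at least one pathway. The duplicate JUN → MMP1 relation is counted once because pairs are collected in a set.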
Statistics of the data sets used.
| GEO accession | No. of samples (nonrelapse/relapse) | Platform | No. of probes | Description | Reference no. |
|---|---|---|---|---|---|
| GSE2034 | 288 (179/109) | Affymetrix HG-U133A (GPL96) | 22,283 | Breast cancer | [ |
| GSE1456 | 159 (119/40) | Affymetrix HG-U133A (GPL96) | 22,283 | Breast cancer | [ |
| GSE4922 | 249 (160/89) | Affymetrix HG-U133A (GPL96) | 22,283 | Breast cancer | [ |
| GSE2109 | 2,158 | Affymetrix HG-U133 Plus 2.0 (GPL570) | 54,675 | Different types of cancer | [ |
Figure 1: Overall flow of the proposed approach.
Figure 2: Process flow for identifying differential correlations of gene expression.
Figure 3: Execution times of linear regression validation and correlation computation using the Hadoop MapReduce implementation and the sequential Java implementation, for different numbers of genes.
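Figure 3 compares a Hadoop MapReduce implementation with a sequential Java one. As a rough illustration of why correlation computation fits the MapReduce model, the following single-process Python sketch mimics the map, shuffle, and reduce phases: the map step emits one record per (gene pair, sample), the shuffle groups records by gene pair, and each reduce call accumulates running sums to compute a Pearson correlation. This is not Hadoop code and not the authors' implementation; the record layout and the names `map_phase`, `shuffle`, `reduce_pair`, and `run_job` are assumptions.

```python
"""Single-process emulation of a MapReduce job computing Pearson correlations.

Assumption: this only mimics the map/shuffle/reduce pattern; it is not
Hadoop code and does not reproduce the authors' Java implementation.
"""
from collections import defaultdict
from math import sqrt


def map_phase(samples, pairs):
    """Map: for every sample, emit ((gene1, gene2), (x, y)) per gene pair."""
    for row in samples:
        for g1, g2 in pairs:
            yield (g1, g2), (row[g1], row[g2])


def shuffle(records):
    """Shuffle: group emitted values by key, as the framework does."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return grouped


def reduce_pair(values):
    """Reduce: Pearson r from running sums over one pair's (x, y) values."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in values:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return cov / sqrt(var_x * var_y)


def run_job(samples, pairs):
    """Chain the three phases and return {pair: correlation}."""
    grouped = shuffle(map_phase(samples, pairs))
    return {pair: reduce_pair(vals) for pair, vals in grouped.items()}


if __name__ == "__main__":
    # Toy data: five samples, three genes (made-up values).
    samples = [
        {"A": 1, "B": 2, "C": 5},
        {"A": 2, "B": 4, "C": 4},
        {"A": 3, "B": 6, "C": 3},
        {"A": 4, "B": 8, "C": 2},
        {"A": 5, "B": 10, "C": 1},
    ]
    print(run_job(samples, [("A", "B"), ("A", "C")]))
    # {('A', 'B'): 1.0, ('A', 'C'): -1.0}
```

Because each gene pair is an independent key, reducers can run in parallel on different machines, which is the property a Hadoop deployment exploits. The single-pass sum formula keeps each reducer's memory constant but can lose precision when values are large relative to their spread; a Welford-style update is more robust if that matters.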
Correlations of gene expression between nonrelapse and relapse samples in the three data sets, mapped to KEGG pathways.
| Data set | Condition (no. of samples) | No. of correlated genes, positive (+), Cor. > 0.45 | No. of correlated genes, negative (−), Cor. < −0.45 | No. of differential correlations of gene pairs (AVG ± 3 × SD) |
|---|---|---|---|---|
| GSE2034 | Nonrelapse (179) | 606 | 35 | 33 |
| | Relapse (107) | 473 | 21 | |
| GSE1456 | Nonrelapse (119) | 491 | 21 | 32 |
| | Relapse (40) | 747 | 133 | |
| GSE4922 | Nonrelapse (160) | 575 | 40 | 50 |
| | Relapse (89) | 677 | 84 | |
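
The table counts correlations beyond ±0.45 in each condition, together with gene pairs flagged by an AVG ± 3 × SD criterion. The Python sketch below shows one reading of those two steps: counting pairs past the ±0.45 cut-off, and flagging pairs whose correlation difference between conditions lies outside the mean plus or minus three standard deviations of all differences. Using the population standard deviation, and computing it over all pairs present in both conditions, are assumptions; the source specifies neither. `count_correlated` and `differential_pairs` are illustrative names.

```python
"""Threshold counting and AVG +/- 3 SD flagging of differential correlations.

Assumptions: SD is the population standard deviation, and it is computed
over the correlation differences of all pairs measured in both conditions.
"""
from statistics import mean, pstdev

CUTOFF = 0.45  # correlation cut-off shown in the table header


def count_correlated(correlations, cutoff=CUTOFF):
    """Return (positive, negative) counts of pairs beyond +/- cutoff."""
    positive = sum(1 for r in correlations.values() if r > cutoff)
    negative = sum(1 for r in correlations.values() if r < -cutoff)
    return positive, negative


def differential_pairs(cor_nonrelapse, cor_relapse, k=3.0):
    """Pairs whose difference lies outside mean +/- k * SD of all differences."""
    shared = sorted(set(cor_nonrelapse) & set(cor_relapse))
    diffs = {p: cor_nonrelapse[p] - cor_relapse[p] for p in shared}
    values = list(diffs.values())
    mu = mean(values)
    sd = pstdev(values)
    low, high = mu - k * sd, mu + k * sd
    return [p for p, d in diffs.items() if d < low or d > high]


if __name__ == "__main__":
    # Made-up correlations: 20 stable pairs and one pair that changes sign.
    nonrelapse = {(f"G{i}", "H"): 0.5 for i in range(20)}
    relapse = dict(nonrelapse)
    nonrelapse[("X", "Y")] = 0.9
    relapse[("X", "Y")] = -0.1
    print(count_correlated(nonrelapse))  # (21, 0)
    print(count_correlated(relapse))  # (20, 0)
    print(differential_pairs(nonrelapse, relapse))  # [('X', 'Y')]
```

A 3 × SD rule only flags a pair when enough other pairs are present to estimate the spread; with very few pairs, a single extreme difference cannot exceed three population standard deviations.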
Pathways containing differentially correlated genes between relapse and nonrelapse breast cancer patients.
| KEGG ID | Pathway name | Differentially correlated gene pairs (cor_Dif*) |
|---|---|---|
| hsa05200 | Pathways in cancer | NFKB2 → PTGS2 (+0.37) |
| | | JUN → MMP1 (+0.29) |
| | | RUNX1 → CEBPA (+0.29) |
| | | JUN → FIGF (−0.27) |
| hsa04151 | PI3K-Akt signaling pathway | FGF4 → EGFR (−0.32) |
| | | IRS1 → PIK3CB (−0.3) |
| | | CSF1 → KIT (+0.3) |
| | | EFNA1 → IGF1R (−0.28) |
| hsa04722 | Neurotrophin signaling pathway | IRS1 → PIK3CG (+0.34) |
| | | RPS6KA2 → NRAS (+0.3) |
| | | IRS1 → PIK3CB (−0.3) |
| hsa04062 | Chemokine signaling pathway | GNB5 → PIK3R5 (+0.31) |
| | | CCL18 → XCR1 (−0.29) |
| hsa04150, hsa04910, hsa04930, hsa04960 | mTOR signaling pathway; Insulin signaling pathway; Type II diabetes mellitus; Aldosterone-regulated sodium reabsorption | IRS1 → PIK3CG (+0.34) |
| | | IRS1 → PIK3CB (−0.3) |
| hsa04010 | MAPK signaling pathway | DUSP2 → MAPK8 (−0.34) |
| hsa04060 | Cytokine-cytokine receptor interaction | IL17A → IL17RA (−0.32) |
| hsa04310 | Wnt signaling pathway | SFRP5 → WNT11 (+0.3) |
| hsa04520 | Adherens junction | WAS → ACTB (−0.39) |
| hsa04530 | Tight junction | PRKCQ → ACTB (−0.34) |
| hsa04612 | Antigen processing and presentation | HLA-E → KIR2DS1 (−0.29) |
| hsa04620 | Toll-like receptor signaling pathway | RIPK1 → TRAF6 (+0.31) |
| hsa04630 | Jak-STAT signaling pathway | STAT6 → SOCS1 (+0.31) |
| hsa04666 | Fc gamma R-mediated phagocytosis | WASF3 → ARPC2 (+0.3) |
| hsa04725 | Cholinergic synapse | GNB5 → PIK3R5 (+0.31) |
| hsa05012 | Parkinson's disease | SLC25A4 → CYCS (+0.28) |
| hsa05020 | Prion diseases | PRNP → BAX (+0.29) |
| hsa05152 | Tuberculosis | MAPK3 → IL23A (+0.31) |
| hsa05211 | Renal cell carcinoma | EGLN3 → EPAS1 (+0.28) |

*cor_Dif: average correlation of the gene pair in nonrelapse samples minus its average correlation in relapse samples.
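
The footnote defines cor_Dif as a difference of average correlations. The Python sketch below computes it under the assumption that each condition's average is the unweighted mean of the pair's correlations across the data sets; the source does not say how the averaging is done, and the gene names and correlation values in the usage example are invented. The `label` helper is an illustrative way to reproduce the table's `A → B (+0.37)` notation, using a Unicode minus sign for negative values.

```python
"""cor_Dif: mean nonrelapse correlation minus mean relapse correlation.

Assumption: each condition's average is the unweighted mean of the pair's
correlations over the data sets; the example values are invented.
"""
from statistics import mean

MINUS = "\u2212"  # Unicode minus sign, as used in the table


def cor_dif(nonrelapse_cors, relapse_cors):
    """Average nonrelapse correlation minus average relapse correlation."""
    return mean(nonrelapse_cors) - mean(relapse_cors)


def label(source, target, value):
    """Format a directed pair as in the table, e.g. 'A → B (+0.37)'."""
    number = f"{value:+.2f}".replace("-", MINUS)
    return f"{source} → {target} ({number})"


if __name__ == "__main__":
    # Hypothetical per-data-set correlations for one gene pair.
    nonrelapse = [0.50, 0.40, 0.45]
    relapse = [0.10, 0.05, 0.09]
    print(label("GENE_A", "GENE_B", cor_dif(nonrelapse, relapse)))
    # GENE_A → GENE_B (+0.37)
```

A positive cor_Dif means the pair is more strongly positively correlated in nonrelapse samples than in relapse samples; a negative value means the reverse. The minus-sign substitution is applied only to the number, so hyphenated gene symbols such as HLA-E are left unchanged.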