Pingzhao Hu, Celia M T Greenwood, Joseph Beyene.
Abstract
BACKGROUND: Microarray technology has been used to identify genes that are differentially expressed between tumour and normal samples, both in single studies and in syntheses of multiple studies. When integrating results from several Affymetrix microarray datasets, previous studies summarized the data at the probeset level, which may lose information available at the probe level. In this paper, we present an approach for integrating results across studies while taking probe-level data into account. Additionally, we follow a new direction in the analysis of microarray expression data, namely to focus on the variation of expression phenotypes in predefined gene sets, such as pathways. This targeted approach can help reveal information that is not easily visible from changes in individual genes.
Keywords: data synthesis; meta-analysis; pathway enrichment analysis; probe-level test; prostate cancer; random effect models
Year: 2007 PMID: 19458772 PMCID: PMC2675508
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
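The keywords above name meta-analysis and random effect models, but this record does not say which estimator the authors used. As a minimal sketch, assuming the widely used DerSimonian-Laird estimator and per-study effect sizes with known variances (the function name `random_effects_pool` and its inputs are illustrative, not taken from the paper), pooling one gene's effect across studies could look like this:

```python
import math

def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is a stand-in; the record does not confirm
    which random-effects method the paper actually uses.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights
    w = [1.0 / v for v in variances]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity between studies
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2
```

When the studies agree closely, the between-study variance is truncated to zero and the estimate reduces to the fixed-effect (inverse-variance) average.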
Training and test data sets.
| Data Set | Platform | Number of Probe Sets/Spots | Number of Normal Samples | Number of Cancer Samples | Reference | Source of Raw Data |
|---|---|---|---|---|---|---|
| Training Sets | Affymetrix (HG_U95Av2) | 12600 | 25 | 26 | | Supplement |
| | Affymetrix (HG_U95Av2) | 12626 | 8 | 25 | | Author |
| | Affymetrix (HG_U95Av2) | 12626 | 3 | 23 | | GEDP |
| Testing Sets | Affymetrix (HG_U95Av2) | 12600 | 25 | 26 | | Supplement |
| | Affymetrix (HG_U95Av2) | 12625 | 50 | 38 | | GEO |
The Singh data set was randomly divided into a training set (51 arrays) and a testing set (51 arrays).
The numbers of normal and cancer samples reported in the original paper are 9 and 24, respectively. When sending us the raw data (CEL files), the author suggested that we treat the data as 8 normal samples and 25 cancer samples.
GEDP: Gene Expression Data Portal, National Cancer Institute.
GEO: Gene Expression Omnibus.
Groups: treatment (t) and control (c) groups in a study.
Figure 1. Number of differentially expressed genes as a function of false discovery rate (FDR) thresholds.
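The relationship summarized in Figure 1 can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure for FDR control, which this record does not confirm as the authors' choice; the helper names `bh_adjust` and `count_significant` are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a common FDR procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def count_significant(pvalues, thresholds):
    """Count genes whose adjusted p-value is at or below each FDR threshold."""
    q = bh_adjust(pvalues)
    return {t: sum(1 for x in q if x <= t) for t in thresholds}
```

Calling `count_significant(gene_pvalues, [0.01, 0.05, 0.1])` on a list of per-gene p-values gives the kind of counts that a curve like Figure 1 plots against the threshold.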
The top 20 significantly enriched pathways for the set of significantly differentially expressed genes identified using PLM (Supplementary Table 1), together with their predictive accuracies in the test datasets.
| Pathway ID | p-value | # of Genes | Accuracy in the Singh testing data | Accuracy in the Stuart testing data | Pathway Name |
|---|---|---|---|---|---|
| 04810 | 0 | 33 | 0.824 | 0.761 | Regulation of actin cytoskeleton |
| 04910 | 0 | 28 | 0.922 | 0.795 | Insulin signaling pathway |
| 00230 | 0 | 29 | 0.686 | 0.591 | Purine metabolism |
| 04010 | 0 | 46 | 0.745 | 0.818 | MAPK signaling pathway |
| 04020 | 0 | 29 | 0.824 | 0.568 | Calcium signaling pathway |
| 04510 | 0 | 35 | 0.804 | 0.693 | Focal adhesion |
| 00190 | 5.55E-16 | 23 | 0.804 | 0.67 | Oxidative phosphorylation |
| 04514 | 8.88E-16 | 23 | 0.843 | 0.648 | Cell adhesion molecules (CAMs) |
| 00240 | 1.78E-14 | 21 | 0.745 | 0.557 | Pyrimidine metabolism |
| 04070 | 3.44E-14 | 20 | 0.765 | 0.705 | Phosphatidylinositol signaling system |
| 01430 | 1.25E-13 | 19 | 0.843 | 0.739 | Cell Communication |
| 04060 | 5.75E-13 | 18 | 0.765 | 0.682 | Cytokine-cytokine receptor interaction |
| 04530 | 2.93E-12 | 17 | 0.784 | 0.602 | Tight junction |
| 00330 | 4.09E-12 | 17 | 0.882 | 0.739 | Arginine and proline metabolism |
| 04310 | 1.01E-11 | 16 | 0.725 | 0.682 | Wnt signaling pathway |
| 00480 | 2.40E-11 | 16 | 0.843 | 0.67 | Glutathione metabolism |
| 04540 | 4.48E-11 | 15 | 0.686 | 0.818 | Gap junction |
| 04720 | 4.61E-11 | 15 | 0.765 | 0.773 | Long-term potentiation |
| 04670 | 4.96E-11 | 15 | 0.824 | 0.705 | Leukocyte transendothelial migration |
| 04512 | 5.93E-11 | 15 | 0.745 | 0.795 | ECM-receptor interaction |
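Enrichment p-values like those in the table above are often computed with a one-sided hypergeometric test. The sketch below assumes that test, which this record does not confirm as the authors' method; the function name and all four arguments are illustrative.

```python
from math import comb

def enrichment_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: genes on the array (hypothetical background set)
    n_pathway:  background genes annotated to the pathway
    n_selected: differentially expressed genes
    n_overlap:  differentially expressed genes that fall in the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probabilities of every overlap at least as large as observed
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For a toy background of 10 genes, a pathway of 4 genes and 3 selected genes, an overlap of 2 gives (36 + 4) / 120 = 1/3. Very small tail probabilities can underflow to 0.0 in floating point.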
The number of genes used in building models for prostate cancer prediction.
Figure 2. Predictive accuracy of the SVM models, as a function of the number of differentially expressed genes used for prediction: (a) Singh testing data; (b) Stuart testing data.
The top 20 significantly enriched pathways found for the set of significantly differentially expressed genes identified using PSLM (Supplementary Table 2), together with their predictive accuracies in the test datasets.
| Pathway ID | p-value | # of Genes | Accuracy in the Singh testing data | Accuracy in the Stuart testing data | Pathway Name |
|---|---|---|---|---|---|
| 03010 | 0 | 32 | 0.784 | 0.693 | Ribosome |
| 04010 | 3.33E-16 | 26 | 0.784 | 0.455 | MAPK signaling pathway |
| 04810 | 5.66E-14 | 22 | 0.745 | 0.602 | Regulation of actin cytoskeleton |
| 00230 | 6.13E-14 | 21 | 0.686 | 0.591 | Purine metabolism |
| 04910 | 1.41E-13 | 21 | 0.804 | 0.739 | Insulin signaling pathway |
| 04514 | 3.08E-13 | 20 | 0.824 | 0.614 | Cell adhesion molecules (CAMs) |
| 04020 | 4.49E-13 | 20 | 0.784 | 0.602 | Calcium signaling pathway |
| 04510 | 1.22E-12 | 19 | 0.765 | 0.58 | Focal adhesion |
| 00190 | 1.31E-10 | 16 | 0.843 | 0.693 | Oxidative phosphorylation |
| 04664 | 1.17E-09 | 14 | 0.686 | 0.5 | Fc epsilon RI signaling pathway |
| 04540 | 1.17E-09 | 13 | 0.706 | 0.5 | Gap junction |
| 04060 | 3.16E-09 | 13 | 0.725 | 0.591 | Cytokine-cytokine receptor interaction |
| 00240 | 3.94E-09 | 13 | 0.843 | 0.705 | Pyrimidine metabolism |
| 00480 | 4.95E-09 | 13 | 0.784 | 0.591 | Glutathione metabolism |
| 04520 | 5.33E-09 | 13 | 0.784 | 0.466 | Adherens junction |
| 04070 | 4.75E-08 | 11 | 0.686 | 0.739 | Phosphatidylinositol signaling system |
| 04080 | 7.64E-08 | 11 | 0.706 | 0.614 | Neuroactive ligand-receptor interaction |
| 04670 | 7.75E-08 | 11 | 0.784 | 0.602 | Leukocyte transendothelial migration |
| 04512 | 1.33E-07 | 10 | 0.765 | 0.693 | ECM-receptor interaction |
| 04360 | 1.58E-07 | 10 | 0.745 | 0.523 | Axon guidance |