Hongying Jiang, Youping Deng, Huann-Sheng Chen, Lin Tao, Qiuying Sha, Jun Chen, Chung-Jui Tsai, Shuanglin Zhang.
Abstract
BACKGROUND: Because many microarray experiments are costly and poorly reproducible, individual studies often include only a limited number of patient samples, and different studies of patients with the same disease identify very few marker genes in common. Merging data sets from multiple studies is therefore both attractive and challenging: it increases the sample size, which may in turn increase the power of statistical inference. In this study, we combined two lung cancer studies that used GeneChip microarrays, and employed two gene shaving methods and a two-step survival test to identify genes whose expression patterns can distinguish diseased from normal samples and indicate patient survival, respectively.
Year: 2004 PMID: 15217521 PMCID: PMC476733 DOI: 10.1186/1471-2105-5-81
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Hierarchical clustering analysis of the two data sets. A: raw data without any normalization; B: partially normalized data without distribution transformation; C: partially normalized data with distribution transformation. AD and NL refer to adenocarcinoma patients and normal samples, respectively.
Figure 2. Scatter plot comparing data distributions with and without distribution transformation (disTran).
Figure 3. Quantile-quantile (Q-Q) plot comparing data distributions with and without distribution transformation (disTran).
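As a rough illustration of what a Q-Q comparison computes, the sketch below pairs matching quantiles of two samples; points lying near the diagonal suggest similar distributions. The function names `quantile` and `qq_pairs`, and the choice of linear interpolation, are assumptions made for this example and are not taken from the study.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data, for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def qq_pairs(sample_a, sample_b, n_points=5):
    """Return (quantile of a, quantile of b) at n_points evenly spaced probabilities.

    n_points must be at least 2. Plotting these pairs gives a Q-Q plot.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    probs = [i / (n_points - 1) for i in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]


if __name__ == "__main__":
    # Hypothetical expression values from two data sets.
    set1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    set2 = [10.0, 20.0, 30.0, 40.0, 50.0]
    for qa, qb in qq_pairs(set1, set2):
        print(qa, qb)
```

The samples may differ in size, because each quantile is interpolated within its own sample.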
The out-of-bag and prediction errors¹ of the last 15 nested probe sets² using GSRF.
| No. of probe sets | Out-of-bag error | | | Prediction error | | Probe set ID (HG_U95Av2) |
| | D1³ | D2³ | C³ | D1→D2³ | D2→D1³ | |
| 1 | 1 | 1 | 2 | 1 | 2 | 268_at |
| 2 | 1 | 1 | 1 | 0 | 1 | + 35868_at |
| 3 | 1 | 0 | 1 | 0 | 1 | + 1596_g_at |
| 4 | 1 | 0 | 1 | 0 | 0 | + 38430_at |
| 5 | 0 | 0 | 1 | 0 | 0 | + 32542_at |
| 6 | 0 | 0 | 0 | 0 | 0 | + 40282_s_at |
| 7–12 | 0 | 0 | 0 | 0 | 0 | + 198_at, 32184_at, 39031_at, 36627_at, 1815_g_at, 39631_at |
| 13 | 0 | 1 | 0 | 0 | 0 | + 40419_at |
| 15 | 0 | 0 | 0 | 0 | 0 | + 1814_at, 39350_at |
¹ The number of misclassified samples. ² Sets of genes (probe sets) produced by iteratively removing 10% of the least significant genes at a time. ³ D1: data set 1; D2: data set 2; C: combined data set; D1→D2: D1 used to predict D2; D2→D1: D2 used to predict D1. +: the listed probe set(s) are added to those already identified in the smaller probe-set list(s). For example, the 3-probe-set list contains 1596_g_at in addition to the probe sets (35868_at and 268_at) of the 2-probe-set list.
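The nested probe sets described in the footnote can be sketched as follows. This is a minimal illustration, assuming each probe set already has a significance score; the function name `nested_gene_sets` and the fixed scores are hypothetical, and a full shaving procedure would likely re-fit the classifier and re-rank the genes after every round.

```python
import math


def nested_gene_sets(scores, drop_fraction=0.10, min_size=1):
    """Return a list of successively smaller gene lists.

    `scores` maps a probe set ID to a significance score (higher = more
    significant). Assumption: scores stay fixed across rounds.
    """
    # Rank once, most significant first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    subsets = [list(ranked)]
    while len(ranked) > min_size:
        # Remove 10% of the current list, but always at least one gene.
        n_drop = max(1, math.floor(len(ranked) * drop_fraction))
        n_keep = max(min_size, len(ranked) - n_drop)
        ranked = ranked[:n_keep]
        subsets.append(list(ranked))
    return subsets


if __name__ == "__main__":
    # Hypothetical scores for 20 probe sets named g0..g19.
    demo_scores = {f"g{i}": float(i) for i in range(20)}
    for subset in nested_gene_sets(demo_scores)[:4]:
        print(len(subset), subset[:3])
```

Because every list is a prefix of the previous one, each smaller set is contained in the larger sets, which matches the "+" notation used in the table.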
The leave-one-out cross-validation and prediction errors¹ of the last 10 nested probe sets² using GSFLD.
| No. of probe sets | Leave-one-out cross-validation error | | | Prediction error | | Probe set ID (HG_U95Av2) |
| | D1³ | D2³ | C³ | D1→D2³ | D2→D1³ | |
| 1 | 2 | 1 | 4 | 1 | 2 | 35868_at |
| 2 | 0 | 0 | 2 | 0 | 1 | + 38430_at |
| 3 | 0 | 0 | 1 | 0 | 1 | + 36247_f_at |
| 4 | 0 | 0 | 1 | 1 | 1 | + 32542_at |
| 5 | 0 | 0 | 1 | 0 | 1 | + 39220_at |
| 6 | 1 | 0 | 1 | 1 | 1 | + 38299_at |
| 7 | 1 | 0 | 1 | 1 | 1 | + 1596_g_at |
| 8 | 1 | 0 | 1 | 1 | 1 | + 39350_at |
| 9 | 1 | 0 | 1 | 1 | 1 | + 31525_s_at |
| 10 | 0 | 0 | 0 | 1 | 1 | + 37407_s_at |
¹ The number of misclassified samples. ² Sets of genes (probe sets) produced by iteratively removing 10% of the least significant genes at a time. ³ D1: data set 1; D2: data set 2; C: combined data set; D1→D2: D1 used to predict D2; D2→D1: D2 used to predict D1. +: the listed probe set(s) are added to those already identified in the smaller probe-set list(s). For example, the 3-probe-set list contains 36247_f_at in addition to the probe sets (38430_at and 35868_at) of the 2-probe-set list.
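The two error types reported in the table, leave-one-out cross-validation within a data set and prediction of one data set from the other, can be sketched as below. A nearest-centroid classifier is used here purely as a simple stand-in; it is not the GSFLD or GSRF method of the study, and all function names and sample values are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit(samples, labels):
    """Return {label: centroid} for the training samples."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in groups.items()}


def predict(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def count_errors(model, samples, labels):
    """Number of misclassified samples, e.g. D1 -> D2 when the model was fit on D1."""
    return sum(predict(model, x) != y for x, y in zip(samples, labels))


def loocv_errors(samples, labels):
    """Hold out each sample in turn, train on the rest, count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        errors += predict(model, samples[i]) != labels[i]
    return errors


if __name__ == "__main__":
    # Hypothetical one-gene expression values; NL = normal, AD = adenocarcinoma.
    d1_x = [[1.0], [1.2], [0.9], [3.0], [3.1], [2.9]]
    d1_y = ["NL", "NL", "NL", "AD", "AD", "AD"]
    d2_x = [[1.1], [2.8], [2.1]]
    d2_y = ["NL", "AD", "NL"]
    print("LOOCV errors on D1:", loocv_errors(d1_x, d1_y))
    print("D1 -> D2 errors:", count_errors(fit(d1_x, d1_y), d2_x, d2_y))
```

Both functions return counts of misclassified samples rather than rates, in line with how the table reports errors.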
Figure 4. Comparison between the original Kaplan-Meier survival curves and the predicted survival curves using the selected 16 genes. Data set 1 has 72 patients with tumor stages 1 and 3, while data set 2 has 83 patients with tumor stages 1, 2, and 3.
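For readers unfamiliar with Kaplan-Meier curves, the following is a minimal sketch of the standard product-limit estimator. The function name `kaplan_meier` and the follow-up data are hypothetical and do not come from either data set; the two-step survival test itself is not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event indicators.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but do not produce a step in the curve.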