Tongtong Wu, Wei Sun, Shinsheng Yuan, Chun-Houh Chen, Ker-Chau Li.
Abstract
BACKGROUND: Survival time is an important clinical trait in many disease studies. Previous work has shown a relationship between patients' gene expression profiles and survival time. However, because survival times are censored and gene expression data are high-dimensional, effective and unbiased selection of a gene expression signature to predict survival probabilities requires further study.
Year: 2008 PMID: 18837994 PMCID: PMC2579309 DOI: 10.1186/1471-2105-9-417
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Average numbers of selected and true predictors found in simulation
| ρ | n1 | K1 | n2 | K2 | K3 |
| 0.8 | 3.56 | 2 | 10 | 2 | 7.22 |
| 0.4 | 2.54 | 2 | 10 | 2 | 2.6 |
This table reports the average numbers of genes selected by LA (n1) and by correlation (n2), the average numbers of correct genes selected by LA (K1) and by correlation (K2), and the average number of predictors (K3) found in the clusters 31–34 and 41–44, for ρ = 0.8 and 0.4. The sample size is (n, p) = (500, 10000).
Quantiles of the p-values (in log10 scale) of the log rank test for testing data
| ρ | 0% | 25% | 50% | 75% | 100% |
| 0.8 | -48.6007 | -41.2775 | -36.3042 | -28.5859 | -22.0066 |
| 0.4 | -52.4528 | -45.4521 | -38.5598 | -33.7795 | -22.1615 |
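The five-number summary reported in the table above can be sketched as follows. This is only an illustration: the function name `log10_pvalue_quantiles` is hypothetical, and the interpolation rule used for the quartiles (`method="inclusive"`) is an assumption, since the source does not say how its quantiles were computed.

```python
import math
import statistics

def log10_pvalue_quantiles(p_values):
    """Return [min, 25%, 50%, 75%, max] of log10(p) for a list of p-values.

    Assumption: quartiles use the 'inclusive' interpolation rule; the
    source does not specify which quantile definition it used.
    """
    logs = sorted(math.log10(p) for p in p_values)
    q1, q2, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    return [logs[0], q1, q2, q3, logs[-1]]

# Toy p-values, not data from the study.
summary = log10_pvalue_quantiles([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])
```

Working on the log10 scale keeps very small p-values, such as those in the table, readable and comparable.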
Coefficients in the censorSIR projection direction in simulation
| ρ | Gene 1 | Gene 2 | Gene 3 | Gene 4 | Cluster | Other |
| 0.8 | 0.2441 | 0.2299 | 0.6222 | 0.5825 | 0.1321 | 0.1133 |
| 0.4 | 0.2200 | 0.2331 | 0.5347 | 0.5240 | 0.0620 | 0.1591 |
This table reports the average absolute values of the coefficients for genes 1–4, genes in the clusters (31–34) and (41–44), and noise genes in censorSIR directions.
Twenty-two genes selected by LA and correlations
| Gene (LA count or correlation) | Description | Function or disease association |
| ABCG1 (3) | ATP-binding cassette, sub-family G (WHITE), member 1 | ATP binding; cholesterol homeostasis |
| BIRC5 (-0.31) | baculoviral IAP repeat-containing 5 (survivin) | Colorectal cancer; apoptosis |
| C5orf30 (3) | chromosome 5 open reading frame 30 | |
| CENPA (-0.32) | centromere protein A | chromosome organization and biogenesis |
| CTSL2 (-0.33) | cathepsin L2 | cathepsin L activity; proteolysis |
| E2F7 (-0.31) | E2F transcription factor 7 | breast cancer cell growth |
| ERBB2 (3) | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 | member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases; Amplification and/or overexpression in numerous cancers, including breast and ovarian tumors |
| FAM150B (3) | family with sequence similarity 150, member B | |
| H06509 (3) | mRNA sequence | |
| HJURP (-0.31) | Holliday junction recognition protein | up-regulated in lung cancer |
| KIF20A (-0.30) | kinesin family member 20A | Collaboration of KIF20A and disc large homologue 5 is likely to be involved in pancreatic cancer |
| KIFC1 (-0.30) | kinesin family member C1 | mitotic sister chromatid segregation |
| KRT6B (4) | keratin 6B | Cell Communication; ectoderm development |
| LOC284072 (3) | hypothetical protein | |
| ORMDL2 (4) | ORM1 (S. cerevisiae)-like 2 | expressed in normal aorta |
| PDGFRA (4) | platelet-derived growth factor receptor, alpha polypeptide | Prostate cancer; cell proliferation |
| PELI1 (3) | pellino homolog 1 (Drosophila) | role in interleukin-1-mediated signaling through interaction with interleukin-1 receptor-associated kinase 4-IRAK-tumor necrosis factor receptor-associated factor 6 complex |
| PERLD1 (5) | per1-like domain containing 1 | gastric cancer |
| PRR11 (-0.32) | proline rich 11 | interact with E2F1, E2F4 |
| PTTG2 (-0.33) | pituitary tumor-transforming 2 | chromosome organization and biogenesis |
| QSOX2 (-0.33) | quiescin Q6 sulfhydryl oxidase 2 | oxidoreductase activity; cell redox homeostasis |
| TROAP (-0.32) | trophinin associated protein (tastin) | cell adhesion |
The numbers in parentheses are either the number of times the gene appears in the top/bottom 50 LA pairs or the correlation coefficient. Only genes with negative correlations are selected because the positive correlations have smaller absolute values. The highest positive correlation is 0.2875, which ranks 31st by absolute value.
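Ranking genes by the absolute value of their correlation, as described in the note above, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the plain Pearson correlation and ignores censoring, which may differ from how the authors computed their correlations, and the names `pearson` and `rank_by_abs_correlation` and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_abs_correlation(expression, response):
    """Return (gene, correlation) pairs sorted by decreasing |correlation|.

    expression: dict mapping gene name -> list of expression values.
    response:   list of response values (assumed uncensored here).
    """
    corr = {gene: pearson(values, response) for gene, values in expression.items()}
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy data, not taken from the study.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [4.0, 3.0, 2.0, 1.5],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
ranked = rank_by_abs_correlation(toy_expression, [1.0, 2.0, 3.0, 4.0])
```

Sorting on the absolute value keeps the sign of each coefficient, so one can still see whether the top-ranked genes are negatively or positively correlated.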
Figure 1. Underlying direction revealed by censorSIR. The first projection direction identified by censorSIR for the 22 gene expression profiles versus survival time. The left panel shows the projection weights on each of the 22 gene expression profiles, i.e., the eigenvector corresponding to the largest eigenvalue. Note that the eigenvector β is normalized. The right panel shows the scatter plot of the projection against survival time.
Figure 2. The Kaplan-Meier estimates of survival rates. Survival rates are estimated for the two groups of patients, of sizes 148 and 147, based on the expression of the selected gene signature. The log-rank test comparing the two curves gives a p-value of 2e-13.
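The two quantities named in the Figure 2 caption, the Kaplan-Meier estimate and the two-group log-rank test, can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names and toy data are hypothetical, and the p-value uses the standard chi-square approximation with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times:  follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p-value (chi-square approximation, 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)                 # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)    # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)           # events at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df

# Toy data, not taken from the study.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
p_separated = log_rank_p([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
p_identical = log_rank_p([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

In the toy example the first group fails much earlier than the second, so the log-rank p-value is small, while two identical groups give a p-value of 1.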