Anneleen Daemen, Olivier Gevaert, Fabian Ojeda, Annelies Debucquoy, Johan AK Suykens, Christine Sempoux, Jean-Pascal Machiels, Karin Haustermans, Bart De Moor.
Abstract
BACKGROUND: Although microarray technology allows the investigation of the transcriptomic make-up of a tumor in one experiment, the transcriptome does not completely reflect the underlying biology due to alternative splicing, post-translational modifications, as well as the influence of pathological conditions (for example, cancer) on transcription and translation. This increases the importance of fusing more than one source of genome-wide data, such as the genome, transcriptome, proteome, and epigenome. The current increase in the amount of available omics data emphasizes the need for a methodological integration framework.
Year: 2009 PMID: 19356222 PMCID: PMC2684660 DOI: 10.1186/gm39
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Overview of the two case studies on rectal and prostate cancer
| | Data set I: rectal cancer | Data set II: prostate cancer |
|---|---|---|
| Number of samples | 36 | 55 |
| Data sources | Microarray | Microarray |
| | Proteomics | Genomics |
| Number of features (after preprocessing) | 6,974 genes | |
| | | 7,305 CNVs |
| Outcomes | WHEELER | GRADE |
| | pN-STAGE | STAGE |
| | CRM | METASTASIS |
| | | RECURRENCE |
Figure 1. Overview of the three applied model building strategies. (a) Use of a single data set; (b) manual integration of data over time; (c) a genome-wide integration approach. The data sets are represented as matrices with rows corresponding to patients and columns corresponding to genes, proteins, or CNVs. In step A, LS-SVM models are built on each data set separately. A two-dimensional grid is used for the optimization of the regularization parameter and the number of features. For step B, data sets over time are combined. By using the changes in expression or abundance as features, a two-dimensional grid is sufficient. In step C, an intermediate integration method is used for the integration of all available data sets. A k-dimensional grid is required for optimizing the regularization parameter and the number of features selected from the (k - 1) integrated data sets. FS, feature selection; M, model for parameter combination i; NF, number of features; T, time point.
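The step C grid described in the caption can be sketched as follows for two data sources, which gives a three-dimensional grid (regularization parameter, number of genes, number of proteins). This is a minimal illustration under stated assumptions, not the authors' implementation: the feature ranking by class-mean difference, the cosine-normalized linear kernel, the averaging of kernels, the use of leave-one-out accuracy as selection criterion, and the synthetic data are all stand-ins chosen for brevity.

```python
import numpy as np
from itertools import product

def top_features(X, y, k):
    """Rank features by absolute class-mean difference (a stand-in criterion)."""
    diff = np.abs(X[y > 0].mean(axis=0) - X[y < 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]

def cosine_kernel(A, B):
    """Linear kernel between rows of A and B, normalized to unit self-similarity."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def lssvm_fit(K, y, gamma):
    """Solve the LS-SVM linear system; returns bias b and dual weights alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def loo_accuracy(sources, y, gamma, n_feats):
    """Leave-one-out accuracy of an LS-SVM on the averaged per-source kernels."""
    n = len(y)
    correct = 0
    for i in range(n):
        tr = np.delete(np.arange(n), i)
        K_tr = np.zeros((n - 1, n - 1))
        k_te = np.zeros(n - 1)
        for X, k in zip(sources, n_feats):
            idx = top_features(X[tr], y[tr], k)  # selection uses training rows only
            K_tr += cosine_kernel(X[tr][:, idx], X[tr][:, idx])
            k_te += cosine_kernel(X[i:i + 1][:, idx], X[tr][:, idx])[0]
        b, alpha = lssvm_fit(K_tr / len(sources), y[tr], gamma)
        score = (k_te / len(sources)) @ alpha + b
        correct += int(np.sign(score) == y[i])
    return correct / n

# Synthetic stand-in data: 20 patients, two sources, 3 informative columns each.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 10)
genes = rng.normal(size=(20, 30))
genes[:, :3] += 1.5 * y[:, None]
proteins = rng.normal(size=(20, 12))
proteins[:, :3] += 1.5 * y[:, None]

# Three-dimensional grid: regularization x number of genes x number of proteins.
grid = product([0.1, 1.0, 10.0], [3, 10], [3, 6])
results = {(g, ng, n_prot): loo_accuracy([genes, proteins], y, g, (ng, n_prot))
           for g, ng, n_prot in grid}
best = max(results, key=results.get)
print(best, results[best])
```

Feature selection is repeated inside every leave-one-out iteration so that the held-out patient never influences which features are chosen. Each extra data source adds one grid dimension, which is why the caption speaks of a k-dimensional grid for (k - 1) integrated data sets.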
LS-SVM models for the prediction of WHEELER, pN-STAGE and CRM in rectal cancer
| Outcome | Step | Features selected (NG*, NP†) | AUC (SE)‡ | P-value§ |
|---|---|---|---|---|
| WHEELER | A | 4 | 0.7538 (0.1085) | 0.0987 |
| | | 29 | 0.9038 (0.0502) | 0.6861 |
| | | 35 | 0.7423 (0.0867) | 0.0540 |
| | | 11 | 0.9038 (0.0575) | 0.7273 |
| | B | 32 | 0.6846 (0.1215) | 0.0598 |
| | | 5 | 0.8654 (0.0621) | 0.4135 |
| | C | 3¶ | 0.7808 (0.0985) | 0.1320 |
| | | 21¶ | 0.7692 (0.0831) | 0.0831 |
| | | NG 3, NP 35 | 0.8461 (0.0718) | 0.2760 |
| | | NG 2¶, NP 31¶ | 0.8846 (0.0558) | 0.4858 |
| | | NG 2, NP 4 | 0.9385 (0.0444) | 0.8101¥ |
| pN-STAGE | A | 25 | 0.6493 (0.0914) | 2.315e-4 |
| | | 22 | 0.8506 (0.0665) | 0.0362 |
| | | 2 | 0.6753 (0.0906) | 6.659e-4 |
| | | 12 | 0.8409 (0.0652) | 0.0238 |
| | B | 4 | 0.6071 (0.0986) | 1.359e-4 |
| | | 9 | 0.7662 (0.0900) | 0.0153 |
| | C | 24¶ | 0.9286 (0.0450) | 0.1998 |
| | | 34¶ | 0.8182 (0.0695) | 0.0145 |
| | | NG 27, NP 27 | 0.9188 (0.0469) | 0.1591 |
| | | NG 23¶, NP 16¶ | 0.9610 (0.0280) | 0.3421 |
| | | NG 26, NP 20¶ | 1 (0) | 0.3347¥ |
| CRM | A | 33 | 0.6790 (0.1016) | 0.0072 |
| | | 9 | 0.9259 (0.0472) | 0.4955 |
| | | 34 | 0.8518 (0.0624) | 0.0935 |
| | | 34 | 0.7654 (0.0831) | 0.0281 |
| | B | 6 | 0.9136 (0.0480) | 0.4030 |
| | | 2 | 0.8272 (0.0709) | 0.0849 |
| | C | 16¶ | 0.8066 (0.0846) | 0.0468 |
| | | 3¶ | 0.7531 (0.0865) | 0.0227 |
| | | NG 7, NP 27 | 0.8477 (0.0688) | 0.1340 |
| | | NG 2¶, NP 3¶ | 0.8230 (0.0771) | 0.0973 |
| | | NG 16, NP 14 | 0.9630 (0.0376) | 1 |
| | | NG 9¶, NP 29 | 0.9876 (0.0146) | 0.4924¥ |
*Number of genes selected in each LOO iteration. †Number of proteins selected in each LOO iteration. ‡Area under the ROC curve (standard error) obtained with leave-one-out. §Comparison of AUC between each model and the best model in bold [46]. ¶Number of features used at both time points. ¥This model is better than the model in bold we compare with.
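The AUC (SE) entries can be illustrated with a short sketch that computes the area under the ROC curve from leave-one-out scores and attaches a standard error. The record does not say which standard error estimator was used; the Hanley and McNeil (1982) approximation below is an assumption, and the scores and labels are made up for illustration.

```python
import math

def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg)), len(pos), len(neg)

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC under the Hanley-McNeil approximation (assumed estimator)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

# Hypothetical leave-one-out outputs for six patients (1 = positive class).
loo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
loo_labels = [1, 1, 0, 1, 0, 0]

auc, n_pos, n_neg = auc_mann_whitney(loo_scores, loo_labels)
se = hanley_mcneil_se(auc, n_pos, n_neg)
print(f"AUC = {auc:.4f} (SE = {se:.4f})")
```

Under this formula an AUC of exactly 1 yields a standard error of 0, the same pattern as the "1 (0)" entry in the table, although that alone does not confirm which estimator the authors used.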
Features for (colo)rectal cancer selected by MPT1 and known to be involved in this type of cancer
| Outcome* | Gene/protein | Hits† | Region | Function | Up/down‡ | Reference |
|---|---|---|---|---|---|---|
| W | Cox-2 | 36 | 1q25.2-q25.3 | Progression | Up | [ |
| W | IL-1B | 36 | 2q14 | Inflammatory response | Up | [ |
| W | Ferritin | 36 | 11q13; 19q13.3-q13.4 | Iron storage | Down | [ |
| W | EGF | 36 | 4q25 | Cell growth/proliferation/differentiation | Up | [ |
| W | MMP-2 | 36 | 16q13-q21 | Invasion/metastasis | Up | [ |
| W | TGF | 36 | 2p13 | Angiogenesis/cell proliferation | Down | [ |
| W | SELE | 25 | 1q22-q25 | Progression/metastasis | Up | [ |
| W | GM-CSF | 24 | 5q31.1 | Maintenance of granulocytes/macrophages | Up | [ |
| W | MMP-1 | 15 | 11q22.3 | Tumor invasion/metastasis/poor prognosis | Up | [ |
| N | Reg4 | 36 | 1p13.1-p12 | Early carcinogenesis | Down | [ |
| N | MUC2 | 36 | 11p15.5 | Deregulated by TNFα | Down | [ |
| N | CA1 | 36 | 8q13-q22.1 | Carbonate dehydratase activity | Down | [ |
| N | CA2 | 36 | 8q22 | Carbonate dehydratase activity | Down | [ |
| N | CLDN8 | 36 | 21q22.11 | Tumorigenesis | Down | [ |
| N | CEA | 36 | 19q13.1-q13.2 | Cell adhesion; tumor marker for recurrence | Down | [ |
| N | IL-1ra | 36 | 2q14.2 | Carcinogenesis | Up | [ |
| N | CA19-9 | 36 | | Tumor marker for recurrence | Down | [ |
| N | Ferritin | 36 | 11q13; 19q13.3-q13.4 | Iron storage | Down | [ |
| N | IL-1beta | 36 | 2q14 | Inflammatory response | Down | [ |
| N | beta2-microglobulin | 36 | 15q21-q22.2 | Metastasis | Up | [ |
| N | RARRES1 | 31 | 3q25.32-q25.33 | Cell proliferation | Down | [ |
| N | IL-8 | 28 | 4q13-q21 | Progression/metastasis | Down | [ |
| N | TNFRII | 24 | 1p36.3-p36.2 | Apoptosis | Up | [ |
| C | ICAM-1 | 36 | 19p13.3-p13.2 | Metastasis | Down | [ |
| C | CEA | 36 | 19q13.1-q13.2 | Cell adhesion; tumor marker for recurrence | Down | [ |
| C | MMP-2 | 36 | 16q13-q21 | Invasion/metastasis | Up | [ |
| C | Adiponectin | 36 | 3q27 | Metabolic/hormonal processes | Down | [ |
| C | Thrombospondin-1 | 36 | 15q15 | Angiogenesis/tumor growth | Up | [ |
| C | EGFR | 36 | 7p12 | Cell growth/proliferation/differentiation | Up | [ |
| C | Tissue factor | 35 | 1p22-p21 | Angiogenesis/metastasis | Up | [ |
| C | CYP1B1 | 35 | 2p21 | Drug metabolism | Down | [ |
| C | EGF | 32 | 4q25 | Cell growth/proliferation/differentiation | Up | [ |
*W, WHEELER; N, pN-STAGE; C, CRM. †Number of occurrences of the gene/protein in the 36 LOO iterations. ‡Up/down-regulation in the good responders with respect to moderate or poor responders; no lymph nodes with respect to at least one regional lymph node; negative CRM with respect to positive CRM. CRC, (colo)rectal cancer.
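The "Hits" column counts in how many leave-one-out iterations a feature was selected. A minimal sketch of that bookkeeping, using hypothetical feature names and three iterations instead of 36:

```python
from collections import Counter

def count_hits(selected_per_iteration):
    """Count in how many LOO iterations each feature was selected."""
    hits = Counter()
    for selected in selected_per_iteration:
        hits.update(set(selected))  # a feature counts at most once per iteration
    return hits

# Hypothetical selections from three LOO iterations.
loo_selections = [
    ["feature_a", "feature_b"],
    ["feature_a", "feature_c"],
    ["feature_a", "feature_b"],
]

hits = count_hits(loo_selections)
for name, n in hits.most_common():
    print(name, n)
```

A feature whose hit count equals the number of iterations was selected regardless of which patient was left out, which is why such features are the most stable candidates for interpretation.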
LS-SVM models for the prediction of GRADE, STAGE, METASTASIS and RECURRENCE in prostate cancer
| Outcome | Step | Features selected (NG*, NC†) | AUC (SE)‡ | P-value§ |
|---|---|---|---|---|
| GRADE | A | 24 | 0.8304 (0.0623) | 0.2727 |
| | | 8 | 0.7822 (0.0632) | 0.0503 |
| | C | | | |
| STAGE | A | 18 | 0.6576 (0.0778) | 0.0191 |
| | | 32 | 0.7936 (0.0631) | 0.3466 |
| | C | | | |
| METASTASIS | A | 18 | 0.9759 (0.0178) | 0.4392 |
| | | 12 | 0.8114 (0.0755) | 0.0166 |
| | C | | | |
| RECURRENCE | A | 24 | 0.7208 (0.0936) | 0.5392 |
| | | 26 | 0.4481 (0.1433) | 0.0354 |
| | C | | | |
*Number of genes selected in each LOO iteration. †Number of copy number variations selected in each LOO iteration. ‡Area under the ROC curve (standard error) obtained with leave-one-out. §Comparison of AUC between each model and the best model in bold [46].
Features for prostate cancer selected by MG and known to be involved in this type of cancer
| Outcome* | Gene/CNV | Hits† | Region | Function | Up/down‡ | Reference |
|---|---|---|---|---|---|---|
| G | | 55 | 7p14.1 | Inhibitor of PT growth/invasion | Up | [ |
| G | | 55 | 5q14.3 | Contributor to PC pathology | Up | [ |
| G | | 36 | 17p13.1 | Suppressor of PT development | Down | [ |
| S | | 50 | Xq28 | Only expressed in PC (diagnosis and therapy) | Down | [ |
| S | | 50 | 15q25-q26 | PT cell invasion | Down | [ |
| S | | 50 | 13q31.1 | PC cell growth | Down | [ |
| S | | 48 | 5q31 | Inhibitor of PT growth | Up | [ |
| S | | 48 | 1q25 | Polymorphic changes as tumor; suppressor in hereditary PC | Up | [ |
| S | | 41 | 4q21.1 | Prostate-specific gene | Down | [ |
| M | | 50 | 21q22.3 | Proto-oncogene; early prostate carcinogenesis | Up | [ |
| M | | 49 | 4q13-q21 | PC progression/growth via TARP | Down | [ |
| M | | 49 | 1p13.3 | Oncogene; PC development/progression | Up | [ |
| M | | 26 | 21q21.2 | Negatively affected by TGFbeta1, which increases VCAN-expression | Down | [ |
| R | | 29 | 7q22.1 | Inversely associated to tumor stage; predictor of biochemical recurrence | Down | [ |
| R | | 29 | 21q22.1-11 | Predictor of decreased disease-free survival/recurrence | Up | [ |
| R | | 28 | 4q28 | PC cell growth | Down | [ |
| R | | 26 | 1q32.3 | Inversely related to invasion/angiogenesis; positively correlated to metastases | Down | [ |
| R | | 26 | 20p12.1-11.23 | Cell growth/progression/metastasis | Up | [ |
| R | | 14 | 21q22.3 | Proto-oncogene; early prostate carcinogenesis | Up | [ |
| R | | 14 | 17p13.1 | Suppressor of PT development | Down | [ |
*G, GRADE; S, STAGE; M, METASTASIS; R, RECURRENCE. †Number of occurrences of the gene/CNV in all LOO iterations (number of LOO iterations for G = 55, S = 50, M = 50, R = 29). ‡Up/down-regulation in high-grade with respect to low-grade; advanced stage with respect to early stage; metastasis with respect to no metastasis; recurrence with respect to no recurrence. PC, prostate cancer; PT, prostate tumor.
Comparison of our kernel-based integration approach with the ensemble approach
| Outcome | AUC (SE)*: kernel-based integration approach | AUC (SE)*: ensemble approach | P-value† |
|---|---|---|---|
| WHEELER | 0.9269 (0.0425) | 0.9500 (0.0339) | 0.6160 |
| pN-STAGE | 0.9870 (0.0135) | 0.9253 (0.0432) | 0.1422 |
| CRM | 0.9630 (0.0344) | 0.7860 (0.0783) | |
| GRADE | 0.9006 (0.0413) | 0.8567 (0.0521) | 0.3745 |
| STAGE | 0.8528 (0.0550) | 0.8304 (0.0582) | 0.6836 |
| METASTASIS | 0.9868 (0.0121) | 0.9452 (0.0309) | 0.1313 |
| RECURRENCE | 0.7857 (0.0934) | 0.4545 (0.1352) | |
*Area under the ROC curve (standard error) obtained with leave-one-out. †Comparison in AUC between the best models obtained with our strategy (MPT1 for rectal cancer, MG for prostate cancer) and the corresponding ensemble models based on the same number of features [46].
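To give a feel for how two AUC values with standard errors can be compared, the sketch below applies a simple two-sided z-test to the WHEELER row. This is an assumption-laden simplification, not the procedure of [46]: it treats the two AUCs as independent, whereas both models are evaluated on the same patients, so it is not expected to reproduce the tabulated values exactly.

```python
import math

def compare_auc_unpaired(auc1, se1, auc2, se2):
    """Two-sided z-test on the AUC difference, ignoring correlation between models."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# WHEELER: kernel-based integration versus ensemble approach (values from the table).
z, p = compare_auc_unpaired(0.9269, 0.0425, 0.9500, 0.0339)
print(f"z = {z:.3f}, p = {p:.3f}")
```

For this row the unpaired test gives a p-value of about 0.67, compared with the tabulated 0.6160. A paired comparison that accounts for the correlation between the two sets of predictions would shift this value.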