Danyang Tong¹, Yu Tian¹, Tianshu Zhou¹, Qiancheng Ye¹, Jun Li², Kefeng Ding², Jingsong Li³,⁴.
Abstract
BACKGROUND: Colon cancer is common worldwide and is one of the leading causes of cancer-related death. Multiple levels of omics data are now available owing to the development of sequencing technologies. In this study, we proposed an integrative prognostic model for colon cancer based on the integration of clinical and multi-omics data.
Keywords: Colon cancer; Integrative analysis; Multi-omics study; Prognostic prediction; The Cancer Genome Atlas (TCGA)
Year: 2020 PMID: 32033604 PMCID: PMC7006213 DOI: 10.1186/s12911-020-1043-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Flowchart of data processing for the TCGA-COAD dataset
Feature statistics of the clinical data used in the prognostic analysis
| Feature | Category | Statistic |
|---|---|---|
| Cases with Clinical and Omics Data | | 344 |
| Gender | Male | 182 (52.9%) |
| | Female | 162 (47.1%) |
| Survival Status | Alive | 273 (79.4%) |
| | Dead | 71 (20.6%) |
| Survival Time (days) | Mean | 779.8 |
| | Median | 575.5 |
| T Stage | T1 | 10 (2.91%) |
| | T2 | 62 (18.02%) |
| | T3 | 254 (73.84%) |
| | T4a | 12 (3.49%) |
| | T4b | 6 (1.74%) |
| N Stage | N0 | 211 (61.3%) |
| | N1a | 35 (10.2%) |
| | N1b | 40 (11.6%) |
| | N2a | 32 (9.3%) |
| | N2b | 26 (7.6%) |
| M Stage | M0 | 292 (84.9%) |
| | M1 | 52 (15.1%) |
| Age at Initial Diagnosis (years) | Min / Median / Mean / Max | 31 / 69 / 66 / 90 |
| | 31–59 | 86 (25.0%) |
| | 59–69 | 85 (24.7%) |
| | 69–77 | 82 (23.8%) |
| | 77–90 | 91 (26.5%) |
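Each of the four age groups holds close to a quarter of the cases, and the boundary at 69 equals the median age, which suggests (the source does not confirm it) that the groups were cut at sample quartiles. A minimal sketch of that kind of binning follows; `quartile_bins` is a hypothetical helper, and because adjacent groups in the table share a boundary value, the rule that a value equal to a cut point falls in the lower group is also an assumption.

```python
import bisect
import statistics
from collections import Counter
from typing import List, Sequence


def quartile_bins(ages: Sequence[float]) -> List[int]:
    """Assign each age to quartile group 0-3 using sample quartiles as cut points.

    Hypothetical helper: statistics.quantiles uses its default 'exclusive'
    method, which can differ slightly from R's default quantile type 7.
    """
    cuts = statistics.quantiles(ages, n=4)  # three cut points: Q1, median, Q3
    # bisect_left places a value equal to a cut point in the lower group
    return [bisect.bisect_left(cuts, a) for a in ages]


if __name__ == "__main__":
    ages = [31, 45, 58, 62, 69, 71, 77, 83, 90]
    print(Counter(quartile_bins(ages)))
```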
Information regarding the omics data used in the prognostic analysis
| Feature | Gene Expression | DNA Methylation | miRNA Expression |
|---|---|---|---|
| Platform | Illumina Genome Analyzer RNA Sequencing | Illumina Infinium Human Methylation 27 (HM27) and Human Methylation 450 (HM450) | Illumina Genome Analyzer miRNA Sequencing |
| Reference Genome / Annotation | GRCh38 | GRCh38 | miRBase v21 and UCSC |
| Measurement | FPKM (fragments per kilobase of transcript per million mapped reads) normalized value | Beta value | RPM (reads per million miRNA-mapped reads) |
| Number of Features | 4691 | 4441 | 111 |
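The three measurement units follow standard definitions, sketched below for reference. This is not the TCGA/GDC processing code: read filtering, background correction and probe handling are not reproduced, the `rpm` helper assumes `counts` covers every miRNA-mapped read in the sample, and the offset of 100 in the beta value is the usual Illumina convention rather than something the source states.

```python
from typing import List, Sequence


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def rpm(counts: Sequence[int]) -> List[float]:
    """Reads per million: each count scaled by the total miRNA-mapped reads in the sample."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation beta value M / (M + U + offset), bounded between 0 and 1."""
    return methylated / (methylated + unmethylated + offset)


if __name__ == "__main__":
    print(fpkm(1000, 2000, 1_000_000))
    print(rpm([10, 90]))
    print(beta_value(900, 0))
```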
Fig. 2 Pipeline of the integration of the clinical data and multi-omics data for the prognostic analysis
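In the regression-coefficient table, the reference levels (T1, N0, M0, the youngest age group and each omics Cluster1) have a hazard ratio of 1, which is consistent with every covariate, including each omics cluster label, entering the Cox model as a separate categorical variable with reference-level (treatment) coding. A minimal sketch of building such a design matrix; `treatment_code` and `design_matrix` are hypothetical names, and the actual model fitting is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def treatment_code(values: Sequence[str], levels: Sequence[str]) -> List[List[int]]:
    """0/1 dummy columns for every level except levels[0], which is the reference."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    return [[1 if v == lvl else 0 for lvl in levels[1:]] for v in values]


def design_matrix(columns: Dict[str, Sequence[str]],
                  levels: Dict[str, Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Concatenate treatment-coded columns into one covariate matrix (one row per patient)."""
    names: List[str] = []
    n = len(next(iter(columns.values())))
    rows: List[List[int]] = [[] for _ in range(n)]
    for col, values in columns.items():
        names += [f"{col}:{lvl}" for lvl in levels[col][1:]]
        for row, dummies in zip(rows, treatment_code(values, levels[col])):
            row.extend(dummies)
    return names, rows


if __name__ == "__main__":
    names, X = design_matrix(
        {"T": ["T1", "T3", "T2"], "GeneExpr": ["Cluster1", "Cluster2", "Cluster1"]},
        {"T": ["T1", "T2", "T3"], "GeneExpr": ["Cluster1", "Cluster2"]},
    )
    print(names)
    print(X)
```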
Cluster parameters selected for different types of omics data in prognostic models with different covariates
| Covariates | Gene Expression | DNA Methylation | miRNA Expression |
|---|---|---|---|
| Gene Expression | Canberra / Ward.D / 6 | | |
| DNA Methylation | | Maximum / Ward.D / 10 | |
| miRNA Expression | | | Maximum / Ward.D2 / 4 |
| Clinical and Gene Expression | Manhattan / Ward.D / 4 | | |
| Clinical and DNA Methylation | | Canberra / Ward.D / 3 | |
| Clinical and miRNA Expression | | | Canberra / Ward.D / 3 |
| Clinical and Gene Expression and DNA Methylation | Manhattan / Ward.D / 4 | Correlation / Ward.D2 / 3 | |
| Clinical and Gene Expression and miRNA Expression | Manhattan / Ward.D / 4 | | Manhattan / Ward.D / 4 |
| Clinical and DNA Methylation and miRNA Expression | | Maximum / Ward.D / 10 | Canberra / Ward.D / 3 |
| Clinical and All Three Types of Omics Data | Manhattan / Ward.D / 4 | Maximum / Ward.D / 10 | Manhattan / Ward.D2 / 4 |
Each cell gives distance method / linkage method / number of clusters; an empty cell means that type of omics data is not a covariate in the model
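The parameter names in this table (Ward.D versus Ward.D2, "Maximum", "Canberra") match R's `dist()` and `hclust()`, which suggests, though the source does not say so, that patients were clustered hierarchically on each omics matrix and the tree was cut into the listed number of clusters. A dependency-free sketch of that procedure under those assumptions follows. The function names are hypothetical, the "Correlation" distance is not implemented, and tie-breaking and the handling of all-zero coordinates in the Canberra distance may differ from R.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def canberra(u: Vector, v: Vector) -> float:
    """Canberra distance; coordinates where both values are 0 contribute nothing."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def manhattan(u: Vector, v: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def maximum(u: Vector, v: Vector) -> float:
    """Chebyshev distance, called 'maximum' in R's dist()."""
    return max(abs(a - b) for a, b in zip(u, v))


def _pair(a: int, b: int) -> Tuple[int, int]:
    """Dictionary key for an unordered pair of cluster ids."""
    return (a, b) if a < b else (b, a)


def ward_clusters(points: Sequence[Vector], k: int,
                  distance: Callable[[Vector, Vector], float] = canberra,
                  squared: bool = False) -> List[int]:
    """Agglomerative clustering with Ward's Lance-Williams update, stopped at k clusters.

    squared=False updates the raw dissimilarities (like R's hclust method "ward.D");
    squared=True squares them first (like "ward.D2"). Returns labels 1..k,
    numbered in order of each cluster's first point.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    d: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        for j in range(i + 1, n):
            dij = distance(points[i], points[j])
            d[(i, j)] = dij * dij if squared else dij
    size = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    while len(size) > k:
        i, j = min(d, key=d.get)  # closest pair of current clusters, i < j
        dij = d[(i, j)]
        for m in size:
            if m in (i, j):
                continue
            ni, nj, nm = size[i], size[j], size[m]
            # Lance-Williams recurrence for Ward's method: distance from m to i∪j
            d[_pair(i, m)] = ((ni + nm) * d[_pair(i, m)] + (nj + nm) * d[_pair(j, m)]
                              - nm * dij) / (ni + nj + nm)
        for m in size:
            if m != j:
                del d[_pair(j, m)]  # cluster j is absorbed into cluster i
        size[i] += size.pop(j)
        members[i].extend(members.pop(j))
    labels = [0] * n
    for label, group in enumerate(sorted(members.values(), key=min), start=1):
        for p in group:
            labels[p] = label
    return labels
```

Under these assumptions, the "Canberra / Ward.D / 6" cell for gene expression would correspond to `ward_clusters(X, 6, canberra, squared=False)` with one row of `X` per patient.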
Fig. 3 Performance of prognostic models with different covariates. In the figure labels, the symbol “+” indicates that the covariates were entered separately in the model. a Bias-corrected Harrell’s C-index of prognostic models with different covariates, with 95% CIs summarized from 500 bootstrap replicates; b −log10(p-values) of the likelihood ratio test, the score test and the Wald test of Cox models with different covariates; the red dotted line indicates −log10(0.05); c p-values of the global proportional hazards (PH) assumption tests
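Harrell's C-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk fails first. The sketch below computes it directly and adds Harrell's bootstrap optimism correction, which is one common way to obtain a "bias-corrected" C (for example, as in R's `rms::validate`); the source does not say which correction it used, so that part is an assumption. Both function names are hypothetical, and pairs with tied survival times are simply skipped.

```python
from statistics import mean
from typing import Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; it is concordant when risks[i] > risks[j].
    Tied risk scores count 1/2; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for ti, ei, ri in zip(times, events, risks):
        if not ei:
            continue
        for tj, rj in zip(times, risks):
            if ti < tj:
                comparable += 1
                if ri > rj:
                    concordant += 1.0
                elif ri == rj:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def optimism_corrected(apparent: float, boot: Sequence[Tuple[float, float]]) -> float:
    """Bootstrap optimism correction of an apparent C-index.

    Each pair holds (C of a model refitted on a bootstrap sample and scored on
    that sample, C of the same refitted model scored on the original data).
    """
    return apparent - mean(b - o for b, o in boot)
```

Read this way, the regression table's original concordance of 0.8345 and bias-corrected concordance of 0.7604 would imply an average optimism of 0.0741.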
Regression coefficients of our integrated prognostic model
| Covariate | Coefficient ± SE | HR | 95% CI | P |
|---|---|---|---|---|
| T stage | | | | |
| T1 | Reference | 1 | | |
| T2 | −3.125 ± 1.253 | 0.0439 | 0.00377–0.513 | 0.0127 |
| T3 | −1.186 ± 0.810 | 0.305 | 0.0624–1.493 | 0.143 |
| T4a | −0.298 ± 1.069 | 0.742 | 0.0913–6.033 | 0.780 |
| T4b | 0.530 ± 1.152 | 1.699 | 0.178–16.241 | 0.646 |
| N stage | | | | |
| N0 | Reference | 1 | | |
| N1a | 0.266 ± 0.504 | 1.305 | 0.486–3.506 | 0.598 |
| N1b | 0.175 ± 0.450 | 1.191 | 0.493–2.877 | 0.697 |
| N2a | 1.355 ± 0.396 | 3.876 | 1.785–8.416 | 0.0006 |
| N2b | 0.985 ± 0.468 | 2.679 | 1.071–6.703 | 0.0352 |
| M stage | | | | |
| M0 | Reference | 1 | | |
| M1 | 1.644 ± 0.362 | 5.178 | 2.546–10.532 | 5.64 e-6 |
| Age | | | | |
| 31–58 | Reference | 1 | | |
| 59–70 | 0.654 ± 0.449 | 1.923 | 0.798–4.635 | 0.145 |
| 70–78 | 1.045 ± 0.411 | 2.842 | 1.269–6.363 | 0.0111 |
| 79–90 | 1.244 ± 0.382 | 3.469 | 1.641–7.333 | 0.0011 |
| Gene Expression | | | | |
| Cluster1 | Reference | 1 | | |
| Cluster2 | 0.970 ± 0.454 | 2.638 | 1.083–6.429 | 0.0328 |
| Cluster3 | 2.404 ± 0.551 | 11.067 | 3.758–32.591 | 1.28 e-5 |
| Cluster4 | 0.597 ± 1.160 | 1.817 | 0.187–17.641 | 0.606 |
| DNA Methylation | | | | |
| Cluster1 | Reference | 1 | | |
| Cluster2 | −0.138 ± 0.576 | 0.871 | 0.281–2.695 | 0.811 |
| Cluster3 | 0.764 ± 0.577 | 2.146 | 0.693–6.645 | 0.185 |
| Cluster4 | 0.352 ± 0.465 | 1.423 | 0.572–3.539 | 0.449 |
| Cluster5 | −0.132 ± 0.667 | 0.876 | 0.237–3.239 | 0.843 |
| Cluster6 | −0.895 ± 0.604 | 0.409 | 0.125–1.336 | 0.139 |
| Cluster7 | −0.397 ± 0.673 | 0.672 | 0.180–2.514 | 0.555 |
| Cluster8 | −1.960 ± 1.128 | 0.141 | 0.0155–1.284 | 0.0822 |
| Cluster9 | −1.848 ± 0.809 | 0.157 | 0.0323–0.769 | 0.0223 |
| Cluster10 | −1.015 ± 0.732 | 0.362 | 0.0863–1.521 | 0.165 |
| miRNA Expression | | | | |
| Cluster1 | Reference | 1 | | |
| Cluster2 | 0.527 ± 0.341 | 1.693 | 0.867–3.305 | 0.123 |
| Cluster3 | 0.276 ± 0.450 | 1.318 | 0.546–3.182 | 0.539 |
| Cluster4 | −0.669 ± 0.503 | 0.512 | 0.191–1.373 | 0.184 |
Original concordance: 0.8345; bias-corrected concordance: 0.7604
SE standard error, HR hazard ratio, CI confidence interval
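The HR and 95% CI columns are consistent with the usual Wald construction for a Cox coefficient β with standard error SE: HR = exp(β), 95% CI = exp(β ± 1.96·SE), and a two-sided p-value from z = β/SE. The sketch below (a hypothetical `cox_row` helper) recomputes a row from the rounded table values. For the T2 row it agrees with the printed HR, CI and P to within the rounding of the displayed coefficient and SE; it is a reading aid, not the authors' code.

```python
import math
from typing import Tuple


def cox_row(coef: float, se: float, z_crit: float = 1.959964) -> Tuple[float, float, float, float]:
    """Hazard ratio, Wald 95% CI and two-sided Wald p-value from a Cox coefficient and its SE."""
    hr = math.exp(coef)
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return hr, lower, upper, p


if __name__ == "__main__":
    # T2 versus T1 row of the table: coefficient -3.125, SE 1.253
    hr, lo, hi, p = cox_row(-3.125, 1.253)
    print(f"HR={hr:.4f} CI={lo:.5f}-{hi:.3f} p={p:.4f}")
```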
Fig. 4 The 2-year, 3-year and 5-year Uno’s C-index of different prognostic models. a 2-year Uno’s C-index of prognostic models with different covariates; b 2-year Uno’s C-index with cross-validation of prognostic models with different covariates; c 3-year Uno’s C-index of prognostic models with different covariates; d 3-year Uno’s C-index with cross-validation of prognostic models with different covariates; e 5-year Uno’s C-index of prognostic models with different covariates; f 5-year Uno’s C-index with cross-validation of prognostic models with different covariates
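Uno's C-index restricts comparisons to events before a horizon τ and weights each comparable pair by the inverse squared probability of remaining uncensored, 1/Ĝ(Tᵢ)², where Ĝ is the Kaplan–Meier estimate of the censoring distribution. This makes it less dependent on the censoring pattern than Harrell's version. A minimal sketch follows. The function names are hypothetical; Ĝ is evaluated just before Tᵢ (implementations differ on this detail); and since survival time is recorded in days, the 2-, 3- and 5-year values would correspond to τ of roughly 730, 1095 and 1825, which is an assumption.

```python
from typing import Sequence


def censoring_survival_before(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of the censoring survival function just before t, G(t-).

    Censored observations (event == 0) play the role of 'events' here, and only
    censoring times strictly earlier than t enter the product.
    """
    g = 1.0
    for s in sorted({ti for ti, ei in zip(times, events) if ei == 0}):
        if s >= t:
            break
        at_risk = sum(1 for ti in times if ti >= s)
        censored = sum(1 for ti, ei in zip(times, events) if ti == s and ei == 0)
        g *= 1.0 - censored / at_risk
    return g


def uno_c(times: Sequence[float], events: Sequence[int],
          risks: Sequence[float], tau: float) -> float:
    """Uno's inverse-probability-of-censoring-weighted C-index truncated at tau.

    Comparable pairs: subject i has an event with times[i] < tau and times[i] < times[j].
    Each pair is weighted by 1 / G(times[i]-)^2; tied risk scores count 1/2.
    """
    num = 0.0
    den = 0.0
    for ti, ei, ri in zip(times, events, risks):
        if not ei or ti >= tau:
            continue
        g = censoring_survival_before(times, events, ti)
        if g <= 0.0:
            continue  # weight undefined once everyone is censored
        w = 1.0 / (g * g)
        for tj, rj in zip(times, risks):
            if ti < tj:
                den += w
                if ri > rj:
                    num += w
                elif ri == rj:
                    num += 0.5 * w
    if den == 0.0:
        raise ValueError("no comparable pairs before tau")
    return num / den
```

Without censoring every weight is 1, and the estimate reduces to Harrell's C restricted to events before τ.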
Difference in discriminative performance between our prognostic model and other models
| Comparison | 2-year ∆C ± 95% CI | 3-year ∆C ± 95% CI | 5-year ∆C ± 95% CI | LRT p-value |
|---|---|---|---|---|
| CGMm vs CMm | 0.0067 ± 0.027 | 0.0010 ± 0.040 | 0.0152 ± 0.033 | 0.00147 |
| CGMm vs CGM | 0.0317 ± 0.033 | 0.0328 ± 0.032 | 0.0480 ± 0.047 | 0.000217 |
| CGMm vs CGm | 0.0284 ± 0.035 | 0.0266 ± 0.035 | 0.0497 ± 0.054 | 0.000582 |
| CGMm vs CA | 0.0349 ± 0.040 | 0.0497 ± 0.040 | 0.0694 ± 0.057 | 0.000312 |
| CGMm vs CG | 0.0315 ± 0.037 | 0.0293 ± 0.031 | 0.0485 ± 0.049 | 0.000203 |
| CGMm vs CM | 0.0402 ± 0.041 | 0.0541 ± 0.042 | 0.0687 ± 0.056 | 6.919 e-7 |
| CGMm vs Cm | 0.0291 ± 0.044 | 0.0324 ± 0.045 | 0.0654 ± 0.060 | 1.528 e-6 |
| CGMm vs C | 0.0374 ± 0.040 | 0.0496 ± 0.042 | 0.0683 ± 0.054 | 2.335 e-6 |
| CGMm vs G | 0.262 ± 0.12 | 0.241 ± 0.13 | 0.268 ± 0.12 | 2.178 e-12 |
| CGMm vs M | 0.143 ± 0.076 | 0.131 ± 0.079 | 0.119 ± 0.069 | 3.426 e-10 |
| CGMm vs m | 0.305 ± 0.10 | 0.259 ± 0.11 | 0.223 ± 0.15 | 7.776 e-13 |
| CMm vs CG | 0.0247 ± 0.042 | 0.0284 ± 0.055 | 0.0333 ± 0.056 | 0.0118 |
| CMm vs Cm | 0.0224 ± 0.035 | 0.0314 ± 0.040 | 0.0502 ± 0.054 | 8.658 e-5 |
| CMm vs C | 0.0307 ± 0.040 | 0.0273 ± 0.052 | 0.0531 ± 0.061 | 0.000130 |
| CG vs C | 0.00591 ± 0.030 | 0.0203 ± 0.032 | 0.0198 ± 0.028 | 0.000689 |
| Cm vs C | 0.00825 ± 0.024 | 0.0173 ± 0.036 | 0.00290 ± 0.029 | 0.270 |
∆C difference in C-index, CI confidence interval, LRT likelihood ratio test
In the Comparison column, C stands for clinical, G for gene expression, M for DNA methylation and m for miRNA expression. The letters on either side of “vs” denote the covariates of the two prognostic models being compared
Wilcoxon signed-rank test of difference in C-index distribution between our prognostic model and other models
| Comparison | ||||
|---|---|---|---|---|
| CGMm vs CMm | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs CGM | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 1.091 e-5 |
| CGMm vs CGm | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 0.000102 |
| CGMm vs CA | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 5.028 e-12 |
| CGMm vs CG | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs CM | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs Cm | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs C | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs G | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs M | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CGMm vs m | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| CMm vs CG | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 0.0209 |
| CMm vs Cm | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 0.00161 |
| CMm vs C | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 1.452 e-7 |
| CG vs C | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| Cm vs C | 3.413 e-16 | < 2.2 e-16 | 0.00323 | 7.012 e-13 |
In the Comparison column, C stands for clinical, G for gene expression, M for DNA methylation and m for miRNA expression. The letters on either side of “vs” denote the covariates of the two prognostic models being compared
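These tests compare paired C-index values, presumably replicate by replicate from the bootstrap and cross-validation runs behind the figures (the source does not say how the pairs were formed). The `< 2.2 e-16` entries match how R's `format.pval` prints p-values below double-precision machine epsilon. That is consistent with, but does not prove, the use of R's `wilcox.test`, which switches from an exact p-value to a normal approximation once there are 50 or more values or any ties. A stdlib sketch of that large-sample form, with tie and continuity corrections, follows; `wilcoxon_signed_rank` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed-rank test, normal approximation with tie and continuity corrections.

    Returns (V, two-sided p), where V is the sum of ranks of positive differences x - y.
    Zero differences are dropped before ranking.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1  # tied |d| values share the average 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    v = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48)
    diff = v - mu
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - correction) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return v, p
```

With hundreds of paired replicates and a consistent shift, `p` underflows toward 0, which R would report as `< 2.2e-16`.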
Difference in performance of our prognostic model and the model with one covariate removed
| Comparison | 2-year ∆C ± 95% CI | 3-year ∆C ± 95% CI | 5-year ∆C ± 95% CI | |
|---|---|---|---|---|
| Without T stage | −0.00141 ± 0.027 | −0.00135 ± 0.024 | 0.00264 ± 0.021 | 0.0126 |
| Without N stage | 0.0341 ± 0.043 | 0.0332 ± 0.041 | 0.0285 ± 0.037 | 0.00863 |
| Without M stage | 0.0320 ± 0.042 | 0.0203 ± 0.038 | 0.00457 ± 0.031 | 7.755 e-6 |
| Without Age | 0.0121 ± 0.029 | 0.000144 ± 0.034 | 0.0227 ± 0.036 | 0.00688 |
| Without Gene | 0.0108 ± 0.025 | 0.0138 ± 0.024 | 0.0225 ± 0.030 | 0.000133 |
| Without Methylation | 0.0273 ± 0.035 | 0.0260 ± 0.037 | 0.0457 ± 0.052 | 0.000609 |
| Without miRNA | 0.00357 ± 0.011 | 0.00380 ± 0.013 | 0.0068 ± 0.018 | 0.103 |
Fig. 5 C-indexes of our prognostic model with one covariate removed. a Harrell’s C-index of our prognostic model with one covariate removed; b Harrell’s C-index with bootstrapping of our prognostic model with one covariate removed; c 2-year Uno’s C-index of our prognostic model with one covariate removed; d 2-year Uno’s C-index with cross-validation of our prognostic model with one covariate removed; e 3-year Uno’s C-index of our prognostic model with one covariate removed; f 3-year Uno’s C-index with cross-validation of our prognostic model with one covariate removed; g 5-year Uno’s C-index of our prognostic model with one covariate removed; h 5-year Uno’s C-index with cross-validation of our prognostic model with one covariate removed
Test of differences in C-index distribution between our prognostic model and the model with one covariate removed
| Comparison | ||||
|---|---|---|---|---|
| Without T stage | < 2.2 e-16 | < 2.2 e-16 | 2.384 e-6 | < 2.2 e-16 |
| Without N stage | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| Without M stage | < 2.2 e-16 | < 2.2 e-16 | 4.5 e-5 | < 2.2 e-16 |
| Without Age | < 2.2 e-16 | 0.228 | < 2.2 e-16 | < 2.2 e-16 |
| Without Gene | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 |
| Without Methylation | < 2.2 e-16 | < 2.2 e-16 | < 2.2 e-16 | 9.179 e-9 |
| Without miRNA | 0.102 | 0.431 | 0.00527 | < 2.2 e-16 |