Shuaichao Wang, Mengyun Wu, Shuangge Ma.
Abstract
Prognosis modeling plays an important role in cancer studies. With the development of omics profiling, extensive research has been conducted to search for prognostic markers for various cancer types. However, many existing studies share a common limitation: they focus on a single cancer type and therefore suffer from a lack of sufficient information. Given the potential molecular similarity across cancer types, one cancer type may contain information useful for the analysis of other types. Integrating multiple cancer types may facilitate information borrowing and so describe prognosis more comprehensively and more accurately. In this study, we conduct marginal and joint integrative analysis of multiple cancer types, effectively introducing integration into the discovery process. To accommodate high dimensionality and identify relevant markers, we adopt advanced penalization techniques, which have a solid statistical grounding. Gene expression data on nine cancer types from The Cancer Genome Atlas (TCGA) are analyzed, leading to biologically sensible findings that differ from those of the alternatives. Overall, this study provides a novel avenue for cancer prognosis modeling by integrating multiple cancer types.
Keywords: integrative analysis; multiple cancer types; omics data; prognosis modeling
Year: 2019 PMID: 31405076 PMCID: PMC6727084 DOI: 10.3390/genes10080604
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Summary information of the nine cancer types.
| Cancer Type | Abbreviation | Sample Size | Non-Censored | Overall Survival Range (Months) | Median Survival (Months) |
|---|---|---|---|---|---|
| Breast invasive carcinoma | BRCA | 802 | 119 | 0.03–282.69 | 29.88 |
| Bladder Urothelial Carcinoma | BLCA | 409 | 180 | 0.43–165.90 | 17.61 |
| Glioblastoma multiforme | GBM | 541 | 417 | 0.10–127.60 | 10.70 |
| Head and Neck squamous cell carcinoma | HNSC | 159 | 69 | 0.07–135.19 | 12.48 |
| Acute Myeloid Leukemia | LAML | 199 | 132 | 0.10–118.10 | 17.00 |
| Lung adenocarcinoma | LUAD | 509 | 183 | 0.13–238.11 | 21.62 |
| Lung squamous cell carcinoma | LUSC | 497 | 215 | 0.03–173.69 | 21.91 |
| Ovarian serous cystadenocarcinoma | OV | 582 | 384 | 0.26–180.06 | 33.03 |
| Pancreatic adenocarcinoma | PAAD | 184 | 100 | 0.13–90.05 | 15.34 |
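The censoring level differs considerably across the nine data sets, which matters when comparing how much prognostic information each one carries. As a minimal sketch, the proportion of non-censored subjects can be computed directly from the sample sizes and non-censored counts in the table above; the names `SUMMARY` and `event_rate` are illustrative and not taken from the study.

```python
# (sample size, non-censored count) per cancer type, copied from the summary table.
SUMMARY = {
    "BRCA": (802, 119),
    "BLCA": (409, 180),
    "GBM": (541, 417),
    "HNSC": (159, 69),
    "LAML": (199, 132),
    "LUAD": (509, 183),
    "LUSC": (497, 215),
    "OV": (582, 384),
    "PAAD": (184, 100),
}


def event_rate(sample_size, non_censored):
    """Proportion of subjects whose event time was observed (not censored)."""
    return non_censored / sample_size


rates = {name: event_rate(n, d) for name, (n, d) in SUMMARY.items()}

# Print cancer types from the most heavily censored to the least.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: {rate:.3f}")
```

In this tabulation BRCA has the lowest proportion of observed events and GBM the highest.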
Figure 1. Flowchart of the proposed integrative analysis of The Cancer Genome Atlas (TCGA) data.
Marginal analysis: top five genes with the largest numbers of associated cancer types.
| Approach | Gene | Number of Associated Cancer Types |
|---|---|---|
| A1 | | 9 |
| | | 8 |
| | | 8 |
| | | 8 |
| | | 7 |
| A2 | | 9 |
| | | 8 |
| | | 8 |
| | | 7 |
| | | 7 |
| A3 | | 8 |
| | | 7 |
| | | 7 |
| | | 7 |
| | | 7 |
Marginal analysis: relative overlapping between different cancer types.
| Approach | Cancer Type | BRCA | GBM | HNSC | LAML | LUAD | LUSC | OV | PAAD |
|---|---|---|---|---|---|---|---|---|---|
| A1 | BLCA | 0.134 | 0.145 | 0.135 | 0.147 | 0.189 | 0.070 | 0.145 | 0.191 |
| | BRCA | | 0.167 | 0.120 | 0.145 | 0.148 | 0.072 | 0.150 | 0.119 |
| | GBM | | | 0.149 | 0.169 | 0.215 | 0.089 | 0.208 | 0.119 |
| | HNSC | | | | 0.202 | 0.199 | 0.108 | 0.154 | 0.160 |
| | LAML | | | | | 0.144 | 0.140 | 0.134 | 0.165 |
| | LUAD | | | | | | 0.102 | 0.141 | 0.173 |
| | LUSC | | | | | | | 0.114 | 0.058 |
| | OV | | | | | | | | 0.117 |
| A2 | BLCA | 0.314 | 0.267 | 0.320 | 0.295 | 0.351 | 0.161 | 0.185 | 0.337 |
| | BRCA | | 0.272 | 0.330 | 0.306 | 0.234 | 0.262 | 0.265 | 0.341 |
| | GBM | | | 0.416 | 0.443 | 0.271 | 0.310 | 0.364 | 0.237 |
| | HNSC | | | | 0.346 | 0.380 | 0.253 | 0.366 | 0.430 |
| | LAML | | | | | 0.337 | 0.398 | 0.302 | 0.338 |
| | LUAD | | | | | | 0.241 | 0.298 | 0.339 |
| | LUSC | | | | | | | 0.292 | 0.201 |
| | OV | | | | | | | | 0.286 |
| A3 | BLCA | 0.252 | 0.082 | 0.168 | 0.102 | 0.196 | 0.252 | 0.124 | 0.162 |
| | BRCA | | 0.091 | 0.245 | 0.101 | 0.255 | 0.364 | 0.146 | 0.191 |
| | GBM | | | 0.052 | 0.055 | 0.065 | 0.069 | 0.060 | 0.071 |
| | HNSC | | | | 0.116 | 0.198 | 0.251 | 0.096 | 0.162 |
| | LAML | | | | | 0.105 | 0.108 | 0.135 | 0.074 |
| | LUAD | | | | | | 0.227 | 0.115 | 0.176 |
| | LUSC | | | | | | | 0.134 | 0.197 |
| | OV | | | | | | | | 0.081 |
Marginal analysis: relative Euclidean distances between estimated coefficient matrices.
| Approach | Cancer Type | BRCA | GBM | HNSC | LAML | LUAD | LUSC | OV | PAAD |
|---|---|---|---|---|---|---|---|---|---|
| A1 | BLCA | 1.426 | 1.465 | 1.572 | 1.422 | 1.277 | 2.160 | 1.441 | 1.318 |
| | BRCA | | 1.270 | 1.853 | 1.551 | 1.457 | 2.445 | 1.443 | 1.584 |
| | GBM | | | 1.974 | 1.658 | 1.389 | 2.722 | 1.362 | 1.766 |
| | HNSC | | | | 1.205 | 1.424 | 1.644 | 1.530 | 1.382 |
| | LAML | | | | | 1.403 | 1.591 | 1.471 | 1.373 |
| | LUAD | | | | | | 1.960 | 1.463 | 1.376 |
| | LUSC | | | | | | | 1.942 | 1.959 |
| | OV | | | | | | | | 1.532 |
| A2 | BLCA | 1.166 | 1.250 | 1.435 | 1.378 | 1.075 | 2.814 | 1.424 | 1.070 |
| | BRCA | | 1.202 | 1.585 | 1.482 | 1.356 | 2.800 | 1.220 | 1.111 |
| | GBM | | | 1.384 | 1.176 | 1.293 | 2.746 | 1.028 | 1.307 |
| | HNSC | | | | 1.236 | 1.288 | 1.855 | 1.465 | 1.118 |
| | LAML | | | | | 1.269 | 1.764 | 1.457 | 1.252 |
| | LUAD | | | | | | 2.347 | 1.337 | 1.160 |
| | LUSC | | | | | | | 2.512 | 2.658 |
| | OV | | | | | | | | 1.205 |
| A3 | BLCA | 2.354 | 2.162 | 1.896 | 2.029 | 2.217 | 2.752 | 2.203 | 2.099 |
| | BRCA | | 2.862 | 2.364 | 2.514 | 1.974 | 1.870 | 3.230 | 1.956 |
| | GBM | | | 2.108 | 1.929 | 2.832 | 2.731 | 1.835 | 2.613 |
| | HNSC | | | | 1.985 | 2.151 | 2.577 | 2.237 | 1.916 |
| | LAML | | | | | 2.455 | 2.405 | 2.221 | 2.207 |
| | LUAD | | | | | | 2.019 | 3.100 | 1.988 |
| | LUSC | | | | | | | 3.154 | 2.206 |
| | OV | | | | | | | | 2.871 |
Figure A1. Marginal analysis: clustering dendrogram based on the relative Euclidean distances.
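A dendrogram of this kind is obtained by repeatedly merging the closest groups of cancer types. The sketch below runs agglomerative clustering on the A1 relative Euclidean distances from the marginal-analysis table. The record does not state which linkage rule was used, so average linkage is an assumption here, and the helper names `pairwise` and `average_linkage` are illustrative.

```python
from itertools import combinations

# Cancer types in the row/column order of the distance table.
TYPES = ["BLCA", "BRCA", "GBM", "HNSC", "LAML", "LUAD", "LUSC", "OV", "PAAD"]

# Upper-triangular A1 distances copied from the table, one row per cancer type.
A1_UPPER = [
    [1.426, 1.465, 1.572, 1.422, 1.277, 2.160, 1.441, 1.318],  # BLCA vs BRCA..PAAD
    [1.270, 1.853, 1.551, 1.457, 2.445, 1.443, 1.584],         # BRCA vs GBM..PAAD
    [1.974, 1.658, 1.389, 2.722, 1.362, 1.766],                # GBM vs HNSC..PAAD
    [1.205, 1.424, 1.644, 1.530, 1.382],                       # HNSC vs LAML..PAAD
    [1.403, 1.591, 1.471, 1.373],                              # LAML vs LUAD..PAAD
    [1.960, 1.463, 1.376],                                     # LUAD vs LUSC..PAAD
    [1.942, 1.959],                                            # LUSC vs OV, PAAD
    [1.532],                                                   # OV vs PAAD
]


def pairwise(upper, names):
    """Expand upper-triangular rows into a dict keyed by unordered name pairs."""
    dist = {}
    for i, row in enumerate(upper):
        for offset, value in enumerate(row, start=1):
            dist[frozenset((names[i], names[i + offset]))] = value
    return dist


def average_linkage(dist, names):
    """Agglomerative clustering with average linkage (an assumed linkage rule).

    Returns the merges in order as (left cluster, right cluster, height).
    """
    clusters = [(name,) for name in names]
    merges = []

    def linkage(pair):
        # Mean of all between-cluster pairwise distances.
        a, b = pair
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(pairwise(A1_UPPER, TYPES), TYPES)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The first merge joins the pair with the smallest tabulated distance, and each later merge corresponds to one junction of the dendrogram. A different linkage rule (single or complete) can change the later merges.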
Joint analysis: top five genes with the largest numbers of associated cancer types.
| Approach | Gene | Number of Associated Cancer Types |
|---|---|---|
| B1 | ETV6 | 6 |
| | GOT1 | 6 |
| | CHIC2 | 5 |
| | CSNK2A1 | 5 |
| | RUNX2 | 5 |
| B2 | APH1A | 9 |
| | CCAR2 | 9 |
| | HIST1H2AL | 9 |
| | BTLA | 8 |
| | LAMA1 | 8 |
| B3 | EPO | 4 |
| | FASLG | 4 |
| | WDR18 | 4 |
| | CCND2 | 3 |
| | CRADD | 3 |
Joint analysis: relative overlapping between different cancer types.
| Approach | Cancer Type | BRCA | GBM | HNSC | LAML | LUAD | LUSC | OV | PAAD |
|---|---|---|---|---|---|---|---|---|---|
| B1 | BLCA | 0.075 | 0.087 | 0.114 | 0.124 | 0.116 | 0.126 | 0.087 | 0.105 |
| | BRCA | | 0.068 | 0.069 | 0.152 | 0.086 | 0.082 | 0.099 | 0.074 |
| | GBM | | | 0.138 | 0.129 | 0.085 | 0.106 | 0.089 | 0.114 |
| | HNSC | | | | 0.114 | 0.121 | 0.086 | 0.112 | 0.071 |
| | LAML | | | | | 0.137 | 0.11 | 0.129 | 0.088 |
| | LUAD | | | | | | 0.098 | 0.114 | 0.083 |
| | LUSC | | | | | | | 0.127 | 0.097 |
| | OV | | | | | | | | 0.091 |
| B2 | BLCA | 0.097 | 0.085 | 0.090 | 0.128 | 0.124 | 0.121 | 0.124 | 0.107 |
| | BRCA | | 0.095 | 0.088 | 0.104 | 0.091 | 0.102 | 0.133 | 0.116 |
| | GBM | | | 0.101 | 0.113 | 0.071 | 0.109 | 0.124 | 0.127 |
| | HNSC | | | | 0.130 | 0.090 | 0.101 | 0.124 | 0.109 |
| | LAML | | | | | 0.098 | 0.102 | 0.138 | 0.09 |
| | LUAD | | | | | | 0.119 | 0.097 | 0.089 |
| | LUSC | | | | | | | 0.115 | 0.114 |
| | OV | | | | | | | | 0.132 |
| B3 | BLCA | 0.034 | 0.026 | 0.01 | 0.024 | 0.02 | 0.046 | 0.039 | 0.038 |
| | BRCA | | 0.012 | 0.014 | 0.011 | 0.026 | 0.016 | 0.041 | 0.027 |
| | GBM | | | 0.015 | 0.033 | 0.008 | 0.042 | 0.068 | 0.000 |
| | HNSC | | | | 0.000 | 0.000 | 0.038 | 0.018 | 0.017 |
| | LAML | | | | | 0.084 | 0.015 | 0.03 | 0.063 |
| | LUAD | | | | | | 0.039 | 0.052 | 0.028 |
| | LUSC | | | | | | | 0.057 | 0.018 |
| | OV | | | | | | | | 0.073 |
Joint analysis: relative Euclidean distances between estimated coefficient matrices.
| Approach | Cancer Type | BRCA | GBM | HNSC | LAML | LUAD | LUSC | OV | PAAD |
|---|---|---|---|---|---|---|---|---|---|
| B1 | BLCA | 2.108 | 2.090 | 2.503 | 1.943 | 1.994 | 2.104 | 2.122 | 2.081 |
| | BRCA | | 2.262 | 2.164 | 2.063 | 2.04 | 2.787 | 2.474 | 1.949 |
| | GBM | | | 2.454 | 2.082 | 2.002 | 2.266 | 2.001 | 2.230 |
| | HNSC | | | | 2.331 | 2.538 | 3.571 | 2.846 | 1.960 |
| | LAML | | | | | 2.047 | 2.383 | 2.114 | 2.079 |
| | LUAD | | | | | | 2.250 | 1.958 | 2.147 |
| | LUSC | | | | | | | 2.093 | 2.878 |
| | OV | | | | | | | | 2.481 |
| B2 | BLCA | 1.983 | 1.931 | 1.973 | 1.794 | 1.807 | 2.122 | 1.908 | 1.771 |
| | BRCA | | 1.928 | 1.891 | 1.963 | 2.063 | 2.423 | 2.093 | 1.906 |
| | GBM | | | 1.838 | 1.890 | 1.921 | 1.986 | 1.967 | 1.875 |
| | HNSC | | | | 1.965 | 2.098 | 2.371 | 2.095 | 1.832 |
| | LAML | | | | | 1.843 | 1.889 | 1.866 | 1.940 |
| | LUAD | | | | | | 2.012 | 1.953 | 1.880 |
| | LUSC | | | | | | | 2.064 | 2.351 |
| | OV | | | | | | | | 2.071 |
| B3 | BLCA | 3.664 | 2.176 | 1.992 | 2.251 | 2.052 | 3.432 | 2.185 | 2.049 |
| | BRCA | | 2.672 | 3.223 | 2.528 | 3.074 | 1.994 | 2.672 | 3.829 |
| | GBM | | | 2.049 | 2.029 | 2.029 | 2.759 | 2.017 | 2.219 |
| | HNSC | | | | 2.124 | 2.004 | 3.088 | 2.099 | 2.040 |
| | LAML | | | | | 1.994 | 2.455 | 1.978 | 1.907 |
| | LUAD | | | | | | 2.998 | 1.983 | 2.101 |
| | LUSC | | | | | | | 2.720 | 3.722 |
| | OV | | | | | | | | 2.421 |
Figure A2. Joint analysis: clustering dendrogram based on the relative Euclidean distances.
Joint analysis: prediction performance of different approaches (mean C-statistic).
| Approach | BLCA | BRCA | GBM | HNSC | LAML | LUAD | LUSC | OV | PAAD |
|---|---|---|---|---|---|---|---|---|---|
| B1 | 0.665 | 0.876 | 0.604 | 0.641 | 0.573 | 0.688 | 0.748 | 0.577 | 0.689 |
| B2 | 0.597 | 0.719 | 0.581 | 0.567 | 0.551 | 0.601 | 0.649 | 0.562 | 0.632 |
| B3 | 0.587 | 0.693 | 0.558 | 0.604 | 0.558 | 0.594 | 0.612 | 0.547 | 0.589 |
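The C-statistic measures how well predicted risk scores order subjects by survival time, with 0.5 corresponding to random ordering and 1.0 to perfect ordering. The sketch below implements Harrell's concordance index for right-censored data. The record does not say which C-statistic variant or resampling scheme produced the means above, so this is an assumed, commonly used form, and the toy data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. The pair is concordant when subject i
    also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: survival times in months, event indicators
# (1 = event observed, 0 = censored), and model risk scores.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
scores = [2.0, 1.5, 0.3, 1.0]

c_index = harrell_c(times, events, scores)
print(c_index)
```

In the toy data there are four comparable pairs, three of which are ordered correctly by the risk scores, so the index is 0.75.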
Data-based simulation: average true positive rates (TPRs) and false positive rates (FPRs) of different approaches, and numbers of identified true positives associated with all nine cancer types (NG).
| p | Scenario | Measure | A1 | A2 | A3 | B1 | B2 | B3 |
|---|---|---|---|---|---|---|---|---|
| 200 | I | TPR | 0.980 | 0.951 | 0.944 | 0.838 | 0.880 | 0.688 |
| | | FPR | 0.258 | 0.185 | 0.641 | 0.087 | 0.085 | 0.200 |
| | | NG | 7.0 | 8.4 | 3.8 | 5.7 | 8.8 | 1.4 |
| | II | TPR | 0.697 | 0.681 | 0.678 | 0.735 | 0.691 | 0.533 |
| | | FPR | 0.263 | 0.172 | 0.537 | 0.231 | 0.169 | 0.347 |
| | | NG | 4.4 | 3.7 | 0.4 | 4.6 | 4.0 | 0.0 |
| | III | TPR | 0.841 | 0.801 | 0.752 | 0.821 | 0.813 | 0.565 |
| | | FPR | 0.258 | 0.297 | 0.303 | 0.312 | 0.321 | 0.422 |
| | | NG | 7.0 | 6.0 | 5.7 | 5.6 | 6.3 | 1.4 |
| | IV | TPR | 0.843 | 0.741 | 0.621 | 0.897 | 0.766 | 0.662 |
| | | FPR | 0.124 | 0.176 | 0.195 | 0.072 | 0.053 | 0.052 |
| | | NG | 3.3 | 2.3 | 0.0 | 5.0 | 3.0 | 2.1 |
| 500 | I | TPR | 0.922 | 0.911 | 0.844 | 0.933 | 0.844 | 0.688 |
| | | FPR | 0.248 | 0.152 | 0.452 | 0.114 | 0.122 | 0.173 |
| | | NG | 5.0 | 8.0 | 3.0 | 5.0 | 5.0 | 0.0 |
| | II | TPR | 0.672 | 0.664 | 0.653 | 0.647 | 0.643 | 0.647 |
| | | FPR | 0.191 | 0.171 | 0.165 | 0.025 | 0.063 | 0.128 |
| | | NG | 4.7 | 2.8 | 0.4 | 3.8 | 3.1 | 0.0 |
| | III | TPR | 0.774 | 0.723 | 0.445 | 0.811 | 0.784 | 0.644 |
| | | FPR | 0.173 | 0.160 | 0.107 | 0.173 | 0.053 | 0.181 |
| | | NG | 4.0 | 6.3 | 0.0 | 6.2 | 4.8 | 1.2 |
| | IV | TPR | 0.822 | 0.678 | 0.617 | 0.864 | 0.719 | 0.646 |
| | | FPR | 0.058 | 0.054 | 0.116 | 0.042 | 0.046 | 0.038 |
| | | NG | 4.6 | 2.6 | 0.0 | 5.0 | 3.2 | 1.8 |
| 1000 | I | TPR | 0.733 | 0.722 | 0.623 | 0.622 | 0.688 | 0.591 |
| | | FPR | 0.198 | 0.173 | 0.350 | 0.001 | 0.056 | 0.064 |
| | | NG | 5.0 | 6.0 | 3.0 | 3.0 | 3.0 | 0.0 |
| | II | TPR | 0.674 | 0.643 | 0.622 | 0.689 | 0.689 | 0.611 |
| | | FPR | 0.161 | 0.075 | 0.136 | 0.011 | 0.108 | 0.061 |
| | | NG | 2.1 | 4.0 | 0.0 | 2.0 | 3.0 | 0.0 |
| | III | TPR | 0.664 | 0.667 | 0.624 | 0.692 | 0.677 | 0.564 |
| | | FPR | 0.038 | 0.069 | 0.297 | 0.096 | 0.043 | 0.076 |
| | | NG | 4.0 | 6.4 | 0.6 | 5.2 | 5.4 | 0.4 |
| | IV | TPR | 0.722 | 0.644 | 0.622 | 0.855 | 0.711 | 0.699 |
| | | FPR | 0.093 | 0.100 | 0.136 | 0.016 | 0.015 | 0.009 |
| | | NG | 5.0 | 5.0 | 0.0 | 5.0 | 3.0 | 2.0 |
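TPR and FPR summarize how well each approach recovers the genes that truly matter in a simulated data set. The sketch below uses the conventional definitions for a single cancer type: TPR is the share of truly associated genes that are selected, and FPR is the share of unassociated genes that are selected. The record does not spell out how the rates were averaged across cancer types and replicates, so that step is omitted; the function name and the toy gene indices are hypothetical.

```python
def tpr_fpr(selected, truth, p):
    """True and false positive rates for one variable-selection result.

    selected: indices of genes identified by the approach
    truth:    indices of genes with truly nonzero effects
    p:        total number of candidate genes
    """
    selected, truth = set(selected), set(truth)
    if not truth or p <= len(truth):
        raise ValueError("need at least one true and one null gene")
    true_positives = len(selected & truth)
    false_positives = len(selected - truth)
    tpr = true_positives / len(truth)
    fpr = false_positives / (p - len(truth))
    return tpr, fpr


# Hypothetical toy example with p = 200 genes, of which the first 10 are truly associated.
truth = range(10)
selected = [0, 1, 2, 3, 4, 5, 6, 7, 50, 51, 52]

tpr, fpr = tpr_fpr(selected, truth, p=200)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Here 8 of the 10 true genes are recovered and 3 of the 190 null genes are falsely selected.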