Shiyue Tao¹, Xiangyu Ye², Lulu Pan¹, Minghan Fu¹, Peng Huang², Zhihang Peng¹, Sheng Yang¹
Abstract
Pan-cancer analysis, an integrative analysis across cancer types, can help explain oncogenesis and identify biomarkers with greater statistical power and robustness. Genome-wide association studies (GWASs) typically identify thousands of cancer-related loci but do not necessarily include a fine-mapping step, which is what defines the causal loci. In this study, we develop a novel strategy that combines pan-cancer analysis with fine-mapping to identify causal loci, construct the CAusal Pan-cancER gene (CAPER) score, and evaluate its performance through internal and external validation on 1,287 individuals and 985 cell lines. Using summary statistics for 15 cancer types, we defined 54 causal loci in 15 potential genes. Using The Cancer Genome Atlas (TCGA) training set, we constructed the CAPER score and divided cancer patients into two groups. Across the three validation sets, 19 cancer-related variables differed significantly between the two CAPER score groups, and 81 drugs showed significantly different drug sensitivity between them. We hope that our strategies for selecting causal genes and constructing the CAPER score will provide valuable clues for guiding the management of different types of cancer.
Keywords: fine-mapping; genome-wide association study; pan-cancer; risk estimation; summary statistics
Year: 2021 PMID: 35003220 PMCID: PMC8733729 DOI: 10.3389/fgene.2021.784775
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Workflow for CAPER score construction and its clinical translation.
Summary of GWAS summary statistics in 15 types of cancer.
| Cancer type | No. SNP | h² | Sample size | Prev. (/100,000) |
|---|---|---|---|---|
| BLCA | 1,293,985 | 0.08 | 412,592 | 23 |
| BRCA | 1,016,724 | 0.14 | 194,153 | 125.2 |
| CESC | 269,795 | 0.36 | 9,347 | 12.4 |
| COADREAD | 1,298,901 | 0.23 | 387,318 | 55.9 |
| ESCASTAD | 1,293,959 | 0.14 | 411,441 | 15.1 |
| KC | 1,293,994 | 0.09 | 411,688 | 14.5 |
| LC | 1,293,976 | 0.15 | 412,835 | 35.7 |
| LL | 1,293,985 | 0.14 | 411,202 | 10.1 |
| MM | 1,293,929 | 0.08 | 417,127 | 18.3 |
| OCPC | 1,293,988 | 0.04 | 411,573 | 3.3 |
| OV | 1,227,160 | 0.0042 | 85,426 | 13.5 |
| PAAD | 518,381 | 0.06 | 7,785 | 7.9 |
| PRAD | 1,202,176 | 0.16 | 140,254 | 120.1 |
| THCA | 1,293,992 | 0.21 | 411,112 | 10.2 |
| UCEC | 1,280,529 | 0.03 | 121,885 | 29.4 |
h²: estimated heritability.
Prev. (/100,000): estimated number of prevalent cases in 2020 per 100,000 population.
BLCA: bladder cancer, BRCA: breast cancer, CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma, COADREAD: colorectal cancer, ESCASTAD: esophageal or stomach adenocarcinoma, KC: kidney cancer, LC: lung cancer, LL: lymphocytic leukemia, MM: melanoma, OCPC: oral cavity and pharyngeal cancer, OV: ovarian cancer, PAAD: pancreatic adenocarcinoma, PRAD: prostate cancer, THCA: thyroid carcinoma, and UCEC: uterine corpus endometrial cancer.
Summary of the 15 potential causal genes.
| Gene | CHR | Start | End | p-value | Cancer | No. SNP |
|---|---|---|---|---|---|---|
|  | 5 | 1,201,710 | 1,225,232 | 1.39E-12 | MM, OV, LC, PAAD, UCEC | 13 |
|  | 5 | 1,225,470 | 1,246,304 | 5.32E-13 | MM, OV, LC, PAAD, UCEC | 23 |
|  | 5 | 1,253,262 | 1,295,184 | 5.32E-13 | MM, OV, LC, PAAD, UCEC | 23 |
|  | 5 | 1,317,859 | 1,345,214 | 5.32E-13 | MM, OV, LC, PAAD, UCEC | 23 |
|  | 5 | 1,392,905 | 1,445,545 | 5.32E-13 | MM, LC, PAAD | 21 |
|  | 6 | 31,367,561 | 31,383,092 | 2.30E-11 | CESC, PRAD, UCEC | 6 |
|  | 6 | 31,462,658 | 31,478,901 | 2.30E-11 | CESC, PRAD, UCEC | 7 |
|  | 6 | 31,496,494 | 31,498,009 | 2.30E-11 | CESC, PRAD, UCEC | 10 |
|  | 6 | 31,497,996 | 31,510,225 | 2.30E-11 | CESC, PRAD, UCEC | 12 |
|  | 6 | 31,497,996 | 31,514,385 | 2.30E-11 | CESC, PRAD, UCEC | 12 |
|  | 6 | 31,512,239 | 31,516,204 | 2.30E-11 | CESC, PRAD, UCEC | 12 |
|  | 6 | 31,514,647 | 31,526,606 | 2.30E-11 | CESC, PRAD, UCEC | 12 |
|  | 8 | 128,747,680 | 128,753,680 | 1.77E-09 | BLCA, PRAD, PAAD | 4 |
|  | 8 | 128,426,535 | 128,432,314 | 5.73E-186 | PRAD, BRCA, COADREAD | 11 |
|  | 11 | 69,061,605 | 69,182,494 | 3.84E-97 | PRAD, BRCA, KC | 11 |
FIGURE 2 Summary of the CAPER genes. (A) Ideogram of the 15 CAPER genes (the color of each chromosome indicates gene density across the human genome). (B) Bubble plot of the minimum p-value of each causal gene in each cancer dataset.
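The gene table and Figure 2B summarize each candidate gene by its minimum p-value across cancer datasets, the cancers involved, and a SNP count. The sketch below shows one way such a per-gene summary could be aggregated from SNP-level GWAS records. The record layout, the gene and SNP labels, and the 5E-8 genome-wide threshold are illustrative assumptions, not details taken from the study.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SnpRecord(NamedTuple):
    gene: str    # hypothetical gene label the SNP is mapped to
    cancer: str  # cancer-type abbreviation, e.g. "PRAD"
    snp: str     # hypothetical variant identifier
    p: float     # GWAS p-value for this SNP in this cancer


def summarize_genes(
    records: Iterable[SnpRecord], threshold: float = 5e-8  # assumed significance cutoff
) -> Dict[str, Tuple[float, List[str], int]]:
    """Per gene: (minimum p-value, cancers with a SNP below threshold, distinct SNPs below threshold)."""
    min_p: Dict[str, float] = defaultdict(lambda: math.inf)
    cancers: Dict[str, set] = defaultdict(set)
    hits: Dict[str, set] = defaultdict(set)
    for r in records:
        min_p[r.gene] = min(min_p[r.gene], r.p)
        if r.p < threshold:
            cancers[r.gene].add(r.cancer)
            hits[r.gene].add(r.snp)
    return {g: (p, sorted(cancers[g]), len(hits[g])) for g, p in min_p.items()}


records = [
    SnpRecord("GENE_A", "PRAD", "snp1", 1e-10),
    SnpRecord("GENE_A", "BRCA", "snp2", 3e-9),
    SnpRecord("GENE_A", "KC", "snp1", 0.2),
]
print(summarize_genes(records))
```

A SNP that passes the threshold in any cancer is counted once, so the count reflects distinct variants rather than SNP-cancer pairs.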
Summary of the univariable Cox regression analysis conducted on 12 CAPER genes in the TCGA training set*.
| Gene | Coef. | SE (coef.) | z | p-value | HR (95% CI) |
|---|---|---|---|---|---|
|  | −0.38 | 0.07 | −5.57 | 2.59E-8 | 0.68 (0.59, 0.78) |
|  | 0.00 | 0.03 | 0.06 | 0.96 | 1.00 (0.95, 1.06) |
|  | 0.03 | 0.04 | 0.87 | 0.39 | 1.03 (0.96, 1.11) |
|  | 0.06 | 0.03 | 2.01 | 0.045 | 1.06 (1.00, 1.12) |
|  | 0.09 | 0.02 | 4.77 | 1.87E-6 | 1.09 (1.05, 1.13) |
|  | 0.09 | 0.02 | 5.86 | 4.62E-9 | 1.09 (1.06, 0.78) |
|  | −0.07 | 0.04 | −1.81 | 0.070 | 0.93 (0.87, 1.01) |
|  | −0.36 | 0.08 | −4.57 | 4.81E-6 | 0.70 (0.60, 0.82) |
|  | −0.03 | 0.02 | −1.85 | 0.065 | 0.97 (0.94, 1.00) |
|  | 0.07 | 0.04 | 1.67 | 0.095 | 1.07 (0.99, 1.16) |
|  | 0.03 | 0.05 | 0.62 | 0.53 | 1.03 (0.93, 1.15) |
|  | −0.07 | 0.03 | −2.75 | 6.03E-3 | 0.93 (0.89, 0.98) |
*The effect sizes of genes are adjusted for age, sex, and tumor stage.
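In a Cox model summary such as this one, z, p, HR, and the 95% CI all follow from the coefficient and its standard error via the Wald approximation: z = coef / SE, HR = exp(coef), and CI = exp(coef ± 1.96 × SE). The sketch below recomputes them. Applied to the rounded values in the first row (coef = −0.38, SE = 0.07), it gives z ≈ −5.43 and HR ≈ 0.68 (0.60, 0.78), close to the reported −5.57 and 0.68 (0.59, 0.78), which were presumably computed from unrounded estimates.

```python
import math
from typing import Tuple


def wald_summary(
    coef: float, se: float, z_crit: float = 1.959964
) -> Tuple[float, float, float, Tuple[float, float]]:
    """Wald statistics for one Cox coefficient: z, two-sided p, HR, and the CI for HR."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail: 2 * (1 - Phi(|z|))
    hr = math.exp(coef)
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    return z, p, hr, ci


z, p, hr, (lo, hi) = wald_summary(-0.38, 0.07)
print(f"z={z:.2f} p={p:.2e} HR={hr:.2f} ({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log scale, every CI must contain its HR, which makes this a quick consistency check on transcribed rows.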
FIGURE 3 Summary of the CAPER score. (A) Heatmap of the p-value of each CAPER gene in the univariable Cox regression performed on each type of cancer. (B) ROC curves of the multivariable Cox regression on the TCGA test set, with AUCs estimated for 1-year (green), 3-year (red), and 5-year (blue) survival. (C) K-M curves of the high- and low-CAPER score groups. After adjustment for age, sex, and tumor stage, the HR of the CAPER score for survival time was statistically significant (p < 0.001).
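Panel C compares Kaplan-Meier (K-M) survival curves between the two CAPER score groups. The sketch below implements the product-limit estimator S(t) = Π (1 − dᵢ / nᵢ), where dᵢ is the number of deaths at event time tᵢ and nᵢ the number still at risk just before it; subjects censored at a tied time are treated as still at risk at that time, which is the usual convention. The input data are hypothetical, and the sketch does not include the log-rank test or the adjusted Cox HR reported in the caption.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) steps of the Kaplan-Meier estimator.

    events[i] is 1 if a death was observed at times[i], 0 if that subject was censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject sharing this time (deaths and censorings alike).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed  # everyone observed at t leaves the risk set afterwards
    return steps


times = [1, 2, 2, 3, 4]   # hypothetical follow-up times
events = [1, 1, 0, 1, 0]  # 1 = death, 0 = censored
print(kaplan_meier(times, events))
```

Running it separately on the high- and low-score patients yields the two step curves that panel C plots.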
Summary of the multivariable Cox regression conducted on 12 CAPER genes in the TCGA training set*.
| Gene | Coef. | SE (coef.) | z | p-value | HR (95% CI) |
|---|---|---|---|---|---|
|  | −0.28 | 0.08 | −3.44 | 5.79E-0 | 0.75 (0.64, 0.89) |
|  | 0.06 | 0.04 | 1.67 | 0.095 | 1.06 (0.99, 1.14) |
|  | 0.06 | 0.05 | 1.23 | 0.22 | 1.06 (0.97, 1.15) |
|  | −0.03 | 0.04 | −0.89 | 0.37 | 0.97 (0.90, 1.04) |
|  | 0.12 | 0.02 | 5.20 | 1.97E-7 | 1.13 (1.08, 1.18) |
|  | 0.08 | 0.02 | 4.14 | 3.45E-5 | 1.08 (1.04, 1.12) |
|  | −0.09 | 0.05 | −1.88 | 0.060 | 0.92 (0.84, 1.00) |
|  | −0.41 | 0.08 | −4.87 | 1.11E-6 | 0.66 (0.56, 0.78) |
|  | −0.02 | 0.02 | −0.72 | 0.47 | 0.98 (0.94, 1.03) |
|  | 0.06 | 0.05 | 1.32 | 0.19 | 1.07 (0.97, 1.17) |
|  | 0.02 | 0.07 | 0.23 | 0.82 | 1.02 (0.89, 1.16) |
|  | −0.06 | 0.03 | −1.80 | 0.071 | 0.94 (0.88, 1.01) |
*The effect sizes of genes are adjusted for age, sex, and tumor stage.
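A gene-based risk score of this kind is commonly the linear predictor of the multivariable Cox model, that is, the sum over genes of coefficient × expression, with patients then split into high- and low-score groups. The sketch below assumes exactly that, plus a median cutoff. This section does not state how the CAPER score was scaled or where the cut was placed, and the labels GENE_A to GENE_C are hypothetical placeholders; the three coefficients are borrowed from rows of the multivariable table for illustration only.

```python
import statistics
from typing import Dict, List

# Hypothetical gene labels; coefficients borrowed from three table rows for illustration.
COEFS: Dict[str, float] = {"GENE_A": -0.28, "GENE_B": 0.12, "GENE_C": 0.08}


def caper_score(expression: Dict[str, float], coefs: Dict[str, float] = COEFS) -> float:
    """Linear predictor: sum of Cox coefficient * expression over the model genes."""
    return sum(coefs[g] * expression[g] for g in coefs)


def split_by_median(scores: List[float]) -> List[str]:
    """Assumed grouping rule: 'high' if a score exceeds the median, otherwise 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


patients = [  # hypothetical normalized expression values
    {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 1.0},
    {"GENE_A": 2.0, "GENE_B": 0.0, "GENE_C": 1.0},
    {"GENE_A": 0.0, "GENE_B": 3.0, "GENE_C": 2.0},
]
scores = [caper_score(p) for p in patients]
print(scores, split_by_median(scores))
```

Genes with HR > 1 (positive coefficients) raise the score, so under this construction the high-score group is the one with the higher modeled hazard.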
FIGURE 4 Differences in three tumor-related variables between the CAPER score groups in the TCGA test set. (A) Immune subtypes; (B) pathological stage of the primary tumor (T); (C) pathological stage of the lymph nodes (N).
Summary of the association between clinical categorical variables and the CAPER score in TCGA.
| Variables | High-CAPER | Low-CAPER | χ² | p-value |
|---|---|---|---|---|
| Primary tumor (T) | | | | |
| Tis, T1 | 79 (16.5%) | 139 (29.0%) | 20.35 | 6.44E-6 |
| T2, T3, T4 | 399 (83.5%) | 341 (71.0%) | | |
| Lymph nodes (N) | | | | |
| N0 | 246 (53.7%) | 229 (59.6%) | 17.68 | 5.12E-4 |
| N1 | 103 (22.5%) | 104 (27.1%) | | |
| N2 | 86 (18.8%) | 34 (8.9%) | | |
| N3 | 23 (5.0%) | 17 (4.4%) | | |
| Metastasis (M) | | | | |
| M0 | 332 (93.0%) | 354 (94.4%) | 0.40 | 0.53 |
| M1 | 25 (7.0%) | 21 (5.6%) | | |
| Immune subtypes | | | | |
| C1 | 150 (37.2%) | 89 (21.0%) | 130.05 | <2.2E-16 |
| C2 | 169 (41.9%) | 82 (19.3%) | | |
| C3 | 66 (16.4%) | 201 (47.4%) | | |
| C4 | 18 (4.5%) | 52 (12.3%) | | |
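The χ² and p-value columns come from chi-square tests of independence on the group-by-category counts. The standard-library-only sketch below computes the Pearson statistic and applies Yates' continuity correction only to 2 × 2 tables; that setting is an assumption about how the tests were run. On the counts above, it reproduces the reported primary-tumor result (χ² ≈ 20.35, p ≈ 6.44E-6) and lymph-node result (χ² ≈ 17.68, p ≈ 5.12E-4); without the correction, the 2 × 2 statistic would be about 21.1.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_independence(table: List[List[int]], yates: bool = True) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence on a contingency table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    correct = yates and len(row_tot) == 2 and len(col_tot) == 2
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(observed - expected)
            if correct:
                diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
            stat += diff * diff / expected
    return stat, df, chi2_sf(stat, df)


# Columns: high-CAPER, low-CAPER (counts from the table above)
t_stage = [[79, 139], [399, 341]]
n_stage = [[246, 229], [103, 104], [86, 34], [23, 17]]
print(chi2_independence(t_stage))
print(chi2_independence(n_stage))
```

The same function can be applied to the other categorical comparisons, such as the IMvigor210 cohort table.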
FIGURE 5 Correlation between the tumor microenvironment (TME) and the CAPER score or CAPER gene expression in the TCGA test set. (A) Differences in the immune score between the CAPER score groups. (B) Differences in the stromal score between the CAPER score groups. (C) Differences in tumor purity between the CAPER score groups. (D) Bubble plot of p-values from Spearman correlation tests conducted on 18 types of cancer. (E) Boxplots of differential cellular abundance scores between the high- and low-CAPER score groups (*p < 0.05, **p < 0.01, and ***p < 0.001). (F) Autocorrelation plot of cellular abundance.
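Panel D is based on Spearman correlation tests. Spearman's ρ is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. The sketch below computes ρ only (no p-value) for hypothetical paired inputs, such as per-sample CAPER scores and immune scores.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


caper = [0.52, -0.04, 0.20, -0.48]  # hypothetical CAPER scores
immune = [1200.0, 300.0, 800.0, 150.0]  # hypothetical immune scores
print(spearman_rho(caper, immune))
```

Because only ranks enter the formula, ρ is insensitive to monotone rescaling of either variable, which suits scores on arbitrary scales.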
Summary of the association between four tumor immunity-related variables and CAPER score in the IMvigor210 cohort.
| Variables | High-CAPER | Low-CAPER | χ² | p-value |
|---|---|---|---|---|
| IC levels | | | | |
| IC0 | 41 (25.8%) | 50 (31.6%) | 5.55 | 0.06 |
| IC1 | 56 (35.2%) | 66 (41.8%) | | |
| IC2+ | 62 (39.0%) | 42 (26.6%) | | |
| TC levels | | | | |
| TC0 | 113 (71.1%) | 137 (86.7%) | 11.96 | 2.53E-3 |
| TC1 | 14 (8.8%) | 8 (5.1%) | | |
| TC2+ | 32 (20.1%) | 13 (8.2%) | | |
| Immune phenotypes | | | | |
| Desert | 27 (21.6%) | 44 (33.3%) | 9.19 | 0.01 |
| Excluded | 55 (44.0%) | 63 (47.7%) | | |
| Inflamed | 43 (34.4%) | 25 (18.9%) | | |
| Lund molecular subtype | | | | |
| UroA | 37 (27.8%) | 55 (53.4%) | 37.48 | 7.28E-9 |
| Infiltrated | 41 (30.8%) | 41 (39.8%) | | |
| Basal/SCC-like | 55 (41.4%) | 7 (6.8%) | | |
FIGURE 6 Differences in four tumor phenotypes between the CAPER score groups in the IMvigor210 cohort. (A) TC levels; (B) IC levels; (C) immune phenotypes; (D) Lund molecular subtypes.