Jean-Sébastien Milanese, Chabane Tibiche, Jinfeng Zou, Zhigang Meng, Andre Nantel, Simon Drouin, Richard Marcotte, Edwin Wang.
Abstract
Germline variants such as BRCA1/2 play an important role in tumorigenesis and clinical outcomes of cancer patients. However, only a small fraction (i.e., 5-10%) of inherited variants has been associated with clinical outcomes (e.g., BRCA1/2, APC, TP53, PTEN and so on). The challenge remains in using these inherited germline variants to predict clinical outcomes in cancer patient populations. In an attempt to solve this issue, we applied our recently developed algorithm, eTumorMetastasis, which constructs predictive models, to exome sequencing data from ER+ breast cancer patients (n = 755). Gene signatures derived from the genes containing functional germline variants significantly distinguished recurred and non-recurred patients in two independent ER+ breast cancer cohorts (n = 200 and 295, P = 1.4 × 10⁻³). Furthermore, we compared our results with the widely known Oncotype DX test (i.e., Oncotype DX breast cancer recurrence score), and our signatures outperformed it in both the high- and low-risk groups. Finally, we found that recurred patients possessed a higher rate of germline variants. In addition, the inherited germline variants from these gene signatures were predominantly enriched in T cell function, antigen presentation, and cytokine interactions, likely impairing the adaptive and innate immune response and thus favoring a pro-tumorigenic environment. Hence, germline genomic information could be used for developing non-invasive genomic tests for predicting patients' outcomes in breast cancer.
Keywords: Cancer genetics; Cancer models; Computational biology and bioinformatics; Predictive markers
Year: 2019 PMID: 31701019 PMCID: PMC6825127 DOI: 10.1038/s41698-019-0100-7
Source DB: PubMed Journal: NPJ Precis Oncol ISSN: 2397-768X
Demographic and clinical characteristics for ER+ breast cancer samples
| Clinical characteristic | Training set (n = 200): patients | Training set: % | Validation set 1, TCGA-CPTAC (n = 295): patients | Validation set 1: % | Validation set 2, TCGA-Nature (n = 200): patients | Validation set 2: % |
|---|---|---|---|---|---|---|
| Age, years | | | | | | |
| Median | 59 | | 60 | | 58 | |
| ≤59 | 102 | 51 | 149 | 50.5 | 105 | 52.5 |
| >59 | 98 | 49 | 146 | 49.5 | 95 | 47.5 |
| Death | | | | | | |
| Yes | 29 | 14.5 | 33 | 11.2 | 26 | 13 |
| No | 171 | 85.5 | 262 | 88.8 | 174 | 87 |
| Stage | | | | | | |
| I | 37 | 18.5 | 53 | 17.9 | 30 | 15 |
| II | 108 | 54 | 164 | 55.6 | 113 | 56.5 |
| III | 40 | 20 | 72 | 24.4 | 49 | 24.5 |
| IV | 8 | 4 | 2 | 0.7 | 4 | 2 |
| X | 5 | 2.5 | 2 | 0.7 | 3 | 1.5 |
| NA | 2 | 1 | 2 | 0.7 | 1 | 0.5 |
| Subtype | | | | | | |
| Luminal A | 95 | 47.5 | 38 | 12.9 | 58 | 29 |
| Luminal B | 42 | 21 | 18 | 6.1 | 46 | 23 |
| Unknown | 10 | 5 | 11 | 3.7 | 21 | 10.5 |
| NA | 53 | 26.5 | 228 | 77.3 | 75 | 37.5 |
| Nodal status | | | | | | |
| 0 | 87 | 43.5 | 129 | 43.7 | 87 | 43.5 |
| 1–2 | 102 | 51 | 130 | 44.1 | 92 | 46 |
| 3 | 7 | 3.5 | 30 | 10.2 | 18 | 9 |
| X | 4 | 2 | 6 | 2 | 3 | 1.5 |
| Relapse | | | | | | |
| Yes | 30 | 15 | 34 | 11.5 | 20 | 10 |
| No | 170 | 85 | 261 | 88.5 | 180 | 90 |
| DFS, months | | | | | | |
| Median | 49.3 | | 32 | | 34.2 | |
| ≤38.5 | 97 | 48.5 | 197 | 66.8 | 126 | 63 |
| >38.5 | 86 | 43 | 73 | 24.7 | 56 | 28 |
| NA | 17 | 8.5 | 25 | 8.5 | 18 | 9 |
Fig. 1 A flowchart of eTumorMetastasis. (a) Germline variants were identified using whole-exome sequencing data of tumors and their paired normal samples. All variants were functionally annotated and non-functional variants were filtered out. (b) In parallel, a cancer-specific recurrence network was constructed. (c) Network propagation (or heat diffusion) was then run using the functionally mutated genes as seeds. Seeds act as heat sources and their heat is diffused across the network. When diffusion is complete, a “heating score” is assigned to each gene. (d) The “heating scores” for all network genes from all samples were then aggregated into a matrix from which NOG signatures were extracted
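Steps (c) and (d) of the flowchart can be illustrated with a small sketch. It is not the eTumorMetastasis implementation: the update rule (a random walk with restart, one common form of network propagation), the restart weight `alpha`, the four-gene toy network, and the names `propagate`, `network`, `samples`, and `score_matrix` are all assumptions made for illustration.

```python
def propagate(adjacency, seeds, alpha=0.5, tol=1e-12, max_iter=1000):
    """Diffuse heat from seed genes over an undirected network.

    adjacency: dict mapping each gene to the set of its neighbours
               (every gene is assumed to have at least one neighbour).
    seeds:     set of functionally mutated genes acting as heat sources.
    alpha:     assumed restart weight; a fraction alpha of the heat is
               re-injected at the seeds on every iteration.
    """
    nodes = sorted(adjacency)
    # Initial heat: one unit shared equally among the seed genes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    heat = dict(start)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            # Each neighbour m splits its heat equally among its own neighbours.
            incoming = sum(heat[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - alpha) * incoming + alpha * start[n]
        change = max(abs(updated[n] - heat[n]) for n in nodes)
        heat = updated
        if change < tol:  # diffusion has converged
            break
    return heat


# Hypothetical toy network: a chain G1 - G2 - G3 - G4.
network = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
# Hypothetical samples and their functionally mutated (seed) genes.
samples = {"sample_1": {"G1"}, "sample_2": {"G2", "G4"}}

# Step (d): one row of heating scores per sample, one column per gene.
genes = sorted(network)
score_matrix = {}
for sample, seeds in samples.items():
    heat = propagate(network, seeds)
    score_matrix[sample] = [heat[g] for g in genes]
```

In this toy chain, the heat assigned to a gene falls off with its distance from the seed, and each row of `score_matrix` sums to one because the update conserves total heat.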
Fig. 2 Kaplan–Meier curves of the risk groups for breast cancer patients predicted by the NOG_CSS sets. Samples without DFS time or that could not be predicted were removed. NOG_CSS sets derived from germline mutations in (a) the training set, (b) the validation set TCGA-Nature, and (c) the validation set TCGA-CPTAC. Blue and red curves represent low- and high-risk groups, respectively. P values were obtained from a two-sided χ² test
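For readers unfamiliar with how such curves are built, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. The function name `kaplan_meier` and the six follow-up records are hypothetical and are not taken from the study cohorts.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an event occurred.

    times:  follow-up time per patient (e.g. DFS in months).
    events: 1 if the patient recurred at that time, 0 if censored.
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        recurred = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(records) and records[i][0] == t:
            recurred += records[i][1]
            leaving += 1
            i += 1
        if recurred:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - recurred / at_risk
            curve.append((t, survival))
        # Recurred and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for one risk group.
curve = kaplan_meier(times=[5, 8, 12, 12, 20, 25], events=[1, 0, 1, 1, 0, 1])
```

Censored patients (here at 8 and 20 months) never lower the curve directly; they only shrink the risk set used at later event times.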
Prediction accuracy and recall rate for validation sets for breast cancer using the NOG_CSS sets derived from germline mutations
| Dataset | Number of samples | Low risk: accuracy (%)ᵃ | Low risk: recall (%)ᵇ | High risk: accuracy (%)ᶜ | High risk: recall (%)ᵈ |
|---|---|---|---|---|---|
| Training set | 200 | 93.8 | 26.5 | 27.5 | 36.7 |
| TCGA-Nature | 200 | 94.9 | 31.1 | 8.2 | 25.0 |
| TCGA-CPTAC | 295 | 93.5 | 38.7 | 16.6 | 20.6 |

ᵃ Percentage of non-recurred (i.e., non-metastatic) samples in the predicted low-risk group
ᵇ Percentage of the predicted low-risk samples from the non-recurred group
ᶜ Percentage of recurred (i.e., metastatic) samples in the predicted high-risk group
ᵈ Percentage of the predicted high-risk samples from the recurred group
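The four footnote definitions amount to simple ratios, sketched below. The function `risk_group_metrics` and the counts passed to it are hypothetical; the counts were picked only so that the arithmetic lands close to the training-set row and should not be read as the study's actual group sizes. Samples that receive no prediction count toward the cohort totals but toward neither risk group.

```python
def risk_group_metrics(low_nonrec, low_rec, high_rec, high_nonrec,
                       total_nonrec, total_rec):
    """Compute the four table quantities as fractions.

    low_nonrec / low_rec:   non-recurred / recurred samples predicted low risk.
    high_rec / high_nonrec: recurred / non-recurred samples predicted high risk.
    total_nonrec / total_rec: all non-recurred / recurred samples in the
        cohort, including those that were not assigned to any risk group.
    """
    return {
        # a: non-recurred samples within the predicted low-risk group
        "low_accuracy": low_nonrec / (low_nonrec + low_rec),
        # b: share of all non-recurred samples that were predicted low risk
        "low_recall": low_nonrec / total_nonrec,
        # c: recurred samples within the predicted high-risk group
        "high_accuracy": high_rec / (high_rec + high_nonrec),
        # d: share of all recurred samples that were predicted high risk
        "high_recall": high_rec / total_rec,
    }


# Hypothetical counts for a cohort of 170 non-recurred and 30 recurred samples.
metrics = risk_group_metrics(low_nonrec=45, low_rec=3,
                             high_rec=11, high_nonrec=29,
                             total_nonrec=170, total_rec=30)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With a relapse rate near 15%, a high-risk group can capture over a third of recurrences while still consisting mostly of non-recurred patients, which is why the high-risk accuracy column is much lower than the low-risk one.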
Prediction accuracy and recall rate for validation samples for breast cancer using the NOG_CSS sets derived from gene expression of normal tissue
| Dataset | Number of samples | Low risk: accuracy (%)ᵃ | Low risk: recall (%)ᵇ | High risk: accuracy (%)ᶜ | High risk: recall (%)ᵈ |
|---|---|---|---|---|---|
| TCGA-Validation | 49 | 88.9 | 48.5 | 66.7 | 62.5 |

ᵃ Percentage of non-recurred (i.e., non-metastatic) samples in the predicted low-risk group
ᵇ Percentage of the predicted low-risk samples from the non-recurred group
ᶜ Percentage of recurred (i.e., metastatic) samples in the predicted high-risk group
ᵈ Percentage of the predicted high-risk samples from the recurred group
Prediction accuracy and recall rate for breast cancer using Oncotype DX formula and RNA-seq data
| Dataset | Number of samples | Low risk: precision (%)ᵃ | Low risk: recall (%)ᵇ | High risk: precision (%)ᶜ | High risk: recall (%)ᵈ |
|---|---|---|---|---|---|
| Training set | 200 | 84.8 | 16.6 | 18.8 | 40.0 |
| TCGA-Nature | 200 | 90.0 | 20.0 | 6.5 | 20.0 |
| TCGA-CPTAC | 295 | 86.0 | 16.6 | 10.1 | 26.5 |

ᵃ Percentage of non-recurred (i.e., non-metastatic) samples in the predicted low-risk group
ᵇ Percentage of the predicted low-risk samples from the non-recurred group
ᶜ Percentage of recurred (i.e., metastatic) samples in the predicted high-risk group
ᵈ Percentage of the predicted high-risk samples from the recurred group
Fig. 3 Boxplot comparison of functional germline variants and genes for the predicted risk groups. Samples that could not be predicted were removed. (a) Functional germline variants. (b) Functionally mutated genes. (c) Functional germline mutated immune genes. P values were obtained from a two-sided Student’s t test. P value significance: **** P < 0.0001. Outliers are shown as individual points
Fig. 4 Boxplot comparison of leukocyte expression profiles for the predicted risk groups. Samples that could not be predicted were removed. For a complete analysis, see Fig. S1. P values were obtained from a two-sided Student’s t test. P value significance: * P < 0.05, ** P < 0.01. Outliers are shown as individual points
Fig. 5 Boxplot comparison of leukocyte cell fractions for the predicted risk groups. Samples that could not be predicted were removed. For a complete analysis, see Fig. S2. P values were obtained from a two-sided Student’s t test. P value significance: * P < 0.05, ** P < 0.01. Outliers are shown as individual points