Yuan-Kuei Li, Huan-Ming Hsu, Meng-Chiung Lin, Chi-Wen Chang, Chi-Ming Chu, Yu-Jia Chang, Jyh-Cherng Yu, Chien-Ting Chen, Chen-En Jian, Chien-An Sun, Kang-Hua Chen, Ming-Hao Kuo, Chia-Shiang Cheng, Ya-Ting Chang, Yi-Syuan Wu, Hao-Yi Wu, Ya-Ting Yang, Chen Lin, Hung-Che Lin, Je-Ming Hu, Yu-Tien Chang.
Abstract
Genetic co-expression network (GCN) analysis augments the understanding of breast cancer (BC). We aimed to propose GCN-based modeling for BC relapse-free survival (RFS) prediction and to discover novel biomarkers. We used GCN and Cox proportional hazards regression to build prediction models from mRNA microarray data of 920 tumors and conducted external validation using independent data from 1056 tumors. GCNs of 34 identified candidate genes were plotted in various sizes. Compared with the reference model (rfm), genetic predictors selected from larger GCNs produced better prediction models. The prediction accuracy (ACC) and AUC for 3- to 15-year RFS were 71.0–81.4% and 74.6–78%, respectively (rfm: ACC 63.2–65.5%, AUC 61.9–74.9%). The hazard ratios of the risk scores for relapse ranged from 1.89 to 3.32 (p < 10⁻⁸) across all models after controlling for node status. External validation showed consistent findings. The top 12 co-expressed genes are relatively new or novel biomarkers that had not been explored in BC prognosis or other cancers until this decade. GCN-based modeling creates better prediction models and facilitates the exploration of novel genes in BC prognosis.
Year: 2021 PMID: 33790307 PMCID: PMC8012617 DOI: 10.1038/s41598-021-84995-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The clinical characteristics of BC patients of validation data sets.
| | RFS event: No (n = 662) | | RFS event: Yes (n = 394) | | OR | p |
|---|---|---|---|---|---|---|
| | Mean | sd | Mean | sd | | |
| Follow-up time (year) | 8.86 | 3.19 | 3.23 | 2.85 | 0.559 | < 0.001 |
| Age | 59 | 13 | 56 | 13 | 0.984 | 0.007 |
| Tumor size | 2.17 | 1.13 | 2.46 | 1.08 | 1.267 | 0.001 |
| | n | % | n | % | | |
| Negative | 160 | 20.6% | 97 | 21.9% | 0.925 | 0.591 |
| Positive | 617 | 79.4% | 346 | 78.1% | | |
| Negative | 629 | 87.0% | 340 | 78.5% | 1.830 | < 0.001 |
| Positive | 94 | 13.0% | 93 | 21.5% | | |
| 1 | 133 | 26.0% | 36 | 11.1% | | |
| 2 | 221 | 43.2% | 173 | 53.2% | 2.892 | < 0.001 |
| 3 | 158 | 30.9% | 116 | 35.7% | 2.712 | < 0.001 |
RFS relapse-free survival, ER estrogen receptor, sd standard deviation. Odds ratio (OR) and p values are analyzed using univariable logistic regression.
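For a categorical predictor, the odds ratio from a univariable logistic regression equals the cross-product ratio of the corresponding counts, so the OR column can be checked directly from the table. The sketch below is illustrative only: the function name `odds_ratio` is hypothetical, and it assumes the first level of each group (Negative, or level 1) is the reference, which the table does not state explicitly but which reproduces the reported values.

```python
def odds_ratio(ref_no, ref_yes, grp_no, grp_yes):
    """Odds of an RFS event in a group relative to the reference level.

    Assumption: the first level listed in the table is the reference.
    """
    return (grp_yes / grp_no) / (ref_yes / ref_no)


# Counts copied from the table as (No, Yes) for each level.
print(round(odds_ratio(160, 97, 617, 346), 3))  # first Negative/Positive pair -> 0.925
print(round(odds_ratio(629, 340, 94, 93), 3))   # second Negative/Positive pair -> 1.83
print(round(odds_ratio(133, 36, 221, 173), 3))  # level 2 vs level 1 -> 2.892
print(round(odds_ratio(133, 36, 158, 116), 3))  # level 3 vs level 1 -> 2.712
```

The p values come from the fitted regression and are not reproduced by this arithmetic.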
The number of genes in the GCNs.
| r | Recurrence | No recurrence | Total number of unique genes |
|---|---|---|---|
| Ref | 34 | 34 | 34 |
| > 0.82 | 79 | 131 | 137 |
| > 0.80 | 110 | 216 | 221 |
| > 0.79 | 133 | 310 | 443 |
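The table shows the networks growing as the correlation cutoff r is lowered. A minimal sketch of that idea follows, assuming a network is formed by adding every gene whose Pearson correlation with a seed gene exceeds the cutoff. This is not necessarily the authors' procedure; the function names, gene names, and expression values are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def grow_network(expr, seeds, r_threshold):
    """Return seed genes plus every gene with r > r_threshold to any seed.

    expr maps a gene name to its expression values (same sample order).
    """
    members = set(seeds)
    for seed in seeds:
        for gene, values in expr.items():
            if gene not in members and pearson(expr[seed], values) > r_threshold:
                members.add(gene)
    return sorted(members)


# Hypothetical toy expression data for three genes across five samples.
expr = {
    "SEED": [1, 2, 3, 4, 5],
    "G1": [2.1, 3.9, 6.2, 8.0, 9.8],  # strongly co-expressed with SEED
    "G2": [5, 1, 4, 2, 3],            # r = -0.3 with SEED
}
print(grow_network(expr, ["SEED"], 0.8))   # ['G1', 'SEED']
print(grow_network(expr, ["SEED"], -0.5))  # ['G1', 'G2', 'SEED']
```

Lowering the cutoff can only add genes, which is consistent with the increasing totals in the table.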
Characteristics of GCN-based models.
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5a |
|---|---|---|---|---|---|
| Total # of genetic factors | 34 | 137 | 221 | 443 | 46 |
| # of genes in the model | 6 | 13 | 17 | 34 | 8 |
| ACC | 63.2–65.5 | 66.5–70.3 | 66.6–74.4 | 78.2–82.5 | 66.5–74.4 |
| AUC | 61.9–64.2 | 64.2–75 | 65.2–75.1 | 68.7–77.1 | 64.1–74.8 |
| R square | 0.05 | 0.09 | 0.12 | 0.21 | 0.08 |
| goodness of fit | < 0.001 | 1.21E−11 | 2.22E−16 | ~ 0 | 2.77E−12 |
ACC accuracy, AUC area under the curve.
a Model created using the SNM approach.
Figure 1. The ACCs (bar chart) and AUCs (histogram) of Models 1–5 and the model from Chou’s study[28] for predicting 3-, 5-, 10- and 15-year RFS in BC.
Figure 2. Partial Cox regression plots of Models 1–5 and the model from Chou’s study[28].
Figure 3. Cox regression of the risk scores of each model for predicting the risk of relapse in breast cancer patients. Red bars are the hazard ratios (HR) of the categorical risk scores of each model; blue bars are the hazard ratios (HR) of the continuous risk scores of each model.
The AUC of model validation using independent data sets.
| n-year RFS | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| 3 | 69.2 [65.3;73.1] | 69.6 [65.6;73.6] | 70.6 [66.6;74.6] | 74.8 [71.1;78.6] | 69.7 [65.8;73.6] |
| 5 | 68 [64.5;71.6] | 68.7 [65.0;72.3] | 69.5 [65.9;73.1] | 74.6 [71.3;78.0] | 68.1 [64.5;71.6] |
| 10 | 66.8 [62.7;70.9] | 67.0 [62.9;71.1] | 67.4 [63.4;71.5] | 72.1 [68.2;75.9] | 66.5 [62.3;70.6] |
Models are under the control of node status.
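As a rough illustration of what an n-year AUC measures, the sketch below treats patients who relapse within n years as cases and patients followed beyond n years as controls, then computes the probability that a case has a higher risk score than a control. The record does not describe how the AUC was estimated (for example, a time-dependent ROC method may have been used), so the function `n_year_auc`, its handling of censoring, and the data are all assumptions for illustration.

```python
def n_year_auc(scores, times, relapsed, n):
    """AUC of risk scores for relapse within n years.

    Assumption: patients censored before n years without relapse are dropped.
    """
    rows = list(zip(scores, times, relapsed))
    cases = [s for s, t, r in rows if r and t <= n]   # relapsed by year n
    controls = [s for s, t, r in rows if t > n]       # relapse-free at year n
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs where the case scores higher; ties count half.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores, follow-up times (years), and relapse indicators.
scores = [0.9, 0.4, 0.7, 0.2, 0.5]
times = [2, 1, 8, 9, 2]
relapsed = [True, True, False, True, False]
print(n_year_auc(scores, times, relapsed, 5))  # 0.75
```

The fifth patient is censored at 2 years and therefore contributes to neither group in this simplified scheme.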
Figure 4. The predictive pathways of the top 12 important co-expressed genes.
Figure 5. Workflow of the study.