Abstract
BACKGROUND: The successful identification of breast cancer (BRCA) prognostic biomarkers is essential for strategic intervention in BRCA patients. Recently, various methods have been proposed for identifying a small prognostic gene set that can distinguish the high-risk group from the low-risk group.
Keywords: Biomarker; Breast cancer; Feature selection; Prognostic risk score; Regularized Cox proportional hazards model
Year: 2021 PMID: 34930307 PMCID: PMC8686664 DOI: 10.1186/s12967-021-03180-y
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Basic information and clinical characteristics of the BRCA patients
| Characteristics | | Datasets | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | | Discovery | External validation (1848 patients in total) | | | | | |
| Cohort | | TCGA | GSE1456 | GSE2034 | GSE7390 | GSE17705 | GSE21653 | GSE35629 |
| Platform | | DCC | GPL96 | GPL96 | GPL96 | GPL570 | GPL570 | GPL1390 |
| Survival | | OS | OS, DMFS | DMFS | OS, DFS, TDM | RFS | DFS | OS, RFS |
| # of samples | | 1080 | 159, 159 | 286 | 198, 198, 198 | 298 | 248 | 53, 51 |
| # of genes | | 20220 | 13701 | 13701 | 13701 | 13701 | 21835 | 7800 |
| Age | 587 | 195 | 157 | |||||
| 479 | 3 | 91 | ||||||
| NA | 14 | 0 | 0 | |||||
| Average | 58 | 46 | 55 | |||||
| IQR | (49, 67) | (42, 51) | (45, 66) | |||||
| Tumor size (mm) | 21.81 | |||||||
| Stage | I | 181 | 28 | 30 | 43 | |||
| II | 611 | 58 | 83 | 84 | ||||
| III | 246 | 61 | 83 | 121 | ||||
| IV | 20 | 0 | 0 | |||||
| V | 14 | 0 | 0 | |||||
| NA | 8 | 12 | 2 | 4 | ||||
| T stage | T1 | 279 | 57 | |||||
| T2 | 626 | 121 | ||||||
| T3 | 134 | 63 | ||||||
| T4 | 38 | 0 | ||||||
| NX | 3 | 7 | ||||||
| N stage | N0 | 505 | ||||||
| N1 | 359 | |||||||
| N2 | 120 | |||||||
| N3 | 76 | |||||||
| NX | 20 | |||||||
| M stage | M0 | 896 | ||||||
| M1 | 22 | |||||||
| MX | 162 | |||||||
| Status | 0 | 928 | 130, 119 | 276 | 142, 107, 147 | 227 | 169 | 29, 30 |
| | 1 | 152 | 29, 40 | 10 | 56, 91, 51 | 71 | 79 | 24, 21 |
| References | [ | [ | [ | [ | [ | [ | [ | |
*IQR: interquartile range (Q1, Q3). OS: overall survival; DMFS: distant metastasis-free survival; TDM: time to distant metastasis; DFS: disease-free survival; RFS: relapse-free survival. For OS: 1 = dead from BRCA, 0 = alive or censored. For DMFS: 1 = relapse, 0 = no relapse or censored. For TDM, DFS and RFS: 1 = event, 0 = censored
The details of the datasets for prognostic prediction of PRS
| Datasets | Platforms | # of samples | # of genes | References |
|---|---|---|---|---|
| TCGA | DCC | 224 (112 Normal / 112 Tumor) | 20222 | [ |
| GSE5764 | GPL570 | 15 (10 Normal / 5 Tumor) | 21835 | [ |
| GSE7904 | GPL570 | 62 (19 Normal / 43 Tumor) | 16452 | [ |
| GSE10780 | GPL570 | 185 (143 Normal / 42 IDC) | 21835 | [ |
| GSE42568 | GPL570 | 121 (17 Normal / 104 IDC) | 21835 | [ |
*IDC Invasive ductal carcinomas. Histopathological BRCA subtypes: invasive ductal (IDC), invasive lobular (ILC), mixed ductal/lobular (Mixed), and other-type (Other) carcinoma [72]
Fig. 1 The framework for detecting and verifying prognostic biomarkers of BRCA from gene expression data with the RCPH methods. a Download the publicly available RNA-seq data and then select DEGs. b Add the prior knowledge from KEGG, GO, MammaPrint, OSbrca, and scPrognosis to the DEGs, integrate all of them with RegNetwork to obtain a connected network component, and extract the gene expression values accordingly. c Apply the RCPH models to the network-structured data to select the feature genes of BRCA. d Choose the optimal feature subsets by assessing the C-index and P-value. e Identify the genes with non-zero regression coefficients as the potential BRCA biomarkers. f Establish the PRS model based on statistically significant genes from the Cox model in order to make response predictions for prognosis and treatment of BRCA. g Perform survival analysis using the PRS index to investigate its prognostic performance. h Explore the significance and differences of the PRS index in normal and tumor tissues
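
Steps f and g of the framework can be illustrated with a minimal sketch. It assumes that the PRS is a linear combination of gene expression values weighted by Cox regression coefficients and that patients are split into high-risk and low-risk groups at the cohort median; both choices are assumptions, not details confirmed by this record. The coefficient values, expression values and patient IDs are invented placeholders.

```python
from statistics import median

# Hypothetical Cox coefficients (illustrative only, NOT values from the paper).
coefs = {"ADRB1": -0.30, "SAV1": -0.20, "TSPAN14": 0.25}


def prs(expression, coefs):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def stratify(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression profiles for four hypothetical patients.
patients = {
    "P1": {"ADRB1": 1.0, "SAV1": 2.0, "TSPAN14": 4.0},
    "P2": {"ADRB1": 3.0, "SAV1": 3.0, "TSPAN14": 1.0},
    "P3": {"ADRB1": 2.0, "SAV1": 1.0, "TSPAN14": 2.0},
    "P4": {"ADRB1": 0.5, "SAV1": 0.5, "TSPAN14": 3.0},
}
scores = {pid: prs(expr, coefs) for pid, expr in patients.items()}
groups = stratify(scores)
```

With these placeholder numbers, `groups` assigns P1 and P4 to the high-risk group and P2 and P3 to the low-risk group; the two groups would then be compared with a survival analysis.
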
Penalty functions of the regularization term used in the RCPH models
| Methods | Formulas | References |
|---|---|---|
| Ridge | [ | |
| Lasso | [ | |
| Enet | [ | |
| [ | ||
| [ | ||
| SCAD | [ | |
| where | ||
| MCP | [ | |
| where |
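
The formula column of this table did not survive extraction. For orientation, the commonly used textbook forms of the five named penalties are sketched below; the symbols λ (regularization strength), α (mixing weight), a and γ (concavity parameters) are assumptions, and the paper's own parametrization may differ. The two unnamed rows are left as they are.

```latex
\begin{aligned}
\text{Ridge:}\quad & P_\lambda(\beta) = \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{Lasso:}\quad & P_\lambda(\beta) = \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{Enet:}\quad  & P_{\lambda,\alpha}(\beta) = \lambda \sum_{j=1}^{p}
      \Bigl( \alpha |\beta_j| + \tfrac{1-\alpha}{2}\, \beta_j^{2} \Bigr),
      \qquad 0 \le \alpha \le 1 \\
\text{SCAD:}\quad  & P_{\lambda,a}(\beta) = \sum_{j=1}^{p} p_{\lambda,a}(|\beta_j|),
      \quad\text{where}\quad
      p_{\lambda,a}(t) =
      \begin{cases}
        \lambda t, & t \le \lambda \\
        \dfrac{2a\lambda t - t^{2} - \lambda^{2}}{2(a-1)}, & \lambda < t \le a\lambda \\
        \dfrac{(a+1)\lambda^{2}}{2}, & t > a\lambda
      \end{cases}
      \qquad a > 2 \\
\text{MCP:}\quad   & P_{\lambda,\gamma}(\beta) = \sum_{j=1}^{p} p_{\lambda,\gamma}(|\beta_j|),
      \quad\text{where}\quad
      p_{\lambda,\gamma}(t) =
      \begin{cases}
        \lambda t - \dfrac{t^{2}}{2\gamma}, & t \le \gamma\lambda \\
        \dfrac{\gamma\lambda^{2}}{2}, & t > \gamma\lambda
      \end{cases}
      \qquad \gamma > 1
\end{aligned}
```
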
The results of feature selection on the discovery dataset performed by seven different RCPH models
| Methods | Training dataset ( | Testing dataset ( | |
|---|---|---|---|
| # of features | C-index ± Std. Dev | P-value ± Std. Dev | |
| Ridge-RCPH | 1142 | ||
| Lasso-RCPH | 47 | ||
| Enet-RCPH | 66 | ||
| 4 | |||
| 17 | |||
| SCAD-RCPH | 42 | ||
| MCP-RCPH | 22 | ||
*Std. Dev Standard deviation
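
The C-index reported in this table measures how well predicted risks order the observed survival times. The following is a minimal sketch of Harrell's concordance index for right-censored data; whether the paper uses exactly this estimator is an assumption, tied event times are treated as non-comparable for simplicity, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: higher risk scores go with earlier events, so ordering is perfect.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
c_perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.4, 0.6, 0.9])
c_constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1 means every comparable pair is ordered correctly, 0.5 corresponds to uninformative scores, and 0 means the ordering is fully reversed.
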
Fig. 2 The overlap of features among the five different RCPH methods (the Ridge-RCPH method is excluded), where the top bars show the intersections among the methods indicated by the dotted lines at the bottom
Fig. 3 The network structure of the 72 biomarkers, where the color shows the significance of the gene expression difference between normal and disease samples, and the node size reflects the node degree
Fig. 4 SS-measure boxplots and significantly enriched functions of the 72 biomarker genes. a The SS-measure between our enriched GO terms and randomly selected GO terms, by comparison with the unique GO terms in the hallmarks of BRCA. b The gene network that is closely related to the top 6 significant GO terms by SS-measure of our enriched functions
Fig. 5 Pathway enrichment analysis of the 72 biomarker genes, where the P-value is calculated by the accumulative hypergeometric test. a The heatmap of the top 20 statistically enriched terms. b The functional enrichment map of pathways, where subsets of representative terms are selected from each cluster and converted into a network layout. More specifically, each circle node represents a term, where its size is proportional to the number of input genes that fall into that term, and its color represents its cluster identity (i.e., nodes of the same color belong to the same cluster)
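
The enrichment P-value named in this caption can be sketched as the upper tail of a hypergeometric distribution: the probability of seeing at least k term-annotated genes among n input genes drawn from a background of N genes, K of which carry the term. The function name and the small example numbers are hypothetical, and the enrichment tool used in the paper may apply additional corrections.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw n genes from the background
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: background of 10 genes, 4 annotated to the term, 3 input genes.
p_at_least_one = hypergeom_upper_tail(1, N=10, K=4, n=3)
p_all_three = hypergeom_upper_tail(3, N=10, K=4, n=3)
```

In the toy example, `p_at_least_one` equals 5/6 and `p_all_three` equals 1/30; for a real analysis n would be the 72 biomarker genes and N the annotated background.
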
Fig. 6 The association between the PRS index and overall survival status in BRCA patients on the discovery dataset. a The KM-curves of PRS. b The distributions of PRS and survival time for each sample
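
The KM-curves in panel a are Kaplan-Meier survival estimates, one per risk group. A minimal sketch of the estimator for a single group is given below; the function name and the toy follow-up data are hypothetical, and a published analysis would normally rely on an established survival library and add a log-rank test between groups.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up time per patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation just before t
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk  # product-limit update
            curve.append((t, surv))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve


# Toy group of five patients; the patients at t=3 (one of two) and t=8 are censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For the toy data the estimate steps down to 0.8, 0.6 and 0.3 at times 2, 3 and 5; censored patients lower the number at risk without producing a step.
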
Fig. 7 The nomogram and calibration plots for predicting the survival of BRCA patients. a The nomogram for predicting the 1-year, 3-year and 5-year survival proportion of BRCA patients. b–d The calibration plots, showing the agreement between predicted and observed outcomes, for the 1-year, 3-year and 5-year predictions
Fig. 8 The comparison of the PRS index in normal and tumor tissues in the internal TCGA dataset and four external GEO datasets. a TCGA (n = 224). b GSE7904 (n = 62). c GSE42568 (n = 121). d GSE5764 (n = 15). e GSE10780 (n = 185)
Fig. 9 Protein expression profiles of the three genes (ADRB1, SAV1 and TSPAN14) in PRS between normal and tumor tissues. All subfigures are extracted from the Human Protein Atlas database
Fig. 10 The KM-curves of high-risk and low-risk BRCA patients in some external datasets from the GEO database. a GSE1456 (OS). b GSE7390 (OS). c GSE35629 (OS). d GSE7390 (DFS). e GSE21653 (DFS). f GSE17705 (RFS). g GSE35629 (RFS). h GSE1456 (DMFS). i GSE1456 (DMFS). j GSE7390 (TDM)
The relationship between six DEGs and chemoresistance
| Gene symbol | Description | References |
|---|---|---|
| CEL | Carboxyl Ester Lipase | – |
| PGK1 | Phosphoglycerate Kinase 1 | [ |
| PTGES3 | Prostaglandin E Synthase 3 | – |
| RAPGEFL1 | Rap Guanine Nucleotide Exchange Factor Like 1 | [ |
| SERPINA1 | Serpin Family A Member 1 | [ |
| WWOX | WW Domain Containing Oxidoreductase | [ |