Ashish Saini, Jingyu Hou, Wanlei Zhou.
Abstract
BACKGROUND: Breast cancer is the most common cancer among women and has a high mortality rate. Classifying estrogen receptor-based breast cancer subtypes into the correct subclasses is essential so that the right treatments can be applied to lower the mortality rate. Gene signatures derived from gene interaction networks have proven more reproducible and can achieve higher classification performance. However, gene interaction networks usually contain many false-positive interactions that have no biological meaning. It is therefore a challenge to incorporate a reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification.
Year: 2014 PMID: 24563630 PMCID: PMC3916021 DOI: 10.1155/2014/362141
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Microarray datasets used in this study.
| | *Desmedt et al. | van de Vijver et al. | *Loi et al. | *Sabatier et al. | *Schmidt et al. |
|---|---|---|---|---|---|
| Platform | HG-U133A | Agilent human genome | HG-U133A, HG-U133B | HG-U133Plus2.0 | HG-U133A |
| Samples | 198 | 295 | 327 | 255 | 200 |
| ER | |||||
| ER+ (no. of samples) | 134 | 226 | 263 | 150 | 156 |
| ER− (no. of samples) | 64 | 69 | 45 | 102 | 44 |
| Tumour grade | |||||
| Grade 1 (no. of samples) | 30 | — | 52 | 44 | 29 |
| Grade 2 (no. of samples) | 83 | — | 158 | 88 | 136 |
| Grade 3 (no. of samples) | 83 | — | 57 | 116 | 35 |
| Metastasis Free Survival | |||||
| Yes (no. of samples) | 62 | 101 | 70 | 81 | 46 |
| No (no. of samples) | 136 | 194 | 224 | 160 | 154 |
| Age (in Years) | |||||
| ≤40 (no. of samples) | 42 | — | 19 | 49 | — |
| 41–70 (no. of samples) | 156 | — | 241 | 171 | — |
| >70 (no. of samples) | 0 | — | 55 | 34 | — |
| Average (in Years) | 46 | — | 59 | 54 | — |
| Total samples | 1253 (929 (ER+) and 324 (ER−)) | | | | |
| Total samples selected in our study (on the basis of histologic grade and receptor status) | 958 (703 (ER+) and 255 (ER−)) | | | | |
Patients with missing histologic grade and estrogen receptor status information are excluded from the training sets. *The datasets used in our training sets; **the testing sets.
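The totals in the table above can be checked with a few sums. The sketch below copies the per-dataset counts from the table. The grouping of the four single-asterisk datasets is only an observation about the arithmetic, not a statement from the source about how samples were selected.

```python
# Per-dataset counts copied from the table above.
samples = {"Desmedt": 198, "van de Vijver": 295, "Loi": 327, "Sabatier": 255, "Schmidt": 200}
er_pos = {"Desmedt": 134, "van de Vijver": 226, "Loi": 263, "Sabatier": 150, "Schmidt": 156}
er_neg = {"Desmedt": 64, "van de Vijver": 69, "Loi": 45, "Sabatier": 102, "Schmidt": 44}

total_pos = sum(er_pos.values())          # 929
total_neg = sum(er_neg.values())          # 324
total_with_er = total_pos + total_neg     # 1253
# Samples listed in the "Samples" row but without an ER label.
missing_er = sum(samples.values()) - total_with_er  # 22

# Datasets carrying the single-asterisk marker in the table header.
starred = ["Desmedt", "Loi", "Sabatier", "Schmidt"]
selected_pos = sum(er_pos[d] for d in starred)  # 703
selected_neg = sum(er_neg[d] for d in starred)  # 255

print(total_with_er, total_pos, total_neg, missing_er)
print(selected_pos + selected_neg, selected_pos, selected_neg)
```

Under this reading, the 1253 total counts only samples with a known ER status, and the 958 selected samples coincide with the ER-labelled samples of the four starred datasets.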
Figure 1. The eight subnetworks for any training set d. In each subnetwork, the symbol “⨂” marks the hub gene(s), that is, the gene(s) with the highest number of interactions with other genes. In subnetwork S6, two hub genes are identified because both have the same maximal number of interactions (2 each).
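The hub-gene rule described in the caption (pick the gene or genes with the most interactions, keeping ties) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `hub_genes` and the toy gene labels are hypothetical, and a subnetwork is assumed to be given as a list of undirected gene pairs.

```python
from collections import defaultdict

def hub_genes(edges):
    """Return the set of genes with the maximal number of interactions.

    edges: iterable of (gene_a, gene_b) pairs, each an undirected interaction.
    Ties are kept, so more than one hub gene may be returned.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return set()
    top = max(degree.values())
    return {gene for gene, d in degree.items() if d == top}

# Hypothetical toy subnetwork: a chain g1 - g2 - g3 - g4.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
print(sorted(hub_genes(edges)))  # ['g2', 'g3']
```

In the toy chain, `g2` and `g3` each have 2 interactions, which mirrors the tie described for subnetwork S6.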
Algorithm 1. Pseudocode for the RRHGE algorithm.
Regression coefficients of µ (β1) and σ (β2) in each of the six training sets, respectively.
| Training set | β0* | β1 | β2 | P value |
|---|---|---|---|---|
| ER+ (Grade 1) | −8.06 | 0.4601 | 0.7066 | <0.001 |
| ER+ (Grade 2) | 3.64 | 0.4878 | 0.6846 | <0.001 |
| ER+ (Grade 3) | 9.81 | 0.4650 | 0.7094 | <0.001 |
| ER− (Grade 1) | 1.49 | 0.4273 | 0.7274 | <0.001 |
| ER− (Grade 2) | −2.68 | 0.4484 | 0.7199 | <0.001 |
| ER− (Grade 3) | 1.20 | 0.4673 | 0.7078 | <0.001 |
*Here, β0 takes a very small value in each training set and is therefore set to zero.
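To illustrate how coefficients of this kind would be used, the sketch below evaluates a linear predictor with the intercept fixed at zero. It rests on two assumptions that the text here does not confirm: that the regression is a plain linear model in µ and σ, and that the second and third numeric columns of the table hold β1 and β2. The function name and the input values are hypothetical.

```python
def linear_predictor(mu, sigma, beta1, beta2, beta0=0.0):
    """Evaluate beta0 + beta1 * mu + beta2 * sigma.

    beta0 defaults to zero, following the table footnote.
    The linear form itself is an assumption of this sketch.
    """
    return beta0 + beta1 * mu + beta2 * sigma

# ER+ (Grade 1) row of the table, read as beta1 = 0.4601 and beta2 = 0.7066 (assumed column order).
# mu = 1.0 and sigma = 0.5 are made-up inputs, not values from the study.
score = linear_predictor(mu=1.0, sigma=0.5, beta1=0.4601, beta2=0.7066)
print(round(score, 4))
```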
RRHGE gene signature size.
| Training | ER+ | ER− |
|---|---|---|
| | Subnetwork list | Subnetwork list |
| Grade 1 | 45 | 31 |
| Grade 2 | 35 | 37 |
| Grade 3 | 39 | 34 |
| Final gene signature | 326 | 145 |
Our gene signature set consists of 471 genes: 326 genes for the ER+ subtype and 145 genes for the ER− subtype.
Figure 2. The proposed algorithm workflow. In our study, six training sets were used to generate the robust RRHGE gene signature, and two testing sets were used to classify the ER+/ER− breast cancer samples. The RRHGE gene signature set consists of 471 genes (326 for the ER+ and 145 for the ER− subtype).
Classification results of the RRHGE gene signature and other existing gene signatures on the two testing sets, namely (A) the Desmedt dataset and (B) the van de Vijver dataset.
| | Algorithm | N | TP | FN | TN | FP | SN | SP | ACC | MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| (A) Desmedt | GGI | 190 | 84 | 45 | 29 | 32 | 0.651 | 0.475 | 0.595 | 0.121 |
| | 70 g | 190 | 53 | 76 | 27 | 34 | 0.411 | 0.443 | 0.421 | −0.137 |
| | 76 g | 190 | 78 | 51 | 23 | 38 | 0.605 | 0.377 | 0.532 | −0.018 |
| | ITI | 190 | 95 | 34 | 33 | 28 | 0.736 | 0.541 | 0.674 | 0.271 |
| | HRGE | 190 | 115 | 14 | 36 | 25 | 0.891 | 0.590 | 0.795 | 0.511 |
| | RRHGE-H | 190 | 103 | 26 | 46 | 15 | 0.798 | 0.754 | 0.784 | 0.532 |
| | RRHGE-HI | 190 | 100 | 29 | 48 | 13 | 0.775 | 0.787 | 0.779 | 0.535 |
| | RRHGE-TSN | 190 | 119 | 10 | 54 | 7 | 0.922 | 0.885 | 0.911 | 0.798 |
| | RRHGE | | | | | | | | | |
| (B) van de Vijver | GGI | 150 | 77 | 37 | 17 | 19 | 0.675 | 0.472 | 0.627 | 0.131 |
| | 70 g | 150 | 71 | 43 | 19 | 17 | 0.623 | 0.528 | 0.600 | 0.131 |
| | 76 g | 150 | 72 | 42 | 20 | 16 | 0.632 | 0.556 | 0.613 | 0.162 |
| | ITI | 150 | 59 | 55 | 19 | 17 | 0.518 | 0.528 | 0.520 | 0.039 |
| | HRGE | 150 | 70 | 44 | 20 | 16 | 0.614 | 0.556 | 0.600 | 0.146 |
| | RRHGE-H | 146 | 92 | 22 | 14 | 18 | 0.807 | 0.438 | 0.726 | 0.235 |
| | RRHGE-HI | 150 | 94 | 20 | 22 | 14 | 0.825 | 0.611 | 0.773 | 0.414 |
| | RRHGE-TSN | 150 | 101 | 13 | 26 | 10 | 0.886 | 0.722 | 0.847 | 0.592 |
| | RRHGE | | | | | | | | | |
Here, N denotes the total number of samples, TP true positives (ER+ samples predicted as ER+), TN true negatives (ER− samples predicted as ER−), FP false positives (ER− samples predicted as ER+), FN false negatives (ER+ samples predicted as ER−), SN sensitivity, SP specificity, ACC accuracy, and MCC the Matthews correlation coefficient. For simplicity, we abbreviate the Genomic Grade Index as GGI, the 70-gene signature as 70 g, the 76-gene signature as 76 g, Interactome-Transcriptome Integration as ITI, and Hub-based Reliable Gene Expression as HRGE. The RRHGE subnetwork-based gene signature provides superior performance on both the (A) Desmedt and (B) van de Vijver datasets.
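The four reported measures follow from the confusion-matrix counts by their standard definitions. The sketch below recomputes them for the RRHGE-TSN row of the Desmedt dataset. The function name is hypothetical, and returning 0.0 for a zero MCC denominator is a convention chosen for this sketch.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Return (SN, SP, ACC, MCC) from confusion-matrix counts."""
    sn = tp / (tp + fn)                       # sensitivity
    sp = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)     # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; 0.0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Counts from the RRHGE-TSN row, (A) Desmedt.
sn, sp, acc, mcc = classification_metrics(tp=119, fn=10, tn=54, fp=7)
print(round(sn, 3), round(sp, 3), round(acc, 3), round(mcc, 3))  # 0.922 0.885 0.911 0.798
```

The recomputed values match the SN, SP, ACC, and MCC entries reported for that row.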
Figure 3. Bar charts of the MCCs of the various classification algorithms on the Desmedt and van de Vijver datasets.
Figure 4. Kaplan-Meier survival graphs for the ER+ and ER− patient groups in the Desmedt dataset, using the RRHGE gene signature (similar results were achieved for the van de Vijver dataset; data not shown). A log-rank test was performed to obtain the P value; the lower the P value, the better the separation between the two prognosis groups. (a) Using the distant metastasis-free survival (DMFS) rate to distinguish the ER+, or good prognosis, group (lower risk of distant metastasis) from the ER−, or poor prognosis, group (higher risk of distant metastasis). (b) Using the overall survival (OS) rate to distinguish the ER+, or good prognosis, group (lower risk of death) from the ER−, or poor prognosis, group (higher risk of death). Both survival graphs show good separation between the two prognosis groups.
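For readers unfamiliar with the curves in this figure, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator. It does not reproduce the authors' analysis and does not include the log-rank test. The function name and the follow-up data are hypothetical toy values.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up time of each patient.
    events: 1 if the event (e.g. metastasis or death) was observed, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)        # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy follow-up data in years (made up, not from the study).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Computing one such curve per predicted group (ER+ and ER−) and plotting both gives a figure of the kind described here.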
Number of overlapped genes of the RRHGE gene signature with ITI, 76 g, 70 g, and IGS.
| Gene signature | Overlapping genes with RRHGE |
|---|---|
| ITI | 175 (37.16%) |
| 76 g | 3 (0.64%) |
| 70 g | 5 (1.06%) |
| IGS | 10 (2.12%) |
The ITI gene signature shows the highest number of overlapping genes with the RRHGE gene signature, as compared to other gene signatures.
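The percentages in the overlap table appear to be taken relative to the 471 genes of the RRHGE signature. The sketch below checks that reading. The denominator is an assumption inferred from the numbers, and the function name is hypothetical.

```python
SIGNATURE_SIZE = 471  # RRHGE signature size stated in the text

# Overlap counts and reported percentages copied from the table above.
reported = {"ITI": (175, 37.16), "76 g": (3, 0.64), "70 g": (5, 1.06), "IGS": (10, 2.12)}

def overlap_percent(n_shared, signature_size=SIGNATURE_SIZE):
    """Share of the signature (in percent) that overlaps with another signature."""
    return 100.0 * n_shared / signature_size

for name, (count, pct) in reported.items():
    print(f"{name}: computed {overlap_percent(count):.3f}%, reported {pct:.2f}%")
```

With this denominator, the computed values agree with the reported ones to within about 0.01 percentage points.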