Silu Zhang, Junqing Wang, Torumoy Ghoshal, Dawn Wilkins, Yin-Yuan Mo, Yixin Chen, Yunyun Zhou.
Abstract
Background: Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes associated with distinct biological features and clinical outcomes. However, currently available data resources and methods are largely limited to molecular subtyping based on protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs), which occupy 98% of the whole genome. lncRNAs may also play important roles in subgrouping cancer patients and are associated with clinical phenotypes.
Keywords: breast cancer; feature selection; intrinsic subtypes; lncRNA
Year: 2018 PMID: 29373522 PMCID: PMC5852561 DOI: 10.3390/genes9020065
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. An example of a gene selection curve for selection from all coding genes.
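A gene selection curve plots classification accuracy against the number of top-ranked genes kept. The stopping rule used to read a gene count off such a curve is not given here; the sketch below assumes one plausible rule (take the smallest gene count whose accuracy is within a tolerance of the best accuracy). The function name `pick_gene_count`, the `tolerance` parameter, and the toy curve values are all hypothetical.

```python
def pick_gene_count(curve, tolerance=0.005):
    """Return the smallest gene count whose accuracy is within `tolerance`
    of the best accuracy on the curve.

    curve: list of (n_genes, accuracy) pairs, accuracy as a fraction.
    The tolerance rule is a hypothetical stopping criterion, not the paper's.
    """
    if not curve:
        raise ValueError("empty selection curve")
    best = max(acc for _, acc in curve)
    eligible = [n for n, acc in curve if acc >= best - tolerance]
    return min(eligible)


# Made-up curve: accuracy plateaus after roughly 100 genes.
toy_curve = [(10, 0.80), (50, 0.90), (100, 0.952), (200, 0.955), (400, 0.954)]
print(pick_gene_count(toy_curve))
```

Preferring the smallest near-optimal count trades a negligible loss in accuracy for a much more compact signature.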
Subtype prediction accuracy for integrative gene signatures with PAM50.
| Gene Type | No. of Genes before Selection | No. of Genes Selected from RNAseq | No. of PAM50 Genes from Microarray | Integrative Classification Accuracy (%) ¹ |
|---|---|---|---|---|
| PCGs | 19,797 | 100 | 22 | 95.5 [95.1, 95.9] |
| lncRNAs | 40,701 | 85 | 21 | 95.3 [94.8, 95.8] |
| all | 60,498 | 106 | 19 | 95.8 [95.4, 96.2] |
¹ Classification accuracy was measured by 10-fold cross-validation.
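The footnote above refers to 10-fold cross-validation. The sketch below shows the procedure in plain Python: shuffle the samples, split them into k folds, train on k-1 folds, and score the held-out fold. The classifier used in the study is not described in this section, so a simple nearest-centroid classifier is swapped in as a hypothetical stand-in; the helper names and toy data are illustrative only.

```python
import random


# Hypothetical stand-in classifier: nearest centroid.
def fit_centroids(X, y):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        total = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            total[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def kfold_accuracy(X, y, k=10, seed=0):
    """Shuffle, split into k folds, train on k-1 folds, test on the held-out
    fold; return accuracy pooled over all held-out predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


# Toy data: two well-separated groups of 20 samples with 2 features each.
X = [[0.0, 0.1 * i] for i in range(20)] + [[5.0, 0.1 * i] for i in range(20)]
y = ["A"] * 20 + ["B"] * 20
print(kfold_accuracy(X, y, k=10))
```

Every sample is predicted exactly once by a model that never saw it during training, which is what makes the pooled accuracy an out-of-sample estimate.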
Prediction accuracy of genes selected from TCGA RNAseq data (n = 839) by two-iteration selection.
| Gene Type | No. of Genes Selected in Iteration 1 | No. of Genes Selected in Iteration 2 | Classification Accuracy (%) ¹ |
|---|---|---|---|
| coding | 417 | 50 | 87.6 [87.2, 88.0] |
| non-coding | 466 | 29 | 87.8 [87.6, 88.0] |
| all | 530 | 36 | 88.5 [88.1, 88.9] |
¹ Classification accuracy was measured by 10-fold cross-validation repeated 10 times.
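Repeating 10-fold cross-validation 10 times yields 10 accuracy values, which can be summarized as a mean with an interval. How the bracketed intervals in the table were computed is not stated; the sketch below assumes a normal-approximation 95% confidence interval for the mean over repeats. The function name and the run accuracies are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_repeats(accuracies, level=0.95):
    """Mean accuracy over repeated CV runs with a normal-approximation CI
    (an assumed method; a t-based interval would be slightly wider)."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two repeats")
    m = mean(accuracies)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half = z * stdev(accuracies) / sqrt(n)
    return m, (m - half, m + half)


# Made-up accuracies (%) from 10 repeats of 10-fold CV.
runs = [87.1, 88.0, 87.5, 87.9, 87.3, 88.2, 87.6, 87.4, 87.8, 87.2]
m, (lo, hi) = summarize_repeats(runs)
print(f"{m:.1f} [{lo:.1f}, {hi:.1f}]")
```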
Figure 2. Visualization of breast cancer subtypes using the 29 selected non-coding (a) and 36 selected "all" (b) gene features in the 839-sample TCGA RNAseq training set; sensitivity and specificity of the predictions, shown as ROC curves, for the 29 non-coding (c) and 36 "all" (d) gene features.
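For a multi-class subtype prediction, sensitivity and specificity are usually computed one subtype at a time (that subtype versus the rest). A full ROC curve sweeps a threshold over per-class scores; the sketch below computes only the single operating point given by hard predictions, and the function name and labels are hypothetical.

```python
def one_vs_rest_sens_spec(y_true, y_pred, positive):
    """Sensitivity and specificity for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


y_true = ["Basal", "Basal", "LumA", "LumA", "Her2"]
y_pred = ["Basal", "LumA", "LumA", "LumA", "Basal"]
print(one_vs_rest_sens_spec(y_true, y_pred, "Basal"))
```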
Evaluation of prognostic performance for risk scores from various gene signatures.
| Gene Type | No. of Genes | Overall Survival | | Recurrence-Free Survival | |
|---|---|---|---|---|---|
| coding | 50 | 0.031 | 0.000000485 | 0.018 | 0.000588 |
| non-coding | 29 | 0.023 | 0.0000104 | 0.017 | 0.000938 |
| all | 36 | 0.023 | 0.00000852 | 0.025 | 0.0000483 |
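The table evaluates risk scores derived from each gene signature, but this section does not define the score. A common construction, assumed here, is a weighted sum of gene expression values (for example, weights taken from Cox regression coefficients), with patients split into high- and low-risk groups at the median score. The gene names, weights, and expression values below are hypothetical.

```python
from statistics import median


def risk_scores(expression, weights):
    """Linear risk score per patient: sum of weight * expression over genes.

    expression: list of dicts mapping gene -> expression value (one per patient)
    weights: dict mapping gene -> coefficient (hypothetical, e.g. Cox betas)
    """
    return [sum(w * sample[g] for g, w in weights.items()) for sample in expression]


def split_at_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


weights = {"GENE_A": 0.5, "GENE_B": -0.2}  # hypothetical genes and weights
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 0.0, "GENE_B": 3.0},
    {"GENE_A": 1.0, "GENE_B": 0.0},
    {"GENE_A": 4.0, "GENE_B": 2.0},
]
scores = risk_scores(patients, weights)
print(split_at_median(scores))
```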
Figure 3. Kaplan–Meier curves and log-rank test p-values for overall survival (a) and recurrence-free survival (b) for the 36-gene "all" signature in the 839-sample RNAseq data set.
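The Kaplan–Meier estimator and the two-group log-rank test named in the caption can be sketched in plain Python. This is a textbook implementation for illustration, not the authors' code; `events` uses 1 for an observed event and 0 for a censored time, and the toy data are made up.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p) with 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:  # hypergeometric variance of d_a
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square(1)


print(kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1]))
print(logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```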
Multivariate Cox regression survival analysis.
| Covariate | Hazard Ratio (95% Confidence Interval) | p-Value |
|---|---|---|
| | 2.3 [1.2, 4.5] | 0.01 * |
| | 1.0 [1.0, 1.0] | 0.0041 * |
| White | Reference | |
| Black | 1.5 [0.7, 3.0] | 0.31 |
| Asian | 0.5 [0.1, 3.9] | 0.51 |
| Untreated or other | Reference | |
| Chemotherapy | 0.6 [0.3, 1.3] | 0.20 |
| Radiation therapy | 0.5 [0.2, 1.0] | 0.06 |
| Hormone therapy | 0.4 [0.1, 1.2] | 0.09 |
| Radiation & chemotherapy | 0.3 [0.1, 0.6] | 0.0027 * |
| Radiation & hormone | 0.3 [0.1, 0.8] | 0.02 * |
| T1 | Reference | |
| T2 | 1.4 [0.7, 2.8] | 0.33 |
| T3 | 1.4 [0.6, 3.6] | 0.45 |
| T4 | 1.7 [0.6, 5.6] | 0.39 |
| Infiltrating Ductal Carcinoma | Reference | |
| Infiltrating Lobular Carcinoma | 1.2 [0.5, 3.2] | 0.70 |
| Mucinous Carcinoma | 2.6 [0.3, 22.3] | 0.83 |
| Mixed Histology | 0.9 [0.3, 2.9] | 0.37 |
* Statistical significance (p-value < 0.05).
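In a Cox model, a hazard ratio is the exponentiated coefficient, HR = exp(β), its 95% CI is exp(β ± 1.96·SE), and the Wald p-value comes from z = β/SE. The sketch below applies these standard relations; the second function recovers an approximate p-value from a reported HR and CI, which is handy for checking rounded table entries. Both function names are hypothetical.

```python
from math import erfc, exp, log, sqrt

Z975 = 1.959963984540054  # standard normal 97.5% quantile


def hazard_ratio_summary(beta, se):
    """HR, 95% CI and two-sided Wald p-value from a Cox coefficient and its SE."""
    hr = exp(beta)
    ci = (exp(beta - Z975 * se), exp(beta + Z975 * se))
    p = erfc(abs(beta / se) / sqrt(2))
    return hr, ci, p


def wald_p_from_ci(hr, lower, upper):
    """Approximate two-sided p-value recovered from a reported HR and 95% CI."""
    se = (log(upper) - log(lower)) / (2 * Z975)
    z = log(hr) / se
    return erfc(abs(z) / sqrt(2))


# First row of the table: HR 2.3 [1.2, 4.5], reported p = 0.01.
print(wald_p_from_ci(2.3, 1.2, 4.5))
```

Because the table rounds HRs and limits to one decimal, the recovered value is only approximate; for the first row it comes out at about 0.01, in line with the reported p-value.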
Important prognostic biomarkers and univariate overall survival analysis.
| Gene Name | Gene Type | Location (chr:start–end) | HR (95% CI) | p-Value |
|---|---|---|---|---|
| DDX51 | PCGs | chr12:132136594–132144335 | 0.90 [0.84, 0.98] | 0.009 * |
| SPAG17 | PCGs | chr1:117953861–118185223 | 0.94 [0.89, 0.99] | 0.027 * |
| NUMA1 | PCGs | chr11:72002864–72080693 | 0.91 [0.87, 0.96] | 0.0003.5 * |
| CTD-2616J11.9 | lncRNAs | chr19:51345169–51353293 | 0.91 [0.87, 0.96] | 0.001 * |
| RP1-140K8.1 | lncRNAs | chr6:3893126–3894292 | 1.06 [1.00, 1.13] | 0.033 * |
| RP11-546K22.1 | lncRNAs | chr8:51961458–52022974 | 0.94 [0.89, 0.99] | 0.043 * |
| AC000095.9 | lncRNAs | chr22:19018043–19018916 | 0.91 [0.85, 0.98] | 0.011 * |
| SCGB1D5P | lncRNAs | chr4:165517255–165517501 | 1.03 [1.00, 1.08] | 0.067 |
* Statistical significance (p-value < 0.05).
Evaluation of the selected genes: predicted breast cancer subtypes versus clinical ER/PR/HER2 receptor status.
| Predicted Subtypes | ER-/PR-/HER2- | ER-/PR-/HER2+ | ER+/PR+/HER2 |
|---|---|---|---|
| Basal | 23 (92%) | 0 | 2 (2.1%) |
| Her2 | 1 (4%) | 4 (80%) | 2 (2.1%) |
| Luminal A | 1 (4%) | 1 (20%) | 76 (80.9%) |
| Luminal B | 0 | 0 | 14 (14.9%) |
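The percentages in this table are column percentages: each count is divided by the total number of patients in its receptor-status group (25, 5 and 94 patients respectively). The sketch below reproduces that arithmetic from the counts; the function name is hypothetical, and the column labels are copied from the table as printed.

```python
def column_percentages(counts):
    """Convert a {predicted: {group: count}} table into percentages per group."""
    groups = {g for row in counts.values() for g in row}
    totals = {g: sum(row.get(g, 0) for row in counts.values()) for g in groups}
    return {
        pred: {g: 100 * row.get(g, 0) / totals[g] for g in groups if totals[g]}
        for pred, row in counts.items()
    }


TN, HER2P, HRP = "ER-/PR-/HER2-", "ER-/PR-/HER2+", "ER+/PR+/HER2"
table = {
    "Basal": {TN: 23, HER2P: 0, HRP: 2},
    "Her2": {TN: 1, HER2P: 4, HRP: 2},
    "Luminal A": {TN: 1, HER2P: 1, HRP: 76},
    "Luminal B": {TN: 0, HER2P: 0, HRP: 14},
}
pct = column_percentages(table)
print(round(pct["Luminal A"][HRP], 1))
```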