Chi-Cheng Huang, Shih-Hsin Tu, Ching-Shui Huang, Heng-Hui Lien, Liang-Chuan Lai, Eric Y Chuang.
Abstract
Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiling. Despite recent advances in machine learning and bioinformatics, most classification tools remain limited to binary responses. Our aim was to apply partial least squares (PLS) regression to the breast cancer intrinsic taxonomy, which comprises five distinct molecular subtypes. The PAM50 signature genes were used as predictive variables in the PLS analysis, and the latent gene component scores were used in a binary logistic regression for each molecular subtype. The 139 prototypical arrays used for PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent within the training samples (weighted Kappa: 0.988) but deteriorated substantially in the independent samples, which could be attributed to the much larger number of samples left unclassified by PLS-regression. When these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved substantially (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study confirmed the feasibility of PLS-regression in multiclass prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
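The two-stage scheme described in the abstract (a PLS component score per subtype, followed by a binary logistic regression on that score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic (three classes, six "genes"), only the first PLS component is used, the logistic model is fitted by plain gradient ascent, and the rule "assign the subtype with the highest probability, or Unclassified if none reaches 0.5" is hypothetical, since the abstract does not state the assignment rule. It is not the authors' implementation.

```python
import math
import random

def standardize(X):
    """Center and scale each column (gene) to zero mean and unit variance."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (n - 1)) for j in range(p)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in X]

def first_pls_weights(Z, y):
    """Unit-length first PLS1 weight vector, proportional to Z^T (y - mean(y))."""
    ybar = sum(y) / len(y)
    w = [sum(z[j] * (yi - ybar) for z, yi in zip(Z, y)) for j in range(len(Z[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(t, y, steps=1000, lr=0.1):
    """Fit P(y=1 | t) = sigmoid(b0 + b1*t) by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(t)
    for _ in range(steps):
        resid = [yi - sigmoid(b0 + b1 * ti) for ti, yi in zip(t, y)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * ti for r, ti in zip(resid, t)) / n
    return b0, b1

def fit_one_vs_rest(Z, labels, classes):
    """One PLS weight vector and one binary logistic model per class."""
    models = {}
    for c in classes:
        y = [1.0 if lab == c else 0.0 for lab in labels]
        w = first_pls_weights(Z, y)
        t = [sum(wj * zj for wj, zj in zip(w, z)) for z in Z]  # component scores
        models[c] = (w, fit_logistic(t, y))
    return models

def predict(models, z, threshold=0.5):
    """Hypothetical rule: best class if its probability reaches the threshold."""
    probs = {}
    for c, (w, (b0, b1)) in models.items():
        score = sum(wj * zj for wj, zj in zip(w, z))
        probs[c] = sigmoid(b0 + b1 * score)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "Unclassified"

# Synthetic data: class k has genes 2k and 2k+1 shifted upward.
random.seed(0)
classes = ["A", "B", "C"]
X, labels = [], []
for k, c in enumerate(classes):
    for _ in range(30):
        X.append([random.gauss(3.0 if j // 2 == k else 0.0, 1.0) for j in range(6)])
        labels.append(c)

Z = standardize(X)
models = fit_one_vs_rest(Z, labels, classes)
predicted = [predict(models, z) for z in Z]
agreement = sum(p == l for p, l in zip(predicted, labels)) / len(labels)
print(f"training agreement: {agreement:.2f}")
```

Because each subtype gets its own binary model, a sample can fail every model's threshold, which is one way an "Unclassified" category can arise in a one-vs-rest design.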
Year: 2013 PMID: 24490149 PMCID: PMC3893734 DOI: 10.1155/2013/248648
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Performance of PLS-regression classifiers for prototypical arrays.
| Intrinsic subtype | Basal-like | HER2-enriched | Luminal-A | Luminal-B | Normal breast-like |
|---|---|---|---|---|---|
| Number of samples | 57 | 35 | 23 | 12 | 12 |
| PLS-regression | | | | | |
| Number of gene components | 1 | 1 | 2 | 1 | 2 |
| | 57.0% | 37.1% | 74.5% | 25.8% | 60.2% |
| | 86.7% | 56.2% | 64.6% | 24.6% | 66.5% |
| Binary LR | | | | | |
| Adjusted | 0.99 | 0.73 | 0.9 | 0.63 | 0.99 |
| AUC | 1 | 0.96 | 0.99 | 0.96 | 1 |
| Accuracy | 98.6% | 89.9% | 97.1% | 95.0% | 100.0% |
| Sensitivity | 98.2% | 74.3% | 91.3% | 50.0% | 100.0% |
| Specificity | 98.8% | 95.2% | 98.3% | 99.2% | 100.0% |
PLS: partial least squares, LR: logistic regression, AUC: area under the curve.
PAM50 prototypes and predicted subtypes by PLS-regression for prototypical arrays.
| PAM50 prototype (rows) vs. predicted subtype (columns) | Basal-like | HER2-enriched | Luminal-A | Luminal-B | Normal breast-like | Unclassified |
|---|---|---|---|---|---|---|
| Basal-like (57) | 57 | 0 | 0 | 0 | 0 | 0 |
| HER2-enriched (35) | 0 | 26 | 0 | 0 | 0 | 9 |
| Luminal-A (23) | 0 | 0 | 21 | 0 | 0 | 2 |
| Luminal-B (12) | 0 | 1 | 0 | 6 | 0 | 5 |
| Normal breast-like (12) | 0 | 0 | 0 | 0 | 12 | 0 |
Performance of PLS-regression classifiers for independent validation dataset.
| Intrinsic subtype | Basal-like | HER2-enriched | Luminal-A | Luminal-B | Normal breast-like |
|---|---|---|---|---|---|
| Number of samples | 97 | 94 | 165 | 121 | 56 |
| PLS-regression | | | | | |
| Number of gene components | 2 | 1 | 2 | 2 | 1 |
| | 71.1% | 25.5% | 79.9% | 61.7% | 38.1% |
| | 56.9% | 41.6% | 34.5% | 30.9% | 18.1% |
| Binary LR | | | | | |
| Adjusted | 0.86 | 0.66 | 0.73 | 0.61 | 0.39 |
| AUC | 0.98 | 0.95 | 0.95 | 0.93 | 0.89 |
| Accuracy | 96.6% | 90.7% | 88.2% | 86.2% | 90.5% |
| Sensitivity | 85.6% | 68.1% | 81.8% | 63.6% | 23.2% |
| Specificity | 99.1% | 95.5% | 91.1% | 92.8% | 98.3% |
PLS: partial least squares, LR: logistic regression, AUC: area under the curve.
Single sample prediction by PAM50 centroids and predicted subtypes by PLS-regression for independent validation dataset.
| PAM50 SSP (rows) vs. predicted subtype (columns) | Basal-like | HER2-enriched | Luminal-A | Luminal-B | Normal breast-like | Unclassified |
|---|---|---|---|---|---|---|
| Basal-like (97) | 83 | 1 | 0 | 1 | 0 | 12 |
| HER2-enriched (94) | 0 | 63 | 0 | 3 | 0 | 28 |
| Luminal-A (165) | 0 | 3 | 130 | 8 | 0 | 24 |
| Luminal-B (121) | 0 | 5 | 10 | 73 | 0 | 33 |
| Normal breast-like (56) | 1 | 3 | 17 | 1 | 8 | 26 |
| Unclassified (2) | 0 | 0 | 0 | 0 | 0 | 2 |
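The agreement statistic quoted in the abstract can be illustrated from the cross-tabulation above. The following is a minimal sketch of Cohen's kappa with optional disagreement weights, applied to the validation counts. The source does not state which weighting scheme or category ordering was used, so the linear weights and the row order chosen here are assumptions, and the printed values are not expected to reproduce the reported 0.541 and 0.829 exactly.

```python
def cohen_kappa(matrix, weights=None):
    """Cohen's kappa for a square agreement table.

    weights=None gives the unweighted statistic; "linear" and "quadratic"
    penalize disagreements by the distance between category indices.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        raise ValueError(f"unknown weights: {weights!r}")

    observed = sum(penalty(i, j) * matrix[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(penalty(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1.0 - observed / expected

# Rows: PAM50 SSP; columns: PLS-regression prediction.
# Order: Basal-like, HER2-enriched, Luminal-A, Luminal-B, Normal breast-like, Unclassified.
validation = [
    [83, 1, 0, 1, 0, 12],
    [0, 63, 0, 3, 0, 28],
    [0, 3, 130, 8, 0, 24],
    [0, 5, 10, 73, 0, 33],
    [1, 3, 17, 1, 8, 26],
    [0, 0, 0, 0, 0, 2],
]

# Same table with the Unclassified row and column dropped.
classified_only = [row[:5] for row in validation[:5]]

kappa_all = cohen_kappa(validation)
kappa_all_linear = cohen_kappa(validation, weights="linear")
kappa_classified = cohen_kappa(classified_only)
print(kappa_all, kappa_all_linear, kappa_classified)
```

Weighted kappa is defined for ordered categories; with nominal subtypes the result depends on the order in which the categories are listed, which is why the ordering is called out as an assumption.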
Association of clinical ER and HER2 status with intrinsic taxonomy, classified by either PAM50 single sample prediction or PLS-regression.
| Classifier | Clinical status | Basal | HER2 | LumA | LumB | Norm |
|---|---|---|---|---|---|---|
| PAM50 SSP | ER negative | 40 | 28 | 0 | 3 | 9 |
| PAM50 SSP | ER positive | 0 | 7 | 67 | 46 | 7 |
| PAM50 SSP | HER2 normal | 37 | 5 | 63 | 30 | 8 |
| PAM50 SSP | HER2 over-expression | 3 | 30 | 4 | 19 | 8 |
| PLS-regression | ER negative | 38 | 19 | 1 | 1 | 3 |
| PLS-regression | ER positive | 0 | 5 | 59 | 33 | 2 |
| PLS-regression | HER2 normal | 35 | 0 | 56 | 21 | 4 |
| PLS-regression | HER2 over-expression | 3 | 24 | 4 | 13 | 1 |
SSP: single sample prediction, Basal: basal-like, HER2: Her2-enriched, LumA: luminal-A, LumB: luminal-B, Norm: normal breast-like subtype.
Figure 1. Breast cancer disease-free survival stratified by intrinsic subtypes, classified by either PAM50 single sample prediction (a) or PLS-regression (b). dfs, disease-free survival; Basal, basal-like; HER2, Her2-enriched; LumA, luminal-A; LumB, luminal-B; Norm, normal breast-like subtype.
Compositions and weight vectors of five PLS-regressions for each molecular subtype.
| Basal-like gene | Weight | HER2-enriched gene | Weight | Luminal-A gene | Weight | Luminal-B gene | Weight | Normal breast-like gene | Weight |
|---|---|---|---|---|---|---|---|---|---|
| ANLN | 0.271 | ACTR3B | −0.316 | BIRC5 | −0.299 | BCL2 | −0.325 | CCNB1 | −0.272 |
| CEP55 | 0.271 | BAG1 | −0.083 | CDCA1 | −0.294 | CDH3 | −0.667 | CDC6 | −0.241 |
| ESR1 | −0.319 | BLVRA | 0.317 | CENPF | −0.288 | CXXC5 | 0.484 | KRT14 | 0.350 |
| FOXA1 | −0.417 | CCNE1 | −0.067 | EXO1 | −0.293 | EGFR | −0.316 | KRT17 | 0.241 |
| FOXC1 | 0.370 | CDC20 | −0.069 | MAPT | 0.352 | KIF2C | −0.050 | KRT5 | 0.276 |
| GPR160 | −0.297 | ERBB2 | 0.452 | MYBL2 | −0.328 | MDM2 | 0.027 | MLPH | 0.376 |
| KNTC2 | 0.303 | FGFR4 | 0.365 | NAT1 | 0.421 | MKI67 | −0.136 | MMP11 | −0.404 |
| MELK | 0.270 | GRB7 | 0.470 | PTTG1 | −0.299 | ORC6L | −0.049 | RRM2 | −0.359 |
| MIA | 0.296 | MYC | −0.390 | SLC39A6 | 0.339 | PR | −0.143 | TYMS | −0.286 |
| TMEM45B | −0.323 | SFRP1 | −0.343 | UBE2C | −0.296 | PHGDH | −0.529 | UBE2T | −0.374 |
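To show how a weight vector of this kind turns an expression profile into a latent component score, here is a minimal sketch using the basal-like column of the table above. It assumes the tabulated values are the first-component weights and that the inputs are centered and scaled expression values; the sample profile is hypothetical and exists only to exercise the function.

```python
import math

# Basal-like weights copied from the table above.
basal_weights = {
    "ANLN": 0.271, "CEP55": 0.271, "ESR1": -0.319, "FOXA1": -0.417,
    "FOXC1": 0.370, "GPR160": -0.297, "KNTC2": 0.303, "MELK": 0.270,
    "MIA": 0.296, "TMEM45B": -0.323,
}

def component_score(weights, z_expr):
    """Latent score as the weighted sum of standardized expression values."""
    return sum(w * z_expr[gene] for gene, w in weights.items())

# The tabulated weights have (approximately) unit Euclidean length.
weight_norm = math.sqrt(sum(w * w for w in basal_weights.values()))

# Hypothetical profile: one standard deviation above the mean for genes with
# positive weights and one below for genes with negative weights.
sample = {gene: (1.0 if w > 0 else -1.0) for gene, w in basal_weights.items()}
score = component_score(basal_weights, sample)

print(f"weight norm: {weight_norm:.3f}, score: {score:.3f}")
```

Under these assumptions, the signs indicate the direction in which each gene moves the score: higher ESR1 or FOXA1 expression lowers the basal-like score, while higher FOXC1 raises it.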
Figure 2. Predicted probabilities and 95% confidence interval as a function of the 1st PLS score or 1st/2nd PLS scores for HER2-enriched and basal-like subtype (a and b) and corresponding ROC curves (c and d). (xscr_her21: the 1st x-score for HER2-enriched subtype, xscr_basal1: the 1st x-score for basal-like subtype, xscr_basal2: the 2nd x-score for basal-like subtype).