Eliseos J Mucaki¹, Katherina Baranova¹, Huy Q Pham², Iman Rezaeian², Dimo Angelov³, Alioune Ngom², Luis Rueda², Peter K Rogan¹,³,⁴.
Abstract
Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify patients and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome at different survival times in METABRIC patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT). This machine learning method, which distinguishes sensitivity from resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures for the other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures performed best; however, the signatures for the other agents also predicted survival with acceptable accuracy. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. After minimum redundancy maximum relevance (mRMR) feature selection, a paclitaxel-based SVM classifier using expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1 was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using the same 19-gene signature predicted >3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT.
These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.
Keywords: Gene expression signatures; breast cancer; chemotherapy resistance; hormone therapy; machine learning; random forest; support vector machine
Year: 2016 PMID: 28620450 PMCID: PMC5461908 DOI: 10.12688/f1000research.9417.3
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Biochemically inspired SVM gene signature derivation workflow.
The initial gene set is selected on the basis of knowledge of the drug and its associated pathways. Multiple factor analysis of the GI50 values of a training set of breast cancer cell lines, together with the corresponding expression levels of each gene in the initial set, then reduces the gene list.
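Figure 1 reduces the initial gene list with multiple factor analysis of GI50 values and expression, which is not reproduced here. As a deliberately simplified stand-in (a swapped-in technique, not the authors' method), the sketch below keeps genes whose expression across training cell lines has an absolute Pearson correlation with GI50 of at least a cutoff. The function names, the 0.5 cutoff and the toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def filter_genes_by_gi50(expression, gi50, min_abs_r=0.5):
    """Keep genes whose expression tracks GI50 across training cell lines.

    expression: dict mapping gene name -> expression values, one per cell line
    gi50:       GI50 values for the same cell lines, in the same order
    Returns {gene: r}; r > 0 means higher expression accompanies higher GI50
    (i.e., resistance), r < 0 the opposite.
    """
    kept = {}
    for gene, values in expression.items():
        r = pearson_r(values, gi50)
        if abs(r) >= min_abs_r:
            kept[gene] = r
    return kept


# Toy data for five hypothetical cell lines (not METABRIC or published values).
gi50 = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_up": [1.1, 2.0, 2.9, 4.2, 5.0],
    "gene_down": [5.0, 4.1, 3.0, 2.0, 0.9],
    "gene_flat": [2.0, 3.0, 2.0, 3.0, 2.0],
}
print(sorted(filter_genes_by_gi50(expression, gi50)))  # ['gene_down', 'gene_up']
```

The sign of r mirrors the red/blue convention of Figure 3: positive correlation with GI50 corresponds to resistance, negative to sensitivity.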
SVM gene expression signature performance on METABRIC patients.
| Patient subset | # of patients | Agent (gene signature) | Accuracy (%) | Precision | F-Measure | MCC¹ | AUC² |
|---|---|---|---|---|---|---|---|
| Both CT and HT | 84 | Paclitaxel | 78.6 | 0.787 | 0.782 | 0.559 | 0.814 |
| | | Tamoxifen | 76.2 | 0.761 | 0.760 | 0.510 | 0.701 |
| | | Methotrexate | 71.4 | 0.712 | 0.711 | 0.410 | 0.766 |
| | | Epirubicin | 72.6 | 0.725 | 0.723 | 0.434 | 0.686 |
| | | Doxorubicin | 75.0 | 0.749 | 0.750 | 0.488 | 0.701 |
| | | 5-Fluorouracil | 71.4 | 0.714 | 0.714 | 0.417 | 0.718 |
| CT and/or HT | 735 | Paclitaxel | 66.1 | 0.652 | 0.643 | 0.287 | 0.660 |
| Deceased | 327 | Paclitaxel | 75.3 | 0.752 | 0.752 | 0.505 | 0.763 |
| No treatment | 304 | Paclitaxel | 73.4 | 0.734 | 0.733 | 0.467 | 0.769 |
Initial gene sets preceding feature selection were as follows. Paclitaxel: ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCAP29, BCL2, BCL2L1, BIRC5, BMF, CNGA3, CYP2C8, CYP3A4, FGF2, FN1, GBP1, MAP2, MAP4, MAPT, NFKB2, NR1I2, OPRK1, SLCO1B3, TLR6, TUBB1, TWIST1. Tamoxifen: ABCB1, ABCC2, ALB, C10ORF11, CCNA2, CYP3A4, E2F7, F5, FLAD1, FMO1, IGF1, IGFBP3, IRS2, NCOA2, NR1H4, NR1I2, PIAS4, PPARA, PROC, RXRA, SMARCD3, SULT1B1, SULT1E1, SULT2A1. Methotrexate: ABCB1, ABCC2, ABCG2, CDK18, CDK2, CDK6, CDK8, CENPA, DHFRL1. Epirubicin: ABCB1, CDA, CYP1B1, ERBB3, ERCC1, GSTP1, MTHFR, NOS3, ODC1, PON1, RAD50, SEMA4D, TFDP2. Doxorubicin: ABCB1, ABCC2, ABCD3, AKR1B1, AKR1C1, CBR1, CYBA, FTH1, FTL, GPX1, MT2A, NCF4, RAC2, SLC22A16, TXNRD1. 5-Fluorouracil: ABCB1, ABCC3, CFLAR, IL6, MTHFR, TP53, UCK2.
¹MCC: Matthews correlation coefficient. ²AUC: area under the receiver operating characteristic curve. ³Surviving patients. ⁴Analysis included patients in the METABRIC ‘discovery’ dataset only. ⁵SVMs were tested with 9-fold cross-validation; all others were tested with leave-one-out cross-validation. ⁶Includes all patients treated with HT, CT, or combination CT/HT, with or without combined radiotherapy. ⁷Median time after treatment until death (> 4.4 years) was used to distinguish favorable outcome, i.e., sensitivity to therapy.
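Because the tables report accuracy, precision, F-measure and MCC side by side, a small sketch of how these follow from a binary confusion matrix may help when comparing rows. It computes them for a single positive class, with MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The published precision and F-measure may instead be class-weighted averages, which this sketch does not assume; the function name and counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-measure and MCC for one positive class.

    Zero denominators are mapped to 0.0, a common convention (e.g., an MCC
    of 0 when every sample is predicted to be in the same class).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure, "mcc": mcc}


# Hypothetical confusion matrix (not taken from the tables).
m = binary_metrics(tp=20, fp=5, fn=10, tn=15)
print(round(m["accuracy"], 3), round(m["mcc"], 3))  # 0.7 0.408
```

Unlike accuracy, MCC stays near 0 for a classifier that does no better than chance, even when the outcome classes are unbalanced.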
Figure 2. Random forest (RF) diagram depicting how therapy outcome is predicted for a given patient using an RF of k decision trees.
Several decision trees (DTs) are built, each using a different subset of paclitaxel-related genes. Prediction starts at the root of each tree: if the patient's expression of the gene at a node is greater than that node's threshold, the process continues down the right branch; otherwise it continues down the left branch. This repeats until a leaf node is reached, and that leaf is the tree's prediction for the patient. The predictions of all trees are then tallied, and the outcome with the most votes is selected as the patient's predicted outcome.
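The traversal and voting just described can be sketched in a few lines. This is a toy illustration under stated assumptions: trees are hand-written nested dicts, and the gene names (GENE_A, GENE_B, GENE_C), thresholds and outcome labels are hypothetical rather than fitted to METABRIC data.

```python
from collections import Counter


def predict_tree(node, expression):
    """Walk one decision tree from the root to a leaf.

    Internal nodes hold a gene and a threshold: if the patient's expression of
    that gene is greater than the threshold go right, otherwise go left.
    """
    while "label" not in node:
        branch = "right" if expression[node["gene"]] > node["threshold"] else "left"
        node = node[branch]
    return node["label"]


def predict_forest(trees, expression):
    """Majority vote over the predictions of all trees in the forest."""
    votes = Counter(predict_tree(tree, expression) for tree in trees)
    return votes.most_common(1)[0][0]


def leaf(label):
    return {"label": label}


# Three toy trees with hypothetical genes and thresholds (not fitted to data).
trees = [
    {"gene": "GENE_A", "threshold": 1.5,
     "left": leaf("sensitive"), "right": leaf("resistant")},
    {"gene": "GENE_B", "threshold": 1.0,
     "left": {"gene": "GENE_C", "threshold": 2.5,
              "left": leaf("sensitive"), "right": leaf("resistant")},
     "right": leaf("resistant")},
    {"gene": "GENE_C", "threshold": 4.0,
     "left": leaf("sensitive"), "right": leaf("resistant")},
]
patient = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 3.0}
print(predict_forest(trees, patient))  # resistant (2 of 3 votes)
```

With a two-class outcome, an odd number of trees rules out tied votes; with an even number, `Counter.most_common` would return whichever tied label was counted first.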
Results of applying RF to predict outcome of paclitaxel therapy.
| Type of treatment | Survival threshold (years) | # Patients | | Accuracy (%) | Precision | F-Measure | MCC¹ | AUC² |
|---|---|---|---|---|---|---|---|---|
| Chemotherapy | 3 | 53 | 7 | 56.6 | 0.510 | 0.524 | -0.095 | 0.441 |
| | 4 | | 7 | 69.8 | 0.698 | 0.698 | 0.396 | 0.700 |
| | 5 | | 19 | 66.0 | 0.645 | 0.636 | 0.230 | 0.653 |
| Hormone therapy | 3 | 420 | 19 | 85.5 | 0.731 | 0.788 | 0.000 | 0.606 |
| | 4 | | 9 | 78.6 | 0.715 | 0.706 | 0.069 | 0.559 |
| | 5 | | 9 | 71.0 | 0.634 | 0.627 | 0.059 | 0.632 |
| CT and/or HT | 3 | 504 | 9 | 82.7 | 0.685 | 0.749 | 0.000 | 0.506 |
| | 4 | | 19 | 73.6 | 0.647 | 0.648 | 0.039 | 0.527 |
| | 5 | | 7 | 65.3 | 0.602 | 0.593 | 0.086 | 0.588 |
¹MCC: Matthews correlation coefficient. ²AUC: area under the receiver operating characteristic curve. Both the Discovery and Validation patient datasets were analyzed. RF predictions were made using a 19-gene panel (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B).
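Two rows of the RF results table (hormone therapy and CT and/or HT, both at the 3-year threshold) pair high accuracy with an MCC of 0.000. One explanation consistent with the numbers, offered as an assumption rather than a finding of the study, is that these forests assigned every patient to the majority outcome. Accuracy then equals the majority fraction p, and MCC is 0 because its numerator TP·TN − FP·FN vanishes. If precision and F-measure are class-weighted averages (also an assumption), they reduce to p² and p·2p/(1 + p). The function name below is hypothetical.

```python
def majority_only_scores(p):
    """Class-weighted precision and F-measure when every patient is assigned
    to the majority class, which makes up a fraction p of the patients.

    Majority class: precision = p, recall = 1. Minority class: no predictions,
    so its precision and F-measure are taken as 0. Each class is weighted by
    its share of patients (p and 1 - p).
    """
    f_majority = 2 * p * 1.0 / (p + 1.0)
    weighted_precision = p * p + (1 - p) * 0.0
    weighted_f = p * f_majority + (1 - p) * 0.0
    return round(weighted_precision, 3), round(weighted_f, 3)


# Accuracy of such a classifier equals p; plug in the reported accuracies.
print(majority_only_scores(0.855))  # HT, 3 years: (0.731, 0.788)
print(majority_only_scores(0.827))  # CT and/or HT, 3 years: (0.684, 0.749)
```

For p = 0.855 this reproduces the reported 0.731 and 0.788 exactly; for p = 0.827 it gives 0.684 and 0.749 against the reported 0.685 and 0.749, a difference within the rounding of the accuracy. This is why MCC and AUC are more informative than accuracy for these unbalanced groups.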
Results of mRMR feature selection with an SVM classifier predicting outcome of paclitaxel therapy.
| Treatment | CT | | | HT | | | CT+HT | | |
|---|---|---|---|---|---|---|---|---|---|
| Survival threshold (years) | 3 | 4 | 5 | 3 | 4 | 5 | 3 | 4 | 5 |
| # Patients | 53 | | | 420 | | | 504 | | |
| Accuracy (%) | 81.1 | 81.1 | 84.9 | 85.7 | 79.5 | 72.9 | 83.1 | 74.8 | 67.9 |
| Precision | 0.809 | 0.813 | 0.852 | 0.878 | 0.765 | 0.692 | 0.795 | 0.703 | 0.662 |
| F-Measure | 0.809 | 0.811 | 0.845 | 0.794 | 0.726 | 0.663 | 0.772 | 0.672 | 0.666 |
| MCC | 0.582 | 0.625 | 0.675 | 0.119 | 0.17 | 0.173 | 0.161 | 0.137 | 0.238 |
| AUC | 0.783 | 0.812 | 0.82 | 0.508 | 0.533 | 0.548 | 0.53 | 0.531 | 0.61 |
| | 0.0 | 0.5 | 1.0 | 1.0 | 0.75 | 1.5 | 0.75 | 0.5 | 1.0 |
| | 64 | 128 | 8 | 2 | 64 | 2 | 16 | 2 | 2 |
¹For patients treated with CT at the ≥4-year survival threshold and with CT+HT at the ≥5-year threshold, the cost for the mRMR model was set to 64. For CT-treated patients at the ≥4-year threshold, genes were selected by greedy stepwise forward search; in all other cases, greedy stepwise backward search was used. Gamma was 0 in all cases. ²Predicted responses for individual METABRIC patients are provided in Dataset 1.
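The footnote mentions greedy stepwise searches over mRMR scores. As a rough sketch of the forward variant only, assuming the difference (MID) form of the mRMR criterion and already-discretized data, each step adds the gene g that maximizes I(g; y) − (1/|S|) Σ_{s∈S} I(g; s), where S is the set of genes chosen so far. The function names and toy data are hypothetical; the backward search and the SVM cost and gamma settings are not modeled.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def mrmr_score(gene, features, relevance, selected):
    """Relevance to the outcome minus mean redundancy with selected genes."""
    redundancy = sum(mutual_info(features[gene], features[s])
                     for s in selected) / len(selected)
    return relevance[gene] - redundancy


def mrmr_forward(features, target, k):
    """Greedy forward mRMR selection of up to k genes.

    features: dict gene -> discretized expression values, one per patient
    target:   discretized outcome per patient (e.g., 1 = survived past threshold)
    """
    relevance = {g: mutual_info(v, target) for g, v in features.items()}
    # Start with the single most relevant gene (the first one wins ties).
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        candidates = [g for g in features if g not in selected]
        selected.append(max(candidates, key=lambda g: mrmr_score(
            g, features, relevance, selected)))
    return selected


# Toy, already-discretized data for 8 hypothetical patients.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "gene_a": a,
    "gene_a_copy": list(a),                   # fully redundant with gene_a
    "gene_b": b,                              # informative, independent of gene_a
    "gene_noise": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the outcome
}
target = [x & y for x, y in zip(a, b)]  # favorable outcome only if a and b are 1
print(mrmr_forward(features, target, 3))  # ['gene_a', 'gene_b', 'gene_noise']
```

In the toy run, gene_a_copy is exactly as relevant as gene_a but is never chosen, because its redundancy penalty outweighs its relevance; at the third step even the uninformative gene_noise scores higher, since it carries no redundancy.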
Figure 3. Schematic elements of gene expression changes associated with response to paclitaxel.
Red boxes indicate genes whose expression or copy number is positively correlated with resistance by multiple factor analysis; blue indicates a negative correlation. Genes outlined in dark grey are those in a previously published paclitaxel SVM model (reproduced from reference 1 with permission).
Results of applying RF with the paclitaxel signature to predict outcome in the METABRIC Discovery patient set.
| Type of treatment | Survival threshold (years) | # Patients | | Accuracy (%) | Precision | F-Measure | MCC | AUC |
|---|---|---|---|---|---|---|---|---|
| Chemotherapy | 3 | 22 | 7 | 61.1 | 0.617 | 0.612 | 0.224 | 0.444 |
| | 4 | | 7 | 66.7 | 0.643 | 0.646 | 0.189 | 0.715 |
| | 5 | | 19 | 66.7 | 0.722 | 0.687 | 0.189 | 0.571 |
| Hormone therapy | 3 | 185 | 19 | 77.0 | 0.780 | 0.775 | 0.018 | 0.524 |
| | 4 | | 9 | 79.1 | 0.733 | 0.710 | 0.084 | 0.527 |
| | 5 | | 9 | 68.9 | 0.533 | 0.601 | -0.133 | 0.594 |
| CT and/or HT | 3 | 221 | 9 | 80.2 | 0.677 | 0.734 | -0.07 | 0.389 |
| | 4 | | 19 | 54.8 | 0.554 | 0.551 | -0.143 | 0.395 |
| | 5 | | 7 | 60.5 | 0.567 | 0.579 | 0.016 | 0.479 |
The paclitaxel gene panel consisted of 19 genes (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B).
Results of mRMR feature selection with an SVM classifier, using the paclitaxel signature to predict outcome in the METABRIC Discovery patient set.
| Treatment | CT | | | HT | | | CT+HT | | |
|---|---|---|---|---|---|---|---|---|---|
| Survival threshold (years) | 3 | 4 | 5 | 3 | 4 | 5 | 3 | 4 | 5 |
| # Patients | 22 | | | 185 | | | 221 | | |
| Accuracy (%) | 57.14 | 57.14 | 85.7 | 81.8 | 70.9 | 63.6 | 71.2 | 69.7 | 71.2 |
| Precision | 0.595 | 0.686 | 0.735 | 0.726 | 0.670 | 0.532 | 0.647 | 0.629 | 0.693 |
| F-Measure | 0.571 | 0.623 | 0.791 | 0.769 | 0.686 | 0.562 | 0.668 | 0.628 | 0.666 |
| MCC | 0.167 | -0.258 | 0.000 | -0.080 | 0.032 | -0.075 | 0.035 | 0.071 | 0.245 |
| AUC | 0.583 | 0.333 | 0.500 | 0.479 | 0.514 | 0.477 | 0.513 | 0.521 | 0.586 |
| | 0.0 | 0.5 | 1.0 | 1.0 | 0.75 | 1.5 | 0.75 | 0.5 | 1.0 |
| | 64 | 128 | 8 | 2 | 64 | 2 | 16 | 2 | 2 |
¹For patients treated with CT at the ≥4-year survival threshold and with CT+HT at the ≥5-year threshold, the cost for the mRMR model was set to 64. For CT-treated patients at the ≥4-year threshold, genes were selected by greedy stepwise forward search; in all other cases, greedy stepwise backward search was used. Gamma was 0 in all cases.
Comparison between our mRMR+SVM method and the K-TSP method on the Discovery patient set of the METABRIC data.
| Data | CT | | | HT | | | CT+HT | | |
|---|---|---|---|---|---|---|---|---|---|
| Survival threshold (years) | 3 | 4 | 5 | 3 | 4 | 5 | 3 | 4 | 5 |
| # Patients | 22 | | | 185 | | | 221 | | |
| mRMR+SVM accuracy (%) | 57.14 | 57.14 | 85.7 | 81.8 | 70.9 | 63.6 | 71.21 | 69.70 | 71.21 |
| K-TSP accuracy (%) | 57.14 | 28.57 | 28.57 | 80.91 | 68.18 | 69.19 | 71.21 | 54.55 | 53.03 |
We compared the performance of several machine learning (ML) techniques in distinguishing paclitaxel sensitivity from resistance in METABRIC patients, using the tumour gene expression data from this cohort. We used mRMR to generate gene signatures and to identify genes important for treatment response in METABRIC patients. The paclitaxel models predicted outcomes more accurately in patients receiving HT and/or CT than in other patient groups.