Abstract
BACKGROUND: In cancer prognosis research, diverse machine learning models have been applied to the problems of cancer susceptibility (risk assessment), cancer recurrence (redevelopment of cancer after resolution), and cancer survivability, with accuracy (or AUC, the area under the ROC curve) as the primary measure for evaluating model performance. However, to help medical specialists establish a treatment plan from the predicted output of a model, it is more pragmatic to elucidate which variables (markers) have most significantly influenced the resulting cancer outcome, or which patients show similar patterns.
Year: 2014 PMID: 25080202 PMCID: PMC4101306 DOI: 10.1186/1755-8794-7-S1-S4
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. Schematic description of the procedure of the proposed method.
Figure 2. Original SSL graph. In graph-based SSL, the labeled nodes are represented by '+1 (survived)' and '-1 (dead)', whereas unlabeled nodes are represented by '?'.
Figure 3. SSL Co-training: predicted labels. Figures 2 and 3 provide a schematic description of SSL Co-training. At the start of the algorithm, each of the member models (for simplicity, we assume two classifiers) is trained on the original graph in Figure 2.
Figure 4. SSL Co-training: pseudo-labeled graph. After training, both member models produce predicted labels for the unlabeled nodes. An unlabeled node is pseudo-labeled when the member models agree on its label; otherwise it remains unlabeled. The resulting graph is shown in Figure 4.
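The agreement rule in the Figure 4 caption can be illustrated with a minimal sketch. It assumes two member models whose predictions are available as dictionaries mapping node identifiers to '+1' or '-1'; the function name `pseudo_label` and the node names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable


def pseudo_label(pred_a: Dict[Hashable, int],
                 pred_b: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Return pseudo-labels only for nodes on which both models agree.

    Nodes with conflicting predictions are left out, i.e. they stay unlabeled.
    """
    agreed = {}
    for node, label in pred_a.items():
        if node in pred_b and pred_b[node] == label:
            agreed[node] = label
    return agreed


# Hypothetical predictions for three unlabeled nodes (+1 survived, -1 dead).
pred_a = {"n1": +1, "n2": -1, "n3": +1}
pred_b = {"n1": +1, "n2": +1, "n3": +1}

pseudo = pseudo_label(pred_a, pred_b)
print(pseudo)  # {'n1': 1, 'n3': 1}; n2 remains unlabeled
```

In a full co-training loop, the agreed labels would be added to the labeled set and the member models retrained; that iteration is omitted here.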
Figure 5. Using a decision tree to obtain variable importance and segmentation by reclassifying the results of the predictor module.
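As a rough illustration of how a tree split can yield a variable ranking, the sketch below scores each variable by the largest Gini impurity decrease achievable with a single threshold split on the predicted labels. This is an assumption: the paper's own importance measure, Eq. (5), is not reproduced in this record and may differ. The variable names `age` and `tumor_size` and all values are made up for the example.

```python
from typing import Dict, List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (+1 / -1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == 1) / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity decrease over all thresholds 'value <= t'."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_variables(columns: Dict[str, Sequence[float]],
                   labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Rank variables by their best single-split impurity decrease."""
    gains = {name: best_split_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical inputs; 'predicted' stands for the predictor module's output.
columns = {"age": [40, 45, 70, 75], "tumor_size": [10, 30, 12, 28]}
predicted = [1, 1, -1, -1]

ranking = rank_variables(columns, predicted)
print([name for name, _ in ranking])  # ['age', 'tumor_size']
```

Here `age` separates the two predicted classes perfectly at a threshold of 45, so it ranks first. A full decision tree would apply such splits recursively, and the resulting leaves would serve as patient segments.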
Prognostic elements of breast cancer survivability (SEER).
| Prognostic elements | Description | Number of distinct values / mean ± std. dev. |
|---|---|---|
| 1 | Ethnicity: White, Black, Chinese, etc. | 16 |
| 2 | None, Beam Radiation, Radioisotopes, Refused, Recommended, etc. | 6 |
| 3 | Presence of tumor at particular location in body. Topographical classification of cancer. | 9 |
| 4 | Form and structure of tumor | 30 |
| 5 | Normal or aggressive tumor behavior is defined using codes. | 2 |
| 6 | Appearance of tumor and its similarity to more or less aggressive tumors | 5 |
| 7 | Information on surgery during first course of therapy, whether cancer-directed or not. | 12 |
| 8 | Defined by size of cancer tumor and its spread | 10 |
| 9 | Defines the spread of the tumor relative to the breast | 16 |
| 10 | None, (1-3) Minimal, (4-9) Significant, etc. | 7 |
| 11 | Married, Single, Divorced, Widowed, Separated | 4 |
| 12 | Actual age of patient in years | 63.64 ± 14.25 |
| 13 | 2-5 cm; at 5 cm, the prognosis worsens | 116.78 ± 286.64 |
| 14 | When lymph nodes are involved in cancer, they are known as positive. | 27.29 ± 42.26 |
| 15 | The total number of (positive/negative) lymph nodes that were removed and examined by the pathologist. | 13.61 ± 17.49 |
| 16 | Number of primary tumors (1-6) | 0.54 ± 1.29 |
| | Target binary variable defines class of survival of patient. | |
Performance (AUC) comparison of the five predictive models
| Data Set | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| | 0.74 | 0.50 | 0.50 | 0.78 | 0.80 | 0.82 | 0.79 | 0.77 | 0.78 | 0.80 |
| | 0.68 | 0.72 | 0.68 | 0.72 | 0.66 | 0.68 | 0.67 | 0.73 | 0.71 | 0.73 |
| | 0.79 | 0.79 | 0.80 | 0.79 | 0.82 | 0.78 | 0.79 | 0.82 | 0.81 | 0.81 |
| | 0.77 | 0.79 | 0.78 | 0.76 | 0.78 | 0.77 | 0.77 | 0.80 | 0.78 | 0.80 |
| | 0.84 | 0.82 | 0.80 | 0.81 | 0.82 | 0.84 | 0.83 | 0.82 | 0.78 | 0.81 |
Figure 6. Performance (AUC) comparison over 10 data sets: DT, ANN, SVM, SSL, and SSL Co-training.
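For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores and labels are made up for illustration and are not data from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are +1 (positive) and -1 (negative); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four patients (+1 survived, -1 dead).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, -1, 1, -1]

print(auc(scores, labels))  # 0.75: three of the four pairs are ordered correctly
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large data sets.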
Figure 7. Variable importance. The 16 input variables are ranked by variable importance, Eq. (5).
Figure 8. Variable profiling. Average values of the 16 variables for (a) survived patients and (b) dead patients.
Figure 9. Patient segmentation. The first three levels of the resulting decision tree. In each node, the proportions of survived and dead patients are represented by the white bar and the black bar, respectively.
Figure 10. Patient segmentation: feebleness of age. The two radial diagrams in Figures 10 and 11 illustrate the differences between patient segments in terms of patterns of prognostic factors.
Figure 11. Patient segmentation: a serious pattern with respect to the pathologic results. The two radial diagrams in Figures 10 and 11 illustrate the differences between patient segments in terms of patterns of prognostic factors.