Dechang Chen, Kai Xing, Donald Henson, Li Sheng, Arnold M Schwartz, Xiuzhen Cheng.
Abstract
Accurate prediction of survival rates of cancer patients is often key to stratifying patients for prognosis and treatment. Survival prediction is often accomplished with the TNM system, which involves only three factors: tumor extent, lymph node involvement, and metastasis. Prediction from the TNM system is limited because other potential prognostic factors are not used. Given the availability of large cancer datasets, powerful prediction systems can be built with machine learning procedures and statistical methods. In this paper, we present an ensemble clustering-based approach to developing prognostic systems for cancer patients. Our method starts by grouping patients into combinations formed from the levels of the factors recorded in the data. A dissimilarity measure between combinations is obtained from a sequence of data partitions produced by repeated use of the PAM (partitioning around medoids) algorithm. This dissimilarity measure is then used with a hierarchical clustering method to find clusters of combinations. Survival is predicted simply by using the survival function derived from each cluster. Our approach admits multiple factors and provides a practical and useful tool for outcome prediction in cancer patients. We demonstrate the proposed method on lung cancer patients.
Year: 2009 PMID: 19584918 PMCID: PMC2702512 DOI: 10.1155/2009/632786
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Algorithm 1: Ensemble algorithm of clustering of cancer data.
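The abstract describes a dissimilarity between combinations that is derived from a sequence of partitions, but this excerpt does not give its exact definition. The following is a minimal sketch under one common ensemble-clustering assumption: the dissimilarity of two combinations is the fraction of partitions that place them in different clusters. The function name `ensemble_dissimilarity` and the toy labels are hypothetical; in practice each label list would come from one PAM run.

```python
from itertools import combinations


def ensemble_dissimilarity(partitions):
    """Fraction of partitions in which two items fall in different clusters.

    `partitions` is a list of label sequences, one per clustering run
    (assumed here: one PAM run per setting). Returns a symmetric
    n x n matrix as a list of lists. This definition is an assumption,
    not necessarily the measure used in the paper.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("every partition must label the same n items")
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the runs that separate item i from item j.
        split = sum(1 for labels in partitions if labels[i] != labels[j])
        d[i][j] = d[j][i] = split / len(partitions)
    return d


# Hypothetical toy input: 4 combinations, 3 partitioning runs.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 2],
]
D = ensemble_dissimilarity(partitions)
```

Under this assumed definition, two combinations that are never separated have dissimilarity 0, and two that are always separated have dissimilarity 1, so the matrix can be passed directly to a hierarchical clustering routine.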
Table 1: Lung cancer data of 90,214 patients. Survival time is measured in months. Here, adeno, squamous, large, and small represent adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and small cell carcinoma, respectively.
| Patient | Survival time (months) | Stage | Grade | Histology | Gender |
|---|---|---|---|---|---|
| 1 | 64 | 1 | 2 | squamous | 1 |
| 2 | 24 | 1 | 3 | large | 1 |
| 3 | 24 | 2 | 3 | squamous | 1 |
| 4 | 8 | 1 | 2 | squamous | 1 |
| 5 | 16 | 3 | 3 | squamous | 2 |
| 6 | 143 | 3 | 2 | adeno | 2 |
| 7 | 6 | 3 | 3 | small | 2 |
| 8 | 1 | 4 | 4 | small | 1 |
| 9 | 9 | 1 | 3 | adeno | 2 |
| … | … | … | … | … | … |
| 90211 | 1 | 1 | 3 | squamous | 1 |
| 90212 | 2 | 1 | 2 | adeno | 1 |
| 90213 | 62 | 2 | 3 | adeno | 1 |
| 90214 | 4 | 4 | 4 | squamous | 2 |
Table 2: A list of 128 combinations based on factor levels. Here, adeno, squamous, large, and small represent adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and small cell carcinoma, respectively.
| Group name | Stage | Grade | Histology | Gender | Sample size |
|---|---|---|---|---|---|
| Comb 1 | I | 1 | adeno | 1 | 1008 |
| Comb 2 | I | 1 | adeno | 2 | 1426 |
| Comb 3 | I | 1 | squamous | 1 | 430 |
| Comb 4 | I | 1 | squamous | 2 | 187 |
| Comb 5 | I | 1 | large | 1 | 8 |
| Comb 6 | I | 1 | large | 2 | 4 |
| Comb 7 | I | 1 | small | 1 | 2 |
| Comb 8 | I | 1 | small | 2 | 2 |
| Comb 9 | I | 2 | adeno | 1 | 2389 |
| Comb 10 | I | 2 | adeno | 2 | 2662 |
| … | … | … | … | … | … |
| Comb 123 | IV | 4 | squamous | 1 | 163 |
| Comb 124 | IV | 4 | squamous | 2 | 70 |
| Comb 125 | IV | 4 | large | 1 | 1503 |
| Comb 126 | IV | 4 | large | 2 | 911 |
| Comb 127 | IV | 4 | small | 1 | 4246 |
| Comb 128 | IV | 4 | small | 2 | 3368 |
Figure 1: Dendrogram from clustering of lung cancer data.
Table 3: Seven groups produced by cutting the dendrogram in Figure 1 at height 0.93.
| Group | Combinations | Sample size |
|---|---|---|
| Group 1 | Stage I, Grade 1, adeno | 11303 |
| | Stage I, Grade 2, adeno | |
| | Stage I, Grade 2, squamous, female | |
| | Stage I, Grade 3, adeno, female | |
| | Stage I, Grade 4, adeno, female | |
| Group 2 | Stage I, Grade 1, squamous | 13431 |
| | Stage I, Grade 2, squamous, male | |
| | Stage I, Grade 3, adeno, male | |
| | Stage I, Grade 3, squamous | |
| | Stage I, Grade 3, large cells, female | |
| | Stage I, Grade 4, adeno, male | |
| | Stage I, Grade 4, large cells | |
| | Stage II, Grade 1, adeno, female | |
| | Stage II, Grade 2, adeno, female | |
| | Stage II, Grade 2, squamous, female | |
| Group 3 | Stage I, Grade 1, squamous, male | 4522 |
| | Stage I, Grade 3, large cells, male | |
| | Stage I, Grade 4, squamous, male | |
| | Stage II, Grade 1, adeno, male | |
| | Stage II, Grade 2, adeno, male | |
| | Stage II, Grade 2, squamous, male | |
| | Stage II, Grade 3, adeno | |
| | Stage II, Grade 3, squamous | |
| | Stage II, Grade 4, large cells | |
| Group 4 | Stage I, Grade 4, small cells | 4291 |
| | Stage II, Grade 4, small cells | |
| | Stage III, Grade 1, adeno | |
| | Stage III, Grade 2, adeno | |
| Group 5 | Stage III, Grade 1, squamous | 24951 |
| | Stage III, Grade 2, squamous | |
| | Stage III, Grade 3 | |
| | Stage III, Grade 4, adeno | |
| | Stage III, Grade 4, squamous, male | |
| | Stage III, Grade 4, large cells | |
| | Stage III, Grade 4, small cells | |
| Group 6 | Stage IV, Grade 1, adeno, male | 18215 |
| | Stage IV, Grade 1, squamous, male | |
| | Stage IV, Grade 2, adeno | |
| | Stage IV, Grade 2, squamous, male | |
| | Stage IV, Grade 3, adeno, female | |
| | Stage IV, Grade 3, squamous, female | |
| | Stage IV, Grade 3, small cells | |
| | Stage IV, Grade 4, adeno | |
| | Stage IV, Grade 4, small cells | |
| Group 7 | Stage IV, Grade 2, squamous, female | 12237 |
| | Stage IV, Grade 3, adeno, male | |
| | Stage IV, Grade 3, squamous, male | |
| | Stage IV, Grade 3, large cells | |
| | Stage IV, Grade 4, squamous, male | |
| | Stage IV, Grade 4, large cells | |
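Cutting a dendrogram at a given height amounts to stopping agglomerative merging once the closest pair of clusters is farther apart than that height. The sketch below illustrates this with average linkage, which is an assumption because this excerpt does not state the linkage used. The 5-item matrix `D_toy` and the function name are hypothetical; only the cut height 0.93 comes from the caption.

```python
def cut_average_linkage(d, height):
    """Merge clusters by average linkage until the closest pair is
    farther apart than `height`. Returns sorted lists of item indices.
    Average linkage is an assumption made for this sketch."""
    clusters = [[i] for i in range(len(d))]

    def avg_dist(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        dist, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if dist > height:
            break  # every remaining merge would sit above the cut
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


# Hypothetical dissimilarities between 5 combinations.
D_toy = [
    [0.0, 0.2, 0.9, 1.0, 1.0],
    [0.2, 0.0, 0.8, 1.0, 1.0],
    [0.9, 0.8, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0, 0.3],
    [1.0, 1.0, 1.0, 0.3, 0.0],
]
groups = cut_average_linkage(D_toy, 0.93)
print(groups)  # [[0, 1, 2], [3, 4]]
```

A lower cut yields more groups: with this toy matrix, cutting at 0.5 leaves item 2 on its own.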
Figure 2: Survival curves of the seven groups in Table 3.
Figure 3: Survival curves of the four TNM stages.
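Survival curves such as these are typically computed per group from the patients' survival times. The excerpt does not name the estimator, so the following is a minimal sketch assuming the Kaplan-Meier estimator. The `events` argument (1 for an observed death, 0 for a censored time) is also an assumption, since the patient table as shown has no censoring column; the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- survival times in months
    events -- 1 if death was observed at that time, 0 if censored
              (assumed input; not a column of the table shown above)
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical toy group of five patients.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

For this toy group the estimated survival drops to about 0.8 at month 2, 0.6 at month 3, and 0.3 at month 5. Applying the same function to the patients in each cluster would give one curve per group.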