Chi-Chang Chang, Tse-Hung Huang, Pei-Wei Shueng, Ssu-Han Chen, Chun-Chia Chen, Chi-Jie Lu, Yi-Ju Tseng.
Abstract
Despite considerable expansion of the therapeutic repertoire for other malignancies, mortality from head and neck cancer (HNC) has not improved significantly in recent decades. Moreover, diagnoses of second primary cancers (SPCs) have increased among patients with HNC, yet studies providing evidence to support SPC prediction in HNC are lacking. In this study, several base classifiers are integrated into an ensemble meta-classifier using a stacked ensemble method to predict SPCs and identify relevant risk features in patients with HNC. The balanced accuracy and area under the curve (AUC) reach 0.761 and 0.847, respectively, increases of approximately 2% and 3% over the best individual base classifier. The top six ensemble risk features identified are body mass index, primary site of HNC, clinical nodal (N) status, primary site surgical margins, sex, and pathologic nodal (N) status. These findings will help clinicians screen HNC survivors before SPCs occur.
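The abstract describes the stacked ensemble only at a high level. The following is a minimal pure-Python sketch of stacking under stated assumptions: two hypothetical base learners (`fit_logistic`, a gradient-descent logistic regression, and `fit_centroid`, a soft nearest-centroid scorer) stand in for the study's seven base classifiers; the meta-classifier is assumed to be a logistic regression trained on out-of-fold base-classifier probabilities; and `make_toy_data` produces synthetic data. None of these names come from the paper, and its actual meta-learner and fold scheme may differ.

```python
import math
import random

# Sketch only: base learners, meta-learner, fold scheme, and data are illustrative assumptions.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression; returns a probability scorer."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return lambda row: sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def fit_centroid(X, y):
    """Hypothetical base learner: scores a row by relative distance to the class centroids."""
    pos = [r for r, t in zip(X, y) if t == 1]
    neg = [r for r, t in zip(X, y) if t == 0]
    c1 = [sum(col) / len(pos) for col in zip(*pos)]
    c0 = [sum(col) / len(neg) for col in zip(*neg)]

    def predict(row):
        d1, d0 = math.dist(row, c1), math.dist(row, c0)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return predict


def out_of_fold_scores(learners, X, y, k=5):
    """Level-1 features: each base learner's predictions on rows held out from its training folds."""
    meta_X = [[0.0] * len(learners) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        for j, learn in enumerate(learners):
            predict = learn([X[i] for i in train], [y[i] for i in train])
            for i in test:
                meta_X[i][j] = predict(X[i])
    return meta_X


def fit_stacked(learners, X, y, k=5):
    """Train a logistic meta-classifier on out-of-fold base scores, then refit the bases on all rows."""
    meta = fit_logistic(out_of_fold_scores(learners, X, y, k), y)
    bases = [learn(X, y) for learn in learners]
    return lambda row: meta([predict(row) for predict in bases])


def make_toy_data(n=40, seed=0):
    """Synthetic two-cluster data for illustration only (not the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else 0
        centre = 2.0 if label else 0.0
        X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
        y.append(label)
    return X, y


X, y = make_toy_data()
model = fit_stacked([fit_logistic, fit_centroid], X, y)
print(round(model([2.5, 2.5]), 3), round(model([-0.5, -0.5]), 3))
```

Training the meta-classifier on out-of-fold scores, rather than on scores from base learners that already saw the same rows, is the usual way to stop it from simply trusting whichever base learner overfits most.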
Keywords: head and neck cancer; risk prediction; second primary cancers; stacked ensemble-based classification scheme
Year: 2021 PMID: 34886225 PMCID: PMC8657249 DOI: 10.3390/ijerph182312499
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Overall flowchart of the proposed method, where p, l, k, and m denote the number of features, the number of base classifiers, the number of folds, and the number of random values tried for each tuned hyperparameter, respectively.
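Figure 1 defines k as the number of folds and m as the number of random values tried for each tuned hyperparameter. A minimal sketch of random search scored by k-fold cross-validation, assuming one integer hyperparameter per model and using a hypothetical one-dimensional k-nearest-neighbour classifier (`knn_fold_accuracy`) on synthetic data. The interleaved fold assignment (`i % k`) and the accuracy metric are also assumptions, not details from the paper.

```python
import random


def k_fold_splits(n, k):
    """Interleaved k-fold split: row i is held out in fold i % k."""
    return [
        ([i for i in range(n) if i % k != f], [i for i in range(n) if i % k == f])
        for f in range(k)
    ]


def knn_fold_accuracy(n_neighbors, xs, ys, train, test):
    """Hypothetical 1-D k-nearest-neighbour model, scored by accuracy on one held-out fold."""
    correct = 0
    for i in test:
        nearest = sorted(train, key=lambda j: abs(xs[j] - xs[i]))[:n_neighbors]
        vote = sum(ys[j] for j in nearest) / len(nearest)
        correct += int((vote >= 0.5) == (ys[i] == 1))
    return correct / len(test)


def random_search(score_fn, low, high, m, k, n, seed=0):
    """Draw m random integers in [low, high]; keep the one with the best mean k-fold score."""
    rng = random.Random(seed)
    splits = k_fold_splits(n, k)
    history = []
    for _ in range(m):
        value = rng.randint(low, high)
        mean_score = sum(score_fn(value, tr, te) for tr, te in splits) / k
        history.append((value, mean_score))
    best_value, best_score = max(history, key=lambda pair: pair[1])
    return best_value, best_score, history


# Synthetic 1-D data: class 0 centred at 0, class 1 centred at 3 (illustration only).
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(3.0, 1.0) for _ in range(30)]
ys = [0] * 30 + [1] * 30
best, score, history = random_search(
    lambda v, tr, te: knn_fold_accuracy(v, xs, ys, tr, te), low=1, high=15, m=6, k=5, n=len(xs)
)
print(best, round(score, 3))
```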
Figure 2. Inter-classifier correlations among base classifiers during the training stage: (a) initial inter-classifier correlation matrix; (b) first iteration; (c) second iteration. Warm and cool colors indicate positive and negative correlations, respectively; darker colors indicate stronger correlations.
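Figure 2 shows the inter-classifier correlation matrix initially and after two iterations, presumably tied to the "base classifier removal scheme" in the performance table. The exact removal rule is not given here, so the sketch below assumes one: compute Pearson correlations between the base classifiers' predicted probabilities and, while the most correlated pair exceeds a hypothetical `threshold` of 0.9, drop whichever member has the higher mean absolute correlation with the rest. The classifier names and probabilities are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length prediction vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def prune_correlated(preds, threshold=0.9, min_keep=2):
    """Iteratively drop one member of the most correlated base-classifier pair.

    preds maps a classifier name to its predicted probabilities on the same rows.
    Of the two classifiers in the top pair, the one with the higher mean |r| to
    all others is dropped (a hypothetical rule, not taken from the paper).
    """
    kept = dict(preds)
    removed = []
    while len(kept) > min_keep:
        names = list(kept)
        corr = {(a, b): pearson(kept[a], kept[b]) for a, b in combinations(names, 2)}
        (a, b), r = max(corr.items(), key=lambda item: abs(item[1]))
        if abs(r) < threshold:
            break

        def mean_abs_r(name):
            return sum(abs(v) for pair, v in corr.items() if name in pair) / (len(names) - 1)

        drop = a if mean_abs_r(a) >= mean_abs_r(b) else b
        removed.append((drop, round(r, 3)))
        del kept[drop]
    return list(kept), removed


# Made-up predicted probabilities for three hypothetical base classifiers.
preds = {
    "A": [0.10, 0.40, 0.35, 0.80],
    "B": [0.12, 0.41, 0.36, 0.79],
    "C": [0.90, 0.20, 0.60, 0.10],
}
print(prune_correlated(preds))
```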
Model performance of the base classifiers and meta-classifiers on the testing dataset.
| Classifier | Accuracy | Sensitivity | Specificity | Balanced Accuracy | AUC |
|---|---|---|---|---|---|
| LGR | 0.759 | 0.699 | 0.767 | 0.733 | 0.813 |
| MARS | 0.703 |  | 0.696 | 0.725 | 0.800 |
| Ctree | 0.758 | 0.710 | 0.765 | 0.738 | 0.812 |
| EVtree | 0.786 | 0.689 | 0.800 | 0.745 | 0.777 |
| LMT | 0.818 | 0.452 | 0.869 | 0.661 | 0.729 |
| RF |  | 0.459 |  | 0.677 | 0.799 |
| BPNN | 0.746 | 0.715 | 0.750 | 0.733 | 0.791 |
| Ensemble with base classifier removal scheme | 0.778 | 0.738 | 0.783 | **0.761** | **0.847** |
| Ensemble without base classifier removal scheme | 0.782 | 0.705 | 0.792 | 0.749 | 0.836 |
Balanced accuracy is the mean of sensitivity and specificity. AUC: area under the receiver operating characteristic (ROC) curve. Bold font indicates the best performance.
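The table note defines balanced accuracy as the mean of sensitivity and specificity. The short sketch below computes it from confusion-matrix counts and checks it against the LGR row. The LMT row shows why the metric matters here: LMT has the highest reported accuracy (0.818) but the lowest balanced accuracy (0.661), because its sensitivity is only 0.452. The counts in the second example are made up to show the same effect.

```python
def sensitivity(tp, fn):
    """True-positive rate: share of actual SPC cases that are flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of non-SPC patients correctly left unflagged."""
    return tn / (tn + fp)


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity, as defined in the table note."""
    return (sens + spec) / 2


def accuracy(tp, fn, tn, fp):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# LGR row of the table: sensitivity 0.699, specificity 0.767.
print(round(balanced_accuracy(0.699, 0.767), 3))  # 0.733

# Made-up counts: high accuracy driven by the large negative class, weak sensitivity.
tp, fn, tn, fp = 5, 5, 85, 5
print(accuracy(tp, fn, tn, fp))  # 0.9
print(round(balanced_accuracy(sensitivity(tp, fn), specificity(tn, fp)), 3))  # 0.722
```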
Figure 3. Receiver operating characteristic (ROC) curves of all classifiers on the testing dataset.
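Figure 3 plots ROC curves, and the table reports the area under them. As a sketch (not the study's code), AUC can be computed either as the trapezoidal area under the (FPR, TPR) points traced by sweeping a decision threshold, or equivalently as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The four scores below are made up; both functions assume labels coded as 0/1.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc_mann_whitney(scores, labels))           # 0.75
print(trapezoid_auc(roc_points(scores, labels)))  # 0.75
```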
Figure 4. Importance of the 14 ensemble features for SPCs of head and neck cancer, as ranked by the meta-classifier. "Direction" is based on the LGR results and indicates the direction of the correlation between each feature and SPC risk. Red and blue bars indicate negative and positive correlations, respectively; grey bars indicate categorical features. Features are ranked from most important (first) to least important (last).
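The caption does not state how the meta-classifier turns base-classifier outputs into one importance ranking. One plausible scheme, shown here purely as an assumption, normalizes each base classifier's feature importances to sum to 1, weights them by the absolute value of that classifier's meta-classifier weight, and sums per feature; the "Direction" label is then read from the sign of each feature's LGR coefficient, with categorical features marked separately. The function names, inputs, and feature names f1 to f3 are hypothetical.

```python
def ensemble_importance(base_importances, meta_weights):
    """Weight each base classifier's normalized importances by |meta weight| and rank features.

    base_importances: {classifier: {feature: raw importance}}
    meta_weights:     {classifier: meta-classifier weight for that classifier}
    """
    total_w = sum(abs(meta_weights[c]) for c in base_importances)
    scores = {}
    for clf, imps in base_importances.items():
        norm = sum(abs(v) for v in imps.values()) or 1.0
        share = abs(meta_weights[clf]) / total_w
        for feature, value in imps.items():
            scores[feature] = scores.get(feature, 0.0) + share * abs(value) / norm
    # Stable sort: equal scores keep their first-seen order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def direction(lgr_coefficients, categorical):
    """Sign of the LGR coefficient (zero treated as negative), or 'categorical' (grey bars)."""
    return {
        f: "categorical" if f in categorical else ("positive" if c > 0 else "negative")
        for f, c in lgr_coefficients.items()
    }


# Made-up inputs for illustration; f1..f3 are placeholder feature names.
base = {"A": {"f1": 3.0, "f2": 1.0}, "B": {"f1": 1.0, "f2": 1.0, "f3": 2.0}}
ranking = ensemble_importance(base, {"A": 1.0, "B": 1.0})
print(ranking)  # [('f1', 0.5), ('f2', 0.25), ('f3', 0.25)]
print(direction({"f1": 0.4, "f2": -0.2, "f3": 0.1}, categorical={"f3"}))
```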