Paul Desbordes, Su Ruan, Romain Modzelewski, Pascal Pineau, Sébastien Vauclin, Pierrick Gouel, Pierre Michel, Frédéric Di Fiore, Pierre Vera, Isabelle Gardin.
Abstract
PURPOSE: In oncology, texture features extracted from 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) images are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated.
Year: 2017 PMID: 28282392 PMCID: PMC5345816 DOI: 10.1371/journal.pone.0173208
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of patient features.
| Features | Number of patients |
|---|---|
| Patient’s age (years) | |
| Median (range) | 63 (46-85) |
| Patient’s gender | |
| Male | 54 (83%) |
| Female | 11 (17%) |
| Tumor location | |
| Upper third | 18 (28%) |
| Middle third | 26 (40%) |
| Lower third | 21 (32%) |
| Histology | |
| Adenocarcinoma (ADC) | 8 (12%) |
| Squamous cell carcinoma (SCC) | 57 (88%) |
| Clinical Stage | |
| II | 17 (26%) |
| III | 39 (60%) |
| IV | 9 (14%) |
| 3-year survival | |
| Alive | 24 (37%) |
| Dead | 41 (63%) |
| 1-month response | |
| Complete (CR) | 41 (63%) |
| Non-complete (NCR) | 24 (37%) |
| Follow-up (month) | |
| Median (range) | 23 (6-79) |
List of the initial features.
| Type of features | Features |
|---|---|
| Clinical data | Patient’s age, Patient’s gender, Albumin level (g/l), nutritional risk index (NRI), Malnutrition*, Patient’s current weight (kg), Usual weight (kg), Weight loss (%), Tumor location (up, mid, low), Histology (ADC or SCC), TNM stage, WHO performance status, Endoscopic tumor length (cm) |
| 1st order statistics | SUVmax, SUVmean, SUVpeak, Sum of SUVs (SUVsum), MTV, TLG, Standard Deviation (SD), COV, Sphericity, Skewness, Kurtosis, Energy, Entropy, SUV10, SUV90, SUV10-SUV90, V10, V90, V10-V90 |
| Texture indices | Contrast, Homogeneity, Inverse Differential Moment (IDM), Cluster Shade, Cluster Tendency, Low Gray-level Zone Emphasis (LGZE), High Gray-level Zone Emphasis (HGZE), Short Zone Low Gray-level Emphasis (SZLGE), Long Zone Low Gray-level Emphasis (LZLGE), Short Zone High Gray-level Emphasis (SZHGE), Long Zone High Gray-level Emphasis (LZHGE), Zone Percentage (ZP), Gray Level Non Uniformity (GLNUz), Zone Length Non Uniformity (ZLNU) |
*absent if NRI > 97.5, average if 83.5 ≤ NRI ≤ 97.5 and severe if NRI < 83.5.
**mathematical expressions of features come from Table 1 of the Supplemental Data from [22]
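The malnutrition grading in the first footnote can be written as a small function. This is only a sketch: the thresholds come from the footnote, while the function names and the NRI formula are assumptions added for illustration. The formula shown is the commonly cited Buzby form (albumin in g/l, weights in kg); the record itself does not state which expression the authors used.

```python
def nutritional_risk_index(albumin_g_per_l: float,
                           current_weight_kg: float,
                           usual_weight_kg: float) -> float:
    """NRI using the commonly cited Buzby formula (an assumption, not from the source)."""
    return 1.519 * albumin_g_per_l + 41.7 * (current_weight_kg / usual_weight_kg)


def malnutrition_grade(nri: float) -> str:
    """Grade malnutrition with the thresholds given in the table footnote."""
    if nri > 97.5:
        return "absent"
    if nri >= 83.5:
        return "average"
    return "severe"


# Hypothetical patient: albumin 40 g/l, no weight loss.
example_nri = nutritional_risk_index(40.0, 70.0, 70.0)
example_grade = malnutrition_grade(example_nri)
```

Both boundary values (83.5 and 97.5) fall in the "average" grade, matching the inequalities in the footnote.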
Fig 1. PET/CT slice of the chest with a segmented esophageal tumor.
Fig 2. Workflow of the feature selection strategy and data analysis.
Results of classifications performed without any feature selection, with the proposed method, using only the pre-selection step (Spearman’s correlation analysis) or only the selection by RF algorithm (RF importance coefficients).
| Study | Feature selection | RFerr (%) | AUCRF | Se (%) | Sp (%) |
|---|---|---|---|---|---|
| Predictive | Without | 25±7 | 0.798±0.084 | 74±10 | 88±12 |
| | Only pre-selection | 28±5 | 0.788±0.074 | 76±7 | 85±11 |
| | Only selection by RF | 30±8 | 0.745±0.092 | 62±16 | 90±14 |
| | Proposed method | 21±9 | 0.836±0.105 | 82±9 | 91±12 |
| Prognostic | Without | 34±6 | 0.677±0.097 | 78±10 | 65±22 |
| | Only pre-selection | 31±10 | 0.698±0.085 | 80±15 | 68±12 |
| | Only selection by RF | 32±11 | 0.661±0.135 | 89±13 | 54±20 |
| | Proposed method | 28±5 | 0.822±0.059 | 79±9 | 95±6 |
Groups of correlated features (clinical, 1st order and texture) created with an absolute threshold value of the Spearman’s correlation coefficient of 0.8.
The feature selected to represent each group for the next step is in bold.
| Group | Correlated features |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 |
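The grouping step described in the caption above (features whose absolute Spearman correlation reaches 0.8 are placed in the same group) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: groups are formed here as connected components of the "correlated" relation, the comparison is taken as `>=`, and the numeric values are toy data invented for the example (only the feature names appear in the record).

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_groups(features, threshold=0.8):
    """Groups of two or more features linked by |rho| >= threshold."""
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(spearman(features[a], features[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


# Toy values for six hypothetical patients (not data from the study).
toy_features = {
    "MTV": [1, 2, 3, 4, 5, 6],
    "TLG": [2, 3, 5, 9, 11, 20],
    "Age": [60, 45, 70, 52, 66, 49],
    "SUVsum": [1.5, 2.5, 2.9, 4.4, 5.0, 6.1],
}
toy_groups = correlated_groups(toy_features)
```

With these toy values, MTV, TLG and SUVsum are monotonically related and end up in one group, while Age stays ungrouped. How the representative of each group is chosen is not stated in this record, so that step is left out.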
Ranking of the most important predictive and prognostic features according to the value of the coefficient of importance (CI) calculated by the RF algorithm.
Correlation group is indicated if necessary. In bold, the features selected during the last step of selection leading to the best subsets of features.
| Rank | Predictive features | CI | Prognostic features | CI |
|---|---|---|---|---|
| 1 | | 0.534 | | 0.272 |
| 2 | GLNUz | 0.319 | Patient’s age | 0.257 |
| 3 | Busyness (GLDM, group 9) | 0.236 | | 0.200 |
| 4 | Energy (1st order, group 5) | 0.220 | Patient’s weight loss | 0.155 |
| 5 | | 0.181 | | 0.149 |
| 6 | Patient’s weight loss | 0.166 | Tumor location | 0.089 |
| 7 | Patient’s usual weight (group 1) | 0.128 | SZE (GLSZM) | 0.081 |
| 8 | WHO performance status | 0.114 | Energy (1st order, group 5) | 0.077 |
| 9 | Contrast (GLCM) | 0.071 | - | - |
Results of the prediction of treatment response using the proposed method based on RF, the HFS method or the Mann-Whitney U test (p < 0.05).
ROC curves were created to obtain sensitivity (Se), specificity (Sp) and AUC.
| Features | Se (%) | Sp (%) | AUC | RFerr (%) | SVMerr | Mann-Whitney U test (p-value) |
|---|---|---|---|---|---|---|
| Subset of complementary features: MTV (group 6); Homogeneity (GLCM, group 8) | 82±9 | 91±12 | 0.836±0.105 | 21±9 | 35±6 | - |
| Subset of complementary features: Patient’s weight loss; MTV (group 6); Homogeneity (GLCM, group 8); Energy (1st order, group 5); ZLNU (GLSZM, group 4) | 77±15 | 86±15 | 0.814±0.093 | 29±12 | 37±8 | - |
| Busyness (GLDM, group 9) | 66 | 88 | 0.810 | - | - | < 0.0001 |
| MTV (group 6) | 51 | 100 | 0.802 | - | - | 0.0001 |
| Patient’s weight loss | 61 | 83 | 0.737 | - | - | 0.0015 |
| Energy (1st order, group 5) | 54 | 88 | 0.723 | - | - | 0.0030 |
| GLNUz (GLSZM) | 76 | 75 | 0.718 | - | - | 0.0037 |
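The sensitivity, specificity and AUC columns above can be illustrated with a short sketch. It assumes a single feature used as a score, with higher values counted as positive; the labels, scores and cutoff are toy values, not study data. The AUC is computed as the fraction of positive/negative pairs in which the positive case scores higher, which equals the Mann-Whitney U statistic divided by the number of such pairs.

```python
def sensitivity_specificity(labels, scores, cutoff):
    """Se and Sp when scores >= cutoff are called positive (labels: 1 or 0)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as P(positive score > negative score); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three positives and three negatives (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_se, toy_sp = sensitivity_specificity(toy_labels, toy_scores, cutoff=0.45)
toy_auc = auc(toy_labels, toy_scores)
```

Sweeping the cutoff over all observed scores and plotting Se against 1 - Sp gives the ROC curve; the record does not say how the reported operating point on each curve was chosen.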
Prognostic results of the different features using the proposed method based on RF, the HFS method or the univariate Kaplan-Meier analysis (p < 0.05).
ROC curves were created to obtain sensitivity (Se), specificity (Sp) and AUC.
| Features | Se (%) | Sp (%) | AUC | RFerr (%) | SVMerr |
|---|---|---|---|---|---|
| Subset of complementary features: NRI (group 2); WHO performance status; MTV (group 6) | 79±9 | 95±6 | 0.822±0.059 | 28±5 | 34±5 |
| Subset of complementary features: T stage; N stage; Energy (group 5) | 55±26 | 74±19 | 0.561±0.090 | 45±8 | 49±8 |
| None | - | - | - | - | - |
Fig 3. Kaplan-Meier survival curves using the random forest classifier with the best prognostic subset of features defined by the proposed method.
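For readers unfamiliar with Kaplan-Meier curves, the product-limit estimate behind such a figure can be sketched in a few lines. This is a generic illustration, not the authors' analysis code: the function name and the follow-up times are hypothetical, with `events` equal to 1 for a death and 0 for a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one death occurs."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths + censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical follow-up in months (not data from the study).
toy_times = [6, 10, 10, 15, 23, 30]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

To compare groups as in the figure, one curve would be computed per class assigned by the classifier; the statistical comparison between curves is outside this sketch.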