Dmitry Cherezov1, Dmitry Goldgof2, Lawrence Hall2, Robert Gillies3, Matthew Schabath4, Henning Müller5,6, Adrien Depeursinge5,7.
Abstract
We propose an approach for characterizing the structural heterogeneity of lung cancer nodules using Computed Tomography Texture Analysis (CTTA). Measures of heterogeneity were used to test the hypothesis that heterogeneity can serve as a predictor of nodule malignancy and patient survival. First, we used the National Lung Screening Trial (NLST) dataset to determine whether heterogeneity can represent differences between nodules in lung cancer patients and nodules in non-lung cancer patients, with 253 participants in the training set and 207 participants in the test set. To discriminate cancerous from non-cancerous nodules at the time of diagnosis, a combination of heterogeneity and radiomic features was evaluated and produced the best area under the receiver operating characteristic curve (AUROC) of 0.85 and an accuracy of 81.64%. Second, we tested the hypothesis that heterogeneity can predict patient survival. We analyzed 40 patients diagnosed with lung adenocarcinoma (20 short-term and 20 long-term survival patients) using a leave-one-out cross-validation approach for performance evaluation. A combination of heterogeneity features and radiomic features produced an AUROC of 0.9 and an accuracy of 85% in discriminating long-term from short-term survivors.
Year: 2019 PMID: 30872600 PMCID: PMC6418269 DOI: 10.1038/s41598-019-38831-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of the feature computation result in a workflow where a nodule is treated as a homogeneous object (left) and the result when heterogeneity is used to divide the nodule into habitats (right).
Demographic Summary of Patients in the Adenocarcinoma Data Set.
| Characteristics | Short Survival Class | Long survival class | P Value |
|---|---|---|---|
| Age, mean (SD) | 69 (8.07) | 64.45 (9.75) | 0.1161 (unpaired Student's t-test) |
| Sex, N (%) | | | 0.2049 (Fisher exact test) |
| Male | 12 (60%) | 7 (35%) | |
| Female | 8 (40%) | 13 (65%) | |
| Race, N (%) | | | 1 (Fisher exact test) |
| White | 20 (100%) | 20 (100%) | |
| Black, Asian, and Others | 0 (0%) | 0 (0%) | |
| Ethnicity, N (%) | | | 1 (Fisher exact test) |
| Hispanic or Latino | 1 (5%) | 0 (0%) | |
| Neither Hispanic/Latino and unknown | 19 (95%) | 20 (100%) | |
| Histology, N (%) | | | |
| Adenocarcinoma | 20 (100%) | 20 (100%) | |
| Squamous cell carcinoma | 0 (0%) | 0 (0%) | |
| Other, NOS, unknown | 0 (0%) | 0 (0%) | |
| Stage, N (%) | | | 0.07346 (Mann-Whitney U test) |
| I | 4 (20%) | 10 (50%) | |
| II | 5 (25%) | 5 (25%) | |
| III | 10 (50%) | 3 (15%) | |
| IV | 1 (5%) | 2 (10%) | |
| Carcinoid, unknown | 0 (0%) | 0 (0%) | |
| Tobacco Use, N (%) | |||
| Moderate (1–2 PPD) | 4 (20%) | 4 (20%) | |
| Light (<1PPD) | 0 (0%) | 1 (5%) | |
| HIST | 12 (60%) | 12 (60%) | |
| None | 0 (0%) | 3 (15%) | |
| Cigarettes, NOS | 4 (20%) | 0 (0%) | |
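
The Fisher exact p-values in the table can be checked by hand. The following is a minimal sketch, assuming a two-sided test that sums the probabilities of all 2x2 tables no more likely than the observed one (the usual convention, although the source does not state which variant was used). The function name `fisher_exact_two_sided` is illustrative and not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    # Unnormalized hypergeometric weight of each possible top-left count k.
    # Integer weights keep the tie comparison below exact.
    def weight(k):
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Sex rows of the table: male 12 vs 7, female 8 vs 13.
p_value = fisher_exact_two_sided(12, 7, 8, 13)
print(round(p_value, 4))  # 0.2049, matching the reported value
```

The same function returns 1 for the race and ethnicity rows, in line with the table.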
Figure 2. Suggested workflow for heterogeneity estimation.
Figure 3. Set of circular harmonic filters used for texture signature computation (hV: harmonic vectors).
Figure 4. Example of the malignancy/aggressiveness probability assignment for habitats in a nodule.
Heterogeneity features description.
| Feature name | Feature description |
|---|---|
| min P | Minimum value of the malignancy pseudo probability of a habitat in a nodule. |
| max P | Maximum value of the malignancy pseudo probability of a habitat in a nodule. |
| mean P | Mean value of the malignancy pseudo probability of a habitat in a nodule. |
| median P | Median value of the malignancy pseudo probability of a habitat in a nodule. |
| min A ratio | Minimum value of a habitat area in a nodule. |
| max A ratio | Maximum value of a habitat area in a nodule. |
| mean A ratio | Mean value of a habitat area in a nodule. |
| median A ratio | Median value of a habitat area in a nodule. |
| min disjoint A ratio | Minimum value of a habitat area in a nodule when disjoint parts of a habitat are counted as separate habitats. |
| max disjoint A ratio | Maximum value of a habitat area in a nodule when disjoint parts of a habitat are counted as separate habitats. |
| mean disjoint A ratio | Mean value of a habitat area in a nodule when disjoint parts of a habitat are counted as separate habitats. |
| median disjoint A ratio | Median value of a habitat area in a nodule when disjoint parts of a habitat are counted as separate habitats. |
| number of clusters | Total number of habitats in a nodule. |
| mean centroids dist | The nodule texture signature is computed as the mean of the habitat texture signatures. The feature is the mean Euclidean distance from the nodule texture signature to the texture signatures of its habitats. |
| dist std centroids | Standard deviation of habitat texture signatures. |
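
As an illustration of how the features in this table could be derived, here is a minimal sketch. It assumes each habitat is already described by a malignancy pseudo probability, an area, and a texture signature vector. The `Habitat` class, the function names, the normalization of areas by the summed habitat area, and the reading of "dist std centroids" as the standard deviation of the centroid distances are all assumptions made for this example, not details confirmed by the source. The disjoint variants are omitted because they require a connected-component step on the habitat masks.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Habitat:
    probability: float          # malignancy pseudo probability of the habitat
    area: float                 # habitat area, e.g. in pixels
    signature: Sequence[float]  # texture signature vector


def summarize(values, suffix):
    """Min, max, mean and median of a list, keyed like the feature table."""
    return {
        f"min {suffix}": min(values),
        f"max {suffix}": max(values),
        f"mean {suffix}": statistics.mean(values),
        f"median {suffix}": statistics.median(values),
    }


def heterogeneity_features(habitats):
    """Compute the non-disjoint heterogeneity features for one nodule."""
    total_area = sum(h.area for h in habitats)  # assumed normalizer
    features = summarize([h.probability for h in habitats], "P")
    features.update(summarize([h.area / total_area for h in habitats], "A ratio"))
    features["number of clusters"] = len(habitats)

    # Nodule texture signature: component-wise mean of habitat signatures.
    dim = len(habitats[0].signature)
    nodule_signature = [
        statistics.mean(h.signature[i] for h in habitats) for i in range(dim)
    ]
    distances = [math.dist(h.signature, nodule_signature) for h in habitats]
    features["mean centroids dist"] = statistics.mean(distances)
    # Assumption: spread of the habitat-to-nodule distances.
    features["dist std centroids"] = statistics.pstdev(distances)
    return features


example = [Habitat(0.2, 30, [0.0, 0.0]), Habitat(0.8, 70, [2.0, 0.0])]
print(heterogeneity_features(example))
```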
Overview of classification models that produce the best AUROC.
| Screening time | Feature type | Feature set | Feature selector | Classifier | AUROC | Accuracy (%) |
|---|---|---|---|---|---|---|
| Training set | Heterogeneity | | RfF 10 | RFs | 0.77 | 72.95 |
| | Definiens | All 219 features | mRMR 17* | RFs | 0.83 | 78.77 |
| | Combined | Training st. + | none | RFs | 0.85 | 81.64 |
| Training set | Heterogeneity | | none | | 0.69 | 74.88 |
| | Definiens | RIDER st. | RfF 10 | RFs | 0.79 | 75 |
| | Combined | All 219 + | mRMR 25* | RFs | 0.79 | 74.4 |
| Training set | Heterogeneity | | RfF 10 | RFs | 0.67 | 65.7 |
| | Definiens | RIDER st. | RfF 5 | RFs | 0.78 | 74.06 |
| | Combined | RIDER st. + | RfF 10 | RFs | 0.78 | 70.53 |
The first column defines the time point of the CT screening that was used in the training and test cohorts. The second column defines which feature type was extracted for a given CT screening: Heterogeneity refers to the 15 texture heterogeneity features only, Definiens refers to the 219 features extracted in Definiens, and Combined refers to the fusion of the Definiens and heterogeneity features. The feature set column gives the order of the Circular Harmonic vectors used to extract texture features, or the subset of the Definiens features found to be stable ("st.") on the RIDER or training datasets. The feature selector column gives the feature selector that produced the best performance: none, ReliefF (RfF) with the top 10 or top 5 ranked features, or the minimum redundancy maximum relevance (mRMR) feature selector. The classifier column gives the best-performing classifier among those tested; in most cases random forests (RFs) outperformed the other classifiers. The last two columns report the AUROC and accuracy of the corresponding model.
*Weka v.3.8.1 provides an mRMR implementation that determines the optimal number of features for a particular dataset in terms of redundancy and relevance. As a result, the selected number of features varies.
Comparison of Adenocarcinoma aggressiveness estimation results using the heterogeneity and Definiens features.
| Feature type | Feature set | Feature selector | Classifier | AUROC | Acc. (%) |
|---|---|---|---|---|---|
| Staging | NA | NA | NA | 0.67 | 65 |
| Heterogeneity | | mRMR 1* | J48 | 0.80 | 85 |
| Definiens | all 219 features | RfF 5 | J48 | 0.71 | 77.5 |
| Combined | RIDER + | RfF 5 | RFs | 0.90 | 85 |
*Weka v.3.8.1 provides the mRMR algorithm, which defines the optimal number of features for a particular dataset in terms of redundancy and relevance. As a result, the selected number of features varies.
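
The AUROC and accuracy values for the 40-patient survival analysis come from leave-one-out cross-validation. The sketch below shows the general shape of that evaluation loop under simplifying assumptions: a single feature per patient and a nearest-class-mean scorer standing in for the J48 and random forest classifiers named in the table. The function names and the toy data are hypothetical and do not reproduce the study's results.

```python
import statistics


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_scores(values, labels):
    """Score each held-out sample with class means fit on all other samples."""
    scores = []
    for i, x in enumerate(values):
        train = [(v, y) for j, (v, y) in enumerate(zip(values, labels)) if j != i]
        mean_pos = statistics.mean(v for v, y in train if y == 1)
        mean_neg = statistics.mean(v for v, y in train if y == 0)
        # Positive score: the sample lies closer to the positive-class mean.
        scores.append(abs(x - mean_neg) - abs(x - mean_pos))
    return scores


# Toy data: one feature value per patient, label 1 for the positive class.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
scores = leave_one_out_scores(values, labels)
predictions = [int(s > 0) for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(auroc(labels, scores), accuracy)
```

Because each patient is scored by a model that never saw that patient, the pooled scores can be passed to a single AUROC computation, which is how a leave-one-out run yields one AUROC for the whole cohort.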