Mengmeng Yan, Weidong Wang.
Abstract
Purpose: To identify the CT radiomics features that differentiate lung adenocarcinoma from other lung cancer histological types.
Keywords: lung adenocarcinoma; lung cancer histological types; multi-instance learning; radiomics; texture analysis
Year: 2020 PMID: 32411600 PMCID: PMC7200977 DOI: 10.3389/fonc.2020.00602
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. The pipeline of our proposed radiomics analysis. (1) Original images of lung cancer patients. (2) Segmentation of the tumor region of interest (ROI) on each CT slice. (3) Extraction of shape, first-order, and higher-order features from the ROI. (4) Prediction model building based on machine learning classifiers, with ROC curves used to assess model performance. Adc is lung adenocarcinoma, and Oth is other lung cancer histological subtypes.
The analysis results of three independent data sets.
| Skewness, kurtosis, energy | |
| Sphericity, compacity, volume | |
| Short-zone emphasis, high gray-level zone emphasis, short-zone low gray-level emphasis, short-zone high gray-level emphasis, long-zone low gray-level emphasis, zone length non-uniformity, low gray-level run emphasis, high gray-level run emphasis | |
| Short-run emphasis, long-run emphasis, low gray-level run emphasis, high gray-level run emphasis, short-run high gray-level emphasis | |
| Coarseness, contrast | |
| Homogeneity, energy, contrast, correlation, dissimilarity | |
| Conventional Indices | minValue, maxValue, meanValue, stdValue |
| Sphericity, compacity | |
| Short-run emphasis, low gray-level run emphasis, high gray-level run emphasis | |
| Homogeneity, energy, contrast, correlation, dissimilarity | |
| 97.44 | |
| 97.44 | |
| Kstar | 96.15 |
First-order features-histogram.
First-order features-shape.
Gray-Level Zone Length Matrix: provides information on the size of homogeneous zones for each gray level in 3 dimensions.
Gray-Level Run Length Matrix: gives the size of homogeneous runs for each gray level. This matrix is computed for each of the 13 directions in 3D (4 in 2D); the value of each of the 11 texture indices derived from it is the average over those 13 directions (4 in 2D).
Neighborhood Gray-Level Different Matrix: corresponds to the difference in gray level between one voxel and its 26 neighbors in 3 dimensions (8 in 2D).
Gray-Level Co-occurrence Matrix: takes into account the arrangements of pairs of voxels to calculate textural indices.
Logistic regression.
Random Committee.
Sequential minimal optimization.
Random Forest.
Naive Bayes.
The best accuracy ratios are highlighted in bold.
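To make the co-occurrence idea in the notes above concrete, the following is a minimal sketch of how a Gray-Level Co-occurrence Matrix and a few of its indices can be computed for a single 2D offset. It is not the implementation used in the study: the function names, the toy image, the single horizontal offset, and the index definitions (common textbook forms of contrast, dissimilarity, energy, and homogeneity) are all assumptions for illustration, and radiomics packages may define or normalize these indices differently.

```python
from collections import Counter

# Hypothetical 4x4 image with gray levels 0..3 (illustration only).
TOY_IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def glcm(image, offset=(0, 1), symmetric=True):
    """Return co-occurrence probabilities {(i, j): p} for one (dy, dx) offset."""
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            # Only count pairs whose neighbor lies inside the image.
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[(i, j)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_indices(p):
    """Textbook-style texture indices from co-occurrence probabilities."""
    return {
        "contrast": sum(q * (i - j) ** 2 for (i, j), q in p.items()),
        "dissimilarity": sum(q * abs(i - j) for (i, j), q in p.items()),
        "energy": sum(q ** 2 for q in p.values()),
        "homogeneity": sum(q / (1 + abs(i - j)) for (i, j), q in p.items()),
    }


if __name__ == "__main__":
    probabilities = glcm(TOY_IMAGE)
    for name, value in glcm_indices(probabilities).items():
        print(f"{name}: {value:.4f}")
```

In 3D the same counting would be repeated over the 13 directions and the resulting index values averaged, in line with the averaging described for the run-length matrix.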
Performance metrics of the six classifiers on the train set and test set. Complete rows list, in order: accuracy (%), TPR, TNR, precision, AUC, kappa, and MAE.
| Train set | 0.993 | 0.996 | |||||
| Test set | 0.974 | ||||||
| Train set | 96.40 | 0.967 | 0.961 | 0.964 | 0.928 | 0.07 | |
| Test set | 0.974 | 0.05 | |||||
| Train set | 97.72 | 0.961 | 0.993 | 0.978 | 0.977 | 0.954 | 0.02 |
| Test set | 97.44 | 0.974 | 0.974 | 0.974 | 0.974 | 0.949 | 0.03 |
| Train set | 97.72 | 0.974 | 0.980 | 0.977 | 0.954 | 0.10 | |
| Test set | 97.44 | 0.974 | 0.974 | 0.974 | 0.999 | 0.949 | 0.08 |
| Train set | 97.01 | 0.948 | 0.993 | 0.972 | 0.994 | 0.942 | 0.06 |
| Test set | 0.974 | 0.974 | 0.05 | ||||
| Kstar | |||||||
| Train set | 96.08 | 0.922 | 0.964 | 0.997 | 0.921 | 0.10 | |
| Test set | 96.15 | 0.949 | 0.949 | 0.974 | 0.997 | 0.923 | 0.10 |
Logistic regression.
Random Committee.
Sequential minimal optimization.
Random Forest.
Naive Bayes.
True Positive Rate.
True Negative Rate.
Area under curve.
Mean absolute error.
The best performance metrics for each set are highlighted in bold.
Figure 2. Mean ROC curves obtained by six machine learning models for predicting lung adenocarcinoma. The black diagonal line in each diagram is the chance line, which corresponds to the performance of a random classifier. (A) Logistic regression (LR), naive Bayes (NB), and random committee (RC) classifiers all have the same AUC. (B) Random forest (RF) classifier. (C) Kstar classifier. (D) Sequential minimal optimization (SMO) classifier.
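As a rough illustration of how an ROC curve and its AUC are obtained from classifier scores, the sketch below sweeps a decision threshold over the scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are invented for the example (1 stands for Adc, 0 for Oth), and the function names are hypothetical; this is not the procedure or the data behind the figure.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given as (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented example data: 1 = Adc, 0 = Oth.
EXAMPLE_LABELS = [1, 1, 0, 1, 0, 0]
EXAMPLE_SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

if __name__ == "__main__":
    curve = roc_points(EXAMPLE_LABELS, EXAMPLE_SCORES)
    print("ROC points:", curve)
    print("AUC:", auc_trapezoid(curve))
```

For a classifier that outputs only hard labels, the curve has a single interior point and the area reduces to (TPR + TNR) / 2.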
Patient characteristics.
| Size, N | 180 | 535 |
| Mean Age | 66 | 69 |
| Gender (%) | ||
| Female | 30.6 | 33.3 |
| Male | 69.4 | 66.7 |
| Histological type, N | ||
| Adenocarcinoma | 90 | 193 |
| Squamous cell carcinoma | 30 | 132 |
| Other primary lung cancer | 30 | 79 |
| Metastases | 30 | 131 |
| 33 |
Paired t-test with 95% Confidence Interval, two-tailed.
They are Volume_Shape, Long-Run Emphasis_Gray-Level Run Length Matrix, Coarseness_Neighborhood Gray-Level Different Matrix, Contrast_Neighborhood Gray-Level Different Matrix, Long-Zone Low Gray-level Emphasis_Gray-Level Zone Length Matrix, Zone Length Non-Uniformity_Gray-Level Zone Length Matrix, Low Gray-level Run Emphasis_Gray-Level Zone Length Matrix, High Gray-level Run Emphasis_Gray-Level Zone Length Matrix.
The calculation formulas of performance metrics.
| TPR | |
| TNR | |
| Accuracy | |
| Precision | |
| AUC | |
| Kappa | |
| MAE |
TP (true positive) means that the predicted outcome is lung adenocarcinoma (Adc) and the actual value is also Adc. FN (false negative) means that the predicted outcome is another lung cancer histological type (Oth) while the actual value is Adc. TN (true negative) means that both the predicted outcome and the actual value are Oth. FP (false positive) means that the predicted outcome is Adc while the actual value is Oth. P is the number of condition-positive (Adc) cases, N is the number of condition-negative (Oth) cases, and MAE is the mean absolute error. TPR (true positive rate) measures the proportion of patients with Adc who are correctly identified. A negative result in a test with high TPR is useful for ruling out Adc: it signifies a high probability of the presence of Oth. TNR (true negative rate) measures the proportion of patients with Oth who are correctly identified. A test with 100% TNR recognizes all patients with Oth by testing negative, so a positive test result definitively rules out the presence of Oth in a patient.
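The definitions above can be sketched in code using the standard forms of these metrics: TPR = TP / P, TNR = TN / N, accuracy = (TP + TN) / (P + N), precision = TP / (TP + FP), and Cohen's kappa from observed and chance agreement. The confusion counts below are hypothetical and are not taken from the study. The MAE function assumes one common reading for classifiers (the mean absolute difference between the predicted Adc probability and the 0/1 label), which may differ from the formula the authors used.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion counts, with Adc as the positive class."""
    p = tp + fn          # condition positive (actual Adc)
    n = tn + fp          # condition negative (actual Oth)
    total = p + n
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the marginal totals.
    expected = (p * (tp + fp) + n * (tn + fn)) / total ** 2
    return {
        "TPR": tp / p,
        "TNR": tn / n,
        "accuracy": accuracy,
        "precision": tp / (tp + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }


def mean_absolute_error(labels, probabilities):
    """Assumed MAE: mean |label - predicted probability of the positive class|."""
    return sum(abs(y - q) for y, q in zip(labels, probabilities)) / len(labels)


# Hypothetical counts for illustration only.
EXAMPLE_METRICS = binary_metrics(tp=45, fn=5, tn=40, fp=10)

if __name__ == "__main__":
    for name, value in EXAMPLE_METRICS.items():
        print(f"{name}: {value:.3f}")
    print("MAE:", mean_absolute_error([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1]))
```

With balanced classes, as in this example, chance agreement is 0.5, so kappa equals twice the accuracy minus one.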