Hou-Tai Chang, Ping-Huai Wang, Wei-Fang Chen, Chen-Ju Lin.
Abstract
Early detection of lung cancer increases the likelihood of curative treatment and thus improves the survival rate. Low-dose computed tomography (LDCT) screening has been shown to be effective for high-risk individuals in several clinical trials, but it has high false positive rates. To evaluate the risk of stage I lung cancer in the general population, not limited to smokers, a retrospective study of 133 subjects was conducted in a medical center in Taiwan. Regularized regression was used to build the risk prediction model from LDCT reports and health examinations. The proposed model selected seven variables related to nodule morphology, count, and location, and ten variables related to blood tests and medical history, achieving an area under the curve (AUC) value of 0.93. Higher age, white blood cell count (WBC), and blood urea nitrogen (BUN); a history of diabetes, gout, chronic obstructive pulmonary disease (COPD), or other cancers; and the presence of spiculation, ground-glass opacity (GGO), or part-solid nodules were associated with a higher risk of lung cancer. Subjects with calcification, solid nodules, nodules in the middle lobes, more nodules, or diseases related to the thyroid, liver, or digestive system were at a lower risk. The selected variables did not indicate causation.
Keywords: regularized regression; risk prediction; stage I lung cancer
Year: 2022 PMID: 35457500 PMCID: PMC9033135 DOI: 10.3390/ijerph19084633
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. The flowchart of the proposed method.
The 26 investigated variables from health examinations, including physical examinations, medical history, family history of lung cancer, and blood tests. Of these, 18 binary variables were coded as 0/1 for No/Yes or female/male. The other eight were continuous variables: age, BMI, and six blood test results.
| Variable | Coding | Description |
|---|---|---|
| Gender | 0/1 | Female/male |
| Smoke | 0/1 | Non-smoking/smoking |
| PTB | 0/1 | Tuberculosis (TB), old TB, or tuberculous pleurisy |
| Lung radiation | 0/1 | Radiation exposure to lung |
| Asthma | 0/1 | Asthma record |
| COPD | 0/1 | Chronic obstructive pulmonary disease, chronic bronchitis, or emphysema |
| Myoma | 0/1 | Myoma record |
| Diabetes | 0/1 | Diabetes record |
| Hypertension | 0/1 | Hypertension record |
| CVA | 0/1 | Cerebrovascular accident |
| Gout | 0/1 | Gout, hyperuricemia |
| Liver | 0/1 | Diseases related to liver |
| Cardiovascular disease | 0/1 | Diseases related to heart or blood vessels, such as arrhythmia, atrial fibrillation (AF), valvular heart disease, peripheral arterial occlusive disease (PAOD), dyslipidemia, and hyperlipidemia. |
| Digestive system | 0/1 | Diseases related to digestive system, such as colorectal polyp, gastric ulcer (GU), gastroesophageal reflux disease (GERD), and anal polyp. |
| Urinary system | 0/1 | Diseases related to urinary system, such as penile tumors, benign prostatic hyperplasia (BPH), ureteral stone, renal stone, and nephrectomy. |
| Thyroid | 0/1 | Diseases related to thyroid, such as thyroid tumor, hypothyroidism, thyroid nodule, thyroidectomy, and goiter. |
| Other cancer | 0/1 | Cancer record other than lung cancer |
| Family lung cancer | 0/1 | Family history of lung cancer |
| Age | Continuous | Age at visit (years) |
| BMI | Continuous | Body mass index (BMI) (kg/m²) |
| BUN | Continuous | Blood urea nitrogen (BUN) (mg/dL) |
| Creatinine | Continuous | Creatinine (mg/dL) |
| ALT | Continuous | Alanine aminotransferase (ALT) (IU/L) |
| HGB | Continuous | Hemoglobin (HGB) (g/dL) |
| WBC | Continuous | White blood cell count (WBC) (10³/μL) |
| Platelet | Continuous | Platelet count (10³/μL) |
The 14 investigated variables from LDCT text reports. Of these, 12 binary variables were coded as 0/1 for No/Yes to describe the presence of nodule pattern, location, and lung condition. The other two continuous variables were nodule count and size.
| Variable | Coding | Description |
|---|---|---|
| Count | Continuous | Total nodule count |
| Diameter | Continuous | Diameter of the largest nodule (cm) |
| GGO | 0/1 | Presence of ground-glass opacity (GGO) |
| Solid | 0/1 | Presence of solid nodule |
| Part Solid | 0/1 | Presence of part-solid nodule |
| Upper | 0/1 | Presence of nodule at upper lobe |
| Middle | 0/1 | Presence of nodule at middle lobe |
| Lower | 0/1 | Presence of nodule at lower lobe |
| Spiculated | 0/1 | Presence of spiculation feature |
| Fibrotic | 0/1 | Presence of fibrotic pattern |
| Mosaic | 0/1 | Presence of mosaic pattern |
| Calcified | 0/1 | Presence of calcification pattern |
| Pneumothorax | 0/1 | Presence of pneumothorax |
| Pleural Effusion | 0/1 | Presence of pleural effusion |
The mean (standard deviation) of the continuous variables in the cancer and non-cancer groups, and the p-values of the significance tests. Age, nodule count, and diameter differed significantly between the two groups, with p-values less than 0.05.
| Variable | Non-Cancer | Cancer | p-Value |
|---|---|---|---|
| Count | 3.08 (2.50) | 1.57 (1.39) | <0.001 |
| Diameter | 1.48 (1.64) | 1.67 (0.83) | 0.003 |
| Age a | 56.58 (9.86) | 61.33 (11.11) | 0.026 |
| BMI a | 24.29 (3.28) | 24.19 (3.24) | 0.877 |
| BUN | 15.41 (4.85) | 18.35 (10.36) | 0.094 |
| Creatinine | 0.93 (0.79) | 1.08 (1.50) | 0.484 |
| ALT | 22.08 (8.92) | 23.61 (18.11) | 0.639 |
| HGB a | 13.32 (1.27) | 13.29 (1.66) | 0.926 |
| WBC a | 6.29 (1.50) | 6.61 (1.90) | 0.372 |
| Platelet a | 221.58 (47.05) | 219.80 (50.95) | 0.855 |
a Variables found to be normally distributed by the Anderson-Darling (AD) test, with p-value > 0.05.
The percentage of subjects coded as 1 for each binary variable in the cancer and non-cancer groups, and the p-values of the tests for independence. Among these binary variables, diseases related to the digestive system, nodules in the middle lobe, and spiculated nodules were significantly associated with lung cancer.
| Variable | Non-Cancer (%) | Cancer (%) | p-Value |
|---|---|---|---|
| Gender a | | | 0.554 |
| Female | 58.33 | 52.58 | |
| Male | 41.67 | 47.42 | |
| Smoke a | 33.33 | 31.96 | 0.880 |
| PTB | 0.00 | 6.19 | 0.190 |
| Lung radiation | 0.00 | 1.03 | 1.000 |
| Asthma | 2.78 | 2.06 | 1.000 |
| COPD | 8.33 | 16.49 | 0.278 |
| Myoma | 0.00 | 1.03 | 1.000 |
| Diabetes | 8.33 | 16.49 | 0.278 |
| Hypertension a | 27.78 | 38.14 | 0.266 |
| CVA | 0.00 | 3.09 | 0.563 |
| Gout | 0.00 | 3.09 | 0.563 |
| Liver | 5.56 | 1.03 | 0.178 |
| Cardiovascular disease a | 22.22 | 17.53 | 0.538 |
| Digestive System | 11.11 | 1.03 | 0.019 |
| Urinary System | 8.33 | 12.37 | 0.759 |
| Thyroid | 11.11 | 4.12 | 0.211 |
| Other Cancer | 5.56 | 14.43 | 0.233 |
| Family lung cancer | 2.78 | 2.06 | 1.000 |
| GGO a | 27.78 | 32.99 | 0.566 |
| Solid | 91.67 | 76.29 | 0.052 |
| Part Solid | 2.78 | 11.34 | 0.179 |
| Upper a | 80.56 | 69.07 | 0.189 |
| Middle a | 30.56 | 9.28 | 0.002 |
| Lower a | 63.89 | 52.58 | 0.243 |
| Spiculated | 0.00 | 29.90 | <0.001 |
| Fibrotic | 11.11 | 15.46 | 0.781 |
| Mosaic | 2.78 | 1.03 | 0.470 |
| Calcified | 11.11 | 9.28 | 0.748 |
| Pneumothorax | 0.00 | 1.03 | 1.000 |
| Pleural Effusion | 2.78 | 6.19 | 0.674 |
a Chi-squared test was applied; Fisher’s exact test was applied to all other variables.
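As a rough check on how a Fisher's exact p-value in the table above arises, the following minimal sketch recomputes the Digestive System row using only the Python standard library. The group sizes (36 non-cancer, 97 cancer) and the counts (4 and 1) are assumptions inferred from the reported percentages and the total of 133 subjects; they are not stated in the text. The function name `fisher_exact_two_sided` is illustrative, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, not necessarily the one the authors used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k "yes" subjects in the first row
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))

# Assumed counts (inferred, not stated in the source):
# 4 of 36 non-cancer subjects and 1 of 97 cancer subjects with digestive disease
p_digestive = fisher_exact_two_sided(4, 32, 1, 96)
print(round(p_digestive, 3))
```

Under these assumed counts the result rounds to the 0.019 reported for Digestive System.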
Figure 2. The cross-validation curve at different λ values. The left vertical dotted line, at ln λ = −3.306 (λ = 0.037), marks the highest AUC value; at this λ the model used 17 variables and had the best prediction performance.
Figure 3. The fraction of deviance explained on the training data when using different numbers of variables. The coefficients of the variables did not change much when 17 variables were selected in the regression model.
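To make the roles of λ and α concrete, here is a minimal sketch of an elastic-net penalty term. It assumes the widely used parameterization λ[(1 − α)/2 · ‖β‖₂² + α · ‖β‖₁], in which α = 1 gives the Lasso and α = 0 gives ridge regression; the source does not state which software or parameterization was used, so this is an assumption. The coefficient vector `beta` is hypothetical and serves only as an illustration.

```python
import math

def elastic_net_penalty(beta, lam, alpha):
    """Penalty lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * l2_sq + alpha * l1)

# ln(lambda) = -3.306 from the Figure 2 caption corresponds to lambda of about 0.037
lam = math.exp(-3.306)

# Hypothetical coefficients, for illustration only (not taken from the paper)
beta = [0.5, -1.2, 0.0, 0.3]

lasso_penalty = elastic_net_penalty(beta, lam, alpha=1.0)  # pure L1 penalty
ridge_penalty = elastic_net_penalty(beta, lam, alpha=0.0)  # pure (halved) L2 penalty
print(round(lam, 3), lasso_penalty, ridge_penalty)
```

The L1 term is what allows coefficients to shrink exactly to zero, which is why a Lasso fit can end up using only a subset of the candidate variables.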
The prediction performance of the three regularized regression models (average of 5-fold cross validation) and of the best model. The best model, which had the highest AUC value, was a Lasso model with λ = 0.037 and α = 1.
| Metric | Lasso | Ridge | Elastic Net | Best Model |
|---|---|---|---|---|
| Accuracy | 0.72 | 0.73 | 0.73 | 0.89 |
| Sensitivity | 0.75 | 0.78 | 0.75 | 0.85 |
| Specificity | 0.64 | 0.58 | 0.67 | 1.00 |
| Precision | 0.84 | 0.83 | 0.86 | 1.00 |
| F1-measure | 0.79 | 0.80 | 0.80 | 0.92 |
| G-mean | 0.68 | 0.67 | 0.70 | 0.92 |
| AUC | 0.78 | 0.77 | 0.77 | 0.93 |
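The F1-measure and G-mean in the table above are derived from sensitivity, specificity, and precision. The sketch below assumes the standard definitions (F1 as the harmonic mean of precision and sensitivity, G-mean as the geometric mean of sensitivity and specificity) and applies them to the Best Model column; the function name `derived_metrics` is illustrative.

```python
import math

def derived_metrics(sensitivity, specificity, precision):
    """Return (F1-measure, G-mean) under the standard definitions."""
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return f1, g_mean

# Best Model column: sensitivity 0.85, specificity 1.00, precision 1.00
f1, g_mean = derived_metrics(0.85, 1.00, 1.00)
print(round(f1, 2), round(g_mean, 2))
```

Both values round to 0.92, matching the Best Model column. The cross-validated columns need not satisfy these identities exactly, because an average of per-fold ratios is generally not the ratio of the averaged inputs.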
Figure 4. The ROC curve of the best model, which reached an AUC value of 0.929. The best cut-off value of the proposed regression model was 0.478.
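For readers less familiar with how an AUC and a cut-off relate to predicted risks, the following minimal sketch computes the AUC as the fraction of cancer/non-cancer pairs ranked correctly and then applies the reported cut-off of 0.478. The scores and labels are hypothetical toy data, not values from the study, and treating a probability equal to the cut-off as positive is an assumption.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classify(prob, cutoff=0.478):
    """Label a subject as high risk (1) when the predicted probability reaches the cut-off."""
    return 1 if prob >= cutoff else 0

# Hypothetical predicted probabilities and true labels (1 = cancer, 0 = non-cancer)
scores = [0.91, 0.80, 0.55, 0.40, 0.47, 0.30, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0]

auc = auc_rank(scores, labels)
predictions = [classify(s) for s in scores]
print(auc, predictions)
```

The AUC depends only on the ranking of the scores, whereas sensitivity and specificity depend on where the cut-off is placed along that ranking.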