Dahai Liu, Xiao Sun, Ao Liu, Lun Li, Shaoke Li, Jinmiao Li, Xiaojun Liu, Yu Yang, Zhe Wu, Xiaoliang Leng, Yang Wo, Zhangfeng Huang, Wenhao Su, Wenxing Du, Tianxiang Yuan, Wenjie Jiao.
Abstract
BACKGROUND: To develop and validate a risk prediction nomogram based on a deep learning convolutional neural network (CNN) model and epidemiological characteristics for lung cancer screening in patients with small pulmonary nodules (SPN).
Keywords: Asians lung cancer screening; artificial intelligence; convolutional neural networks; epidemiological characteristics; nomogram
Year: 2021 PMID: 34713592 PMCID: PMC8636223 DOI: 10.1111/1759-7714.14140
Source DB: PubMed Journal: Thorac Cancer ISSN: 1759-7706 Impact factor: 3.500
Data set characteristics of the CNN model training cohort
| Characteristic | Value |
|---|---|
| Male, n (%) | 1513 (41.5) |
| Age (years), mean (SD) | 61.042 ± 9.276 |
| Nodule diameter (mm), mean (SD) | 17.47 ± 6.97 |
| No. of LDCT images | 231 312 |
| No. of surgeons | 5 |
| Nodule locations, n (%) | |
| RUL | 1153 (31.6) |
| RML | 306 (8.4) |
| RLL | 675 (18.5) |
| LUL | 857 (23.5) |
| LLL | 564 (15.5) |
| Multilobe | 89 (2.4) |
| Pathological type, n (%) | |
| Adenocarcinoma | 3032 (83.2) |
| Squamous cell carcinoma | 314 (8.6) |
| Other types | 298 (8.2) |
Abbreviations: LDCT, low‐dose computed tomography; LLL, left lower lobe; LUL, left upper lobe; No., number; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; SD, standard deviation.
Data set characteristics of the hybrid model training cohort
| (b) Hybrid model training cohort | Malignant nodules (n = 501) | Benign nodules (n = 255) | p‐value |
|---|---|---|---|
| Gender, male, n (%) | 211 (42.1) | 128 (50.2) | 0.035 |
| Age, years, mean (SD) | 60.66 (9.78) | 58.09 (10.49) | <0.001 |
| Race, Han, n (%) | 500 (99.8) | 255 (100) | 0.475 |
| Marital status, married, n (%) | 497 (99.2) | 252 (98.8) | 0.608 |
| Smoking status, n (%) | | | <0.001 |
| No | 312 (62.3) | 217 (85.1) | |
| Passive | 62 (12.4) | 11 (4.3) | |
| Mild | 32 (6.4) | 5 (2.0) | |
| Heavy | 95 (16.9) | 22 (8.6) | |
| Alcohol consumption, positive, n (%) | 113 (22.6) | 67 (26.3) | 0.256 |
| Unhealthy dietary habits, n (%) | 130 (25.9) | 73 (28.6) | 0.432 |
| Family history of cancer, n (%) | | | <0.001 |
| No | 429 (85.5) | 250 (98) | |
| Other cancer history | 42 (8.4) | 3 (1.2) | |
| Lung cancer history | 29 (5.8) | 2 (0.8) | |
| Dwelling environment exposure, positive, n (%) | 16 (3.2) | 8 (3.1) | 0.967 |
| Occupational exposure, positive, n (%) | 35 (7.0) | 15 (5.9) | 0.564 |
| History of chronic disease, positive, n (%) | 243 (48.5) | 108 (42.4) | 0.109 |
| Pre‐existing lung disease, positive, n (%) | 28 (5.6) | 10 (3.9) | 0.321 |
| Nodule locations, n (%) | | | 0.075 |
| RUL | 166 (33.1) | 67 (26.3) | |
| RML | 24 (4.8) | 21 (8.2) | |
| RLL | 93 (18.6) | 63 (24.7) | |
| LUL | 115 (23) | 50 (19.6) | |
| LLL | 88 (17.6) | 47 (18.4) | |
| Multilobe | 15 (3.0) | 7 (2.7) | |
| Pathological type, n (%) | | | |
| Adenocarcinoma | 435 (86.8) | | |
| Squamous cell carcinoma | 48 (9.6) | | |
| Small cell carcinoma | 5 (1.0) | | |
| Large cell carcinoma | 1 (0.2) | | |
| Adenosquamous carcinoma | 2 (0.4) | | |
| Other carcinoma types | 10 (2.0) | | |
| Hamartoma | | 74 (29.0) | |
| Papillary adenoma | | 16 (6.3) | |
| Inflammation | | 91 (35.7) | |
| Sclerosing alveolar tumor | | 25 (9.8) | |
| Tuberculosis | | 14 (5.5) | |
| Other benign types | | 35 (13.7) | |
Note: p‐values are derived from the t‐test between the malignant and benign groups.
Abbreviations: LDCT, low‐dose computed tomography; LLL, left lower lobe; LUL, left upper lobe; No., number; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; SD, standard deviation.
p‐value < 0.05.
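The note states that these p‐values come from a t‐test between the malignant and benign groups. For a continuous row such as age, a pooled (equal‐variance) two‐sample t‐test can be recomputed from the published summary statistics alone. The sketch below is a hypothetical cross‐check, not the authors' code; it assumes group sizes of 501 and 255, inferred from the counts in the table (for example, 500 Han patients = 99.8%), and approximates the t distribution by a standard normal, which is reasonable only because the degrees of freedom are large.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t-test (equal variances) from summary statistics.

    Returns (t, df, p). The two-sided p-value uses a normal approximation
    to the t distribution, which is only adequate for large df.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    return t, df, p_approx


# Age row: malignant 60.66 (9.78) vs benign 58.09 (10.49); group sizes assumed.
t, df, p = pooled_t_test(60.66, 9.78, 501, 58.09, 10.49, 255)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

With these inputs t is about 3.33 on 754 degrees of freedom, giving p below 0.001, in line with the table. The categorical rows would normally be tested with a chi‐square test instead, which this sketch does not cover.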
Data set characteristics of the multicenter validation cohort
| Variables | Malignant, total | Malignant, XWH | Malignant, Shanghai | Malignant, QUAF | Malignant, QDMH | Malignant, CYH | Benign, total | Benign, XWH | Benign, Shanghai | Benign, QUAF | Benign, QDMH | Benign, CYH | p‐value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cases, n (%) | 158 (75.2) | 38 (18.1) | 39 (18.6) | 30 (14.3) | 28 (13.3) | 23 (11.0) | 52 (24.8) | 8 (3.8) | 17 (8.1) | 8 (3.8) | 12 (5.7) | 7 (3.3) | 0.623 |
| Age, mean (SD) | 59.6 (11.4) | 56 (12.9) | 60.3 (10.4) | 60.4 (10.2) | 59.1 (12.1) | 63.04 (10.6) | 57.23 (10.8) | 55.0 (14.01) | 53.8 (10.6) | 58.5 (10.53) | 61.8 (10.5) | 58.1 (7.5) | 0.186 |
| Gender, n (%) | | | | | | | | | | | | | 0.064 |
| Male | 99 (47.1) | 20 (43.5) | 23 (41.1) | 21 (55.3) | 19 (47.5) | 16 (53.3) | 25 (11.9) | 2 (4.3) | 11 (19.6) | 2 (5.3) | 7 (17.5) | 3 (10.0) | |
| Female | 59 (28.1) | 18 (39.1) | 16 (28.6) | 9 (23.7) | 9 (22.5) | 7 (23.3) | 27 (12.9) | 6 (13) | 6 (10.7) | 6 (15.8) | 5 (12.5) | 4 (13.3) | |
| Family history of cancer, n (%) | | | | | | | | | | | | | 0.028 |
| No | 133 (63.3) | 34 (73.9) | 29 (51.8) | 26 (68.4) | 24 (60) | 20 (66.7) | 50 (23.3) | 8 (17.4) | 17 (37.0) | 8 (21.1) | 11 (27.5) | 6 (20.0) | |
| Other cancer history | 13 (6.2) | 1 (2.2) | 6 (10.7) | 2 (5.3) | 3 (7.5) | 1 (3.3) | 2 (1.0) | 0 | 0 | 0 | 1 (2.5) | 1 (3.3) | |
| Lung cancer history | 12 (5.7) | 3 (7.9) | 4 (7.1) | 2 (5.3) | 1 (2.5) | 2 (6.7) | 0 | 0 | 0 | 0 | 0 | 0 | |
| Smoking status, n (%) | | | | | | | | | | | | | <0.01 |
| No | 55 (26.2) | 13 (28.3) | 16 (28.6) | 13 (34.2) | 9 (22.5) | 4 (13.3) | 42 (20) | 7 (15.2) | 14 (25) | 7 (18.4) | 8 (20) | 6 (20) | |
| Passive | 21 (10.0) | 8 (17.4) | 5 (8.9) | 4 (10.5) | 1 (2.5) | 3 (10.0) | 1 (0.5) | 0 | 1 (1.8) | 0 | 0 | 0 | |
| Mild | 21 (10.0) | 2 (4.3) | 5 (8.9) | 3 (7.9) | 5 (12.5) | 6 (20.0) | 2 (1.0) | 0 | 0 | 0 | 2 (5.0) | 0 | |
| Heavy | 61 (29.0) | 15 (32.6) | 13 (23.2) | 10 (26.3) | 13 (32.5) | 10 (33.3) | 7 (3.3) | 1 (2.2) | 2 (3.6) | 1 (2.6) | 2 (5.0) | 1 (3.3) | |
| CNN model score, mean (SD) | 86.52 (18.04) | 85.3 (17.4) | 85.5 (19.4) | 88.2 (19.9) | 89.7 (14.3) | 84.1 (19.04) | 64.06 (27.99) | 73.9 (16.7) | 66.7 (34.0) | 64.5 (12.4) | 58.4 (31.9) | 55.6 (29.7) | <0.01 |
Note: p‐values are derived from the t‐test between the malignant and benign groups.
Abbreviations: CYH, Qingdao Chengyang People's Hospital; QDMH, Qingdao Municipal Hospital; QUAF, The Affiliated Hospital of Qingdao University; SD, standard deviation; Shanghai, Shanghai Chest Hospital; XWH, Xuanwu Hospital Capital Medical University.
p‐value < 0.05.
FIGURE 1 The process of CNN training. CONV, convolutional layer; BN, batch normalization; leaky ReLU, leaky rectified linear unit. A 20‐layer feature extraction network with a residual structure was used as the base network. The feature maps at the last three scales of the base network were taken as input, and a feature pyramid network (FPN) structure was used to fuse the features of each layer for detection at three scales. After confidence‐threshold filtering and non‐maximum suppression (NMS), low‐confidence detection boxes were removed, and detection boxes with the same position and type were fused to obtain the final detection results.
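The post‐processing step in the caption (threshold filtering followed by NMS) can be illustrated with a minimal, framework‐free sketch. It is not the authors' implementation: the box format, the label strings and both thresholds (`score_thresh`, `iou_thresh`) are hypothetical choices, and standard greedy NMS simply suppresses overlapping boxes of the same type rather than averaging them, so the caption's fusion of same‐position boxes (for example, weighted box merging) is not reproduced here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes: Sequence[Box], scores: Sequence[float], labels: Sequence[str],
                   score_thresh: float = 0.5, iou_thresh: float = 0.5) -> List[int]:
    """Confidence-threshold filtering followed by greedy, class-aware NMS.

    Returns the indices of the surviving boxes, highest score first.
    """
    # Step 1: drop low-confidence boxes.
    order = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Step 2: visit the remaining boxes from most to least confident.
    order.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress lower-scoring boxes of the same type that overlap the kept box.
        order = [i for i in order
                 if labels[i] != labels[best] or iou(boxes[i], boxes[best]) < iou_thresh]
    return keep


boxes = [(10, 10, 30, 30), (12, 12, 31, 31), (50, 50, 70, 70), (11, 11, 29, 29)]
scores = [0.90, 0.80, 0.70, 0.30]
labels = ["nodule"] * 4
print(filter_and_nms(boxes, scores, labels))  # [0, 2]
```

In the example, box 3 falls below the confidence threshold, box 1 overlaps box 0 with an IoU of about 0.74 and is suppressed, and the non‐overlapping box 2 survives.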
Univariate and multivariate analysis for risk factors of lung cancer screening in training cohort
| Variable | Group | Univariate Wald | Univariate OR (95% CI) | Univariate p‐value | Multivariate OR (95% CI) | Multivariate p‐value |
|---|---|---|---|---|---|---|
| Gender | Male | 4.448 | 1.385 (1.023–1.875) | 0.035 | 2.409 (1.566–3.705) | 0.039 |
| | Female | | | | | |
| Age (years) | Continuous | 10.747 | 1.025 (1.010–1.041) | 0.001 | 1.039 (1.019–1.059) | 0.001 |
| Smoking status | No | 62.8 | 1 (reference) | <0.001 | 1 (reference) | <0.001 |
| | Passive | 12.302 | 3.758 (1.793–7.874) | | 4.031 (1.582–10.271) | |
| | Mild | 6.573 | 3.618 (1.354–9.669) | | 5.086 (1.150–22.847) | |
| | Heavy | 50.23 | 6.056 (3.680–9.965) | | 6.799 (2.907–15.902) | |
| Alcohol consumption | Positive | 1.287 | 0.817 (0.577–1.158) | 0.257 | ‐ | ‐ |
| | Negative | | | | | |
| Marital status | Positive | 0.26 | 1.479 (0.329–6.660) | 0.61 | ‐ | ‐ |
| | Negative | | | | | |
| Family history of cancer | No | 20.207 | 1 (reference) | <0.001 | 1 (reference) | 0.001 |
| | Other cancer history | 11.8 | 7.946 (2.435–25.925) | | 8.703 (2.051–36.937) | |
| | Lung cancer history | 8.691 | 8.721 (2.067–32.802) | | 11.378 (1.685–76.818) | |
| Dwelling environment exposure | Positive | 0.332 | 1.202 (0.644–2.244) | 0.564 | ‐ | ‐ |
| | Negative | | | | | |
| Occupational exposure | Positive | 0.219 | 0.795 (0.305–2.077) | 0.640 | ‐ | ‐ |
| | Negative | | | | | |
| History of chronic disease | Positive | 0.102 | 0.948 (0.682–1.318) | 0.75 | ‐ | ‐ |
| | Negative | | | | | |
| Pre‐existing lung disease | Positive | 0.974 | 1.450 (0.693–3.035) | 0.324 | ‐ | ‐ |
| | Negative | | | | | |
| Dietary habits | Positive | 0.617 | 0.874 (0.624–1.224) | 0.432 | ‐ | ‐ |
| | Negative | | | | | |
| CNN model score | Continuous | 134.359 | 1.062 (1.051–1.073) | <0.001 | 1.084 (1.069–1.099) | <0.001 |
Note: p‐values are derived from the univariable and multivariable regression analyses of the epidemiological characteristics.
p‐value < 0.05.
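The Wald statistics, odds ratios and p‐values in this table are linked. Assuming the model is a logistic regression with Wald‐type confidence intervals, i.e. 95% CI = exp(β ± 1.96·SE), which is the usual output but an assumption here, each single‐coefficient row can be cross‐checked by back‐calculating β, its standard error, the 1‐df Wald χ² and the two‐sided p‐value. The function name below is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float):
    """Back-calculate (beta, SE, Wald chi-square, two-sided p) from an OR and its 95% CI.

    Assumes a Wald-type interval: CI = exp(beta +/- 1.96 * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = beta / se
    wald = z ** 2  # chi-square with 1 degree of freedom
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return beta, se, wald, p


# Univariate gender row: OR 1.385 (1.023-1.875), reported Wald 4.448, p 0.035.
beta, se, wald, p = wald_from_or_ci(1.385, 1.023, 1.875)
print(f"beta = {beta:.3f}, SE = {se:.3f}, Wald = {wald:.2f}, p = {p:.3f}")
```

For the gender row this gives a Wald χ² of about 4.44 and p of about 0.035, close to the reported 4.448 and 0.035; small gaps come from the rounding of the published OR and CI. Multi‐level rows such as the overall smoking statistic (62.8) are multi‐degree‐of‐freedom tests and cannot be recovered this way.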
FIGURE 2 The lung cancer risk prediction nomogram. Smoking, smoking status; FHC, family history of cancer; CNNS, CNN model score. The nomogram was developed in the training cohort and incorporates age, gender, smoking status, family history of cancer, and the CNN model score.
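A nomogram of this kind is usually a graphical form of a multivariable logistic model: each predictor's contribution β·x is rescaled to points, the predictor with the widest spread of β·x spans 0 to 100 points, and the total points map back to a predicted probability. The sketch below assumes that construction. The betas are ln of the multivariate odds ratios in the regression table (age assumed per year, CNN score per unit), while `INTERCEPT` and the age and CNN score ranges are hypothetical placeholders, since they are not reported in this record. The point scales will therefore not match Figure 2 exactly.

```python
import math

# HYPOTHETICAL intercept: the model intercept is not reported in this record.
INTERCEPT = -12.0

# Continuous predictors: (beta per unit, assumed min, assumed max).
CONTINUOUS = {
    "age": (math.log(1.039), 30, 90),
    "cnn_score": (math.log(1.084), 0, 100),
}
# Categorical predictors: level -> beta, with the reference level at 0.
CATEGORICAL = {
    "gender": {"female": 0.0, "male": math.log(2.409)},
    "smoking": {"no": 0.0, "passive": math.log(4.031),
                "mild": math.log(5.086), "heavy": math.log(6.799)},
    "family_history": {"no": 0.0, "other_cancer": math.log(8.703),
                       "lung_cancer": math.log(11.378)},
}
NAMES = list(CONTINUOUS) + list(CATEGORICAL)


def contribution(name, value):
    """Contribution beta * x of one predictor to the linear predictor."""
    if name in CONTINUOUS:
        return CONTINUOUS[name][0] * value
    return CATEGORICAL[name][value]


def contribution_range(name):
    """Smallest and largest beta * x the predictor can contribute."""
    if name in CONTINUOUS:
        _, lo, hi = CONTINUOUS[name]
        values = [contribution(name, lo), contribution(name, hi)]
    else:
        values = list(CATEGORICAL[name].values())
    return min(values), max(values)


RANGES = {n: contribution_range(n) for n in NAMES}
# The predictor with the widest spread of beta * x defines the 0-100 point axis.
SCALE = 100.0 / max(hi - lo for lo, hi in RANGES.values())


def points(name, value):
    return SCALE * (contribution(name, value) - RANGES[name][0])


def predicted_risk(patient):
    """Return (total points, predicted probability of malignancy)."""
    total = sum(points(n, patient[n]) for n in NAMES)
    # Map total points back onto the linear predictor, then apply the logistic link.
    lp = INTERCEPT + sum(lo for lo, _ in RANGES.values()) + total / SCALE
    return total, 1.0 / (1.0 + math.exp(-lp))


patient = {"age": 65, "cnn_score": 85, "gender": "male",
           "smoking": "heavy", "family_history": "no"}
total, risk = predicted_risk(patient)
print(f"total points = {total:.1f}, predicted risk = {risk:.2f}")
```

Reading the nomogram by hand (summing points, then looking up the risk axis) is equivalent to evaluating the logistic model directly; the point scale is only a linear re‐expression of the linear predictor.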
FIGURE 3 ROC curves and calibration curves of the risk prediction nomogram and the CNN model. (a) ROC curve of the CNN model in the validation cohort. (b) ROC curve of the risk prediction nomogram in the training cohort. (c) ROC curve of the risk prediction nomogram in the validation cohort. (d) Comparison of the ROC curves of the CNN model and the risk prediction nomogram. (e) Calibration curve of the model with the epidemiological characteristics added, in the training cohort. (f) Calibration curve of the model with the epidemiological characteristics added, in the validation cohort. The ROC curves show the AUC of each model in each cohort; the AUCs of the CNN model and the risk prediction nomogram were compared using DeLong's test. The calibration curves show how well the predicted risks of lung cancer agree with the postoperative pathological results. The x‐axis represents the predicted lung cancer risk and the y‐axis the actual lung cancer risk. The diagonal blue dotted line represents perfect agreement between predicted and actual risk. The solid amaranth line shows the performance of the nomogram; the closer it lies to the diagonal dotted line, the more accurate the prediction.
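Panel (d) compares two AUCs measured on the same patients, so the two estimates are correlated and DeLong's nonparametric test is the appropriate comparison. The following is a minimal pure‐Python sketch of that test, not the authors' software. The AUC is computed as the Mann‐Whitney statistic, the variance of the AUC difference comes from DeLong's per‐case structural components, and the scores are toy values rather than study data. The O(m·n) pairwise loop favors clarity over speed.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the malignant case scores higher, 0.5 for a tie."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """Empirical AUC plus DeLong's structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs measured on the same cases.

    labels: 1 = malignant, 0 = benign. Returns (auc_a, auc_b, z, p).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two cases in each class")
    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    # Var(AUC_a - AUC_b) = (S10_aa + S10_bb - 2 S10_ab) / m + (S01_aa + S01_bb - 2 S01_ab) / n
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(pos)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(neg))
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


labels = [1, 1, 1, 1, 0, 0, 0, 0]
nomogram = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]  # toy scores, not study data
cnn_only = [0.6, 0.4, 0.7, 0.2, 0.5, 0.3, 0.8, 0.1]
print(delong_test(nomogram, cnn_only, labels))
```

The covariance terms are what distinguish this from comparing two independent AUCs: because both models score the same cases, their errors are positively correlated, which usually shrinks the variance of the difference.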