Ruiling Zu, Lin Wu, Rong Zhou, Xiaoxia Wen, Bangrong Cao, Shan Liu, Guishu Yang, Ping Leng, Yan Li, Li Zhang, Xiaoyu Song, Yao Deng, Kaijiong Zhang, Chang Liu, Yuping Li, Jian Huang, Dongsheng Wang, Guiquan Zhu, Huaichao Luo.
Abstract
Objectives: Because pulmonary nodules are difficult to classify as benign or malignant on imaging alone, a prospective, observational real-world study was conducted to develop and validate a predictive model to address this diagnostic challenge.
Keywords: XGBoost; clinical laboratory; diagnosis; lung cancer; platelets; pulmonary nodules
Year: 2022 PMID: 35711832 PMCID: PMC9174863 DOI: 10.7150/jca.67428
Source DB: PubMed Journal: J Cancer ISSN: 1837-9664 Impact factor: 4.478
Figure 1. Workflow of this study. In total, 574 individuals were initially enrolled. After retrospective review, 106 individuals without pathology results, a further 40 without CT results, and a further 9 without complete clinical information were excluded.
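The exclusion counts in the flow chart can be cross-checked against the cohort sizes in the participant table that follows. A minimal arithmetic sketch; the variable names are illustrative and do not come from the study:

```python
# Counts taken from the Figure 1 caption.
enrolled = 574
excluded = {
    "no pathology result": 106,
    "no CT result": 40,
    "incomplete clinical information": 9,
}
analyzed = enrolled - sum(excluded.values())

# Cohort sizes taken from the participant table (benign + malignant).
training = 82 + 213
testing = 34 + 90

print(analyzed, training, testing)
```

The remaining participants after exclusion equal the sum of the training and testing cohorts, so the caption and the table are consistent with each other.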
The demographics and characteristics of all participants
| Characteristic | Training: benign | Training: malignant | Training: p | Testing: benign | Testing: malignant | Testing: p |
|---|---|---|---|---|---|---|
| n (%) | 82 (27.8%) | 213 (72.2%) | | 34 (27.4%) | 90 (72.6%) | |
| gender: | | | 0.215 | | | 0.095 |
| Female | 32 (39.0%) | 102 (47.9%) | | 11 (32.4%) | 46 (51.1%) | |
| Male | 50 (61.0%) | 111 (52.1%) | | 23 (67.6%) | 44 (48.9%) | |
| up: | | | 0.096 | | | 0.746 |
| down | 41 (50.0%) | 82 (38.5%) | | 15 (44.1%) | 35 (38.9%) | |
| up | 41 (50.0%) | 131 (61.5%) | | 19 (55.9%) | 55 (61.1%) | |
| GGN: | | | 0.276 | | | 0.287 |
| GGN | 10 (12.2%) | 39 (18.3%) | | 1 (2.94%) | 10 (11.1%) | |
| non-GGN | 72 (87.8%) | 174 (81.7%) | | 33 (97.1%) | 80 (88.9%) | |
| glitch: | | | 0.024 | | | 0.109 |
| glitch | 14 (17.1%) | 66 (31.0%) | | 6 (17.6%) | 31 (34.4%) | |
| non-glitch | 68 (82.9%) | 147 (69.0%) | | 28 (82.4%) | 59 (65.6%) | |
| age: | | | 0.091 | | | 0.014 |
| <=60 | 52 (63.4%) | 110 (51.6%) | | 26 (76.5%) | 45 (50.0%) | |
| >60 | 30 (36.6%) | 103 (48.4%) | | 8 (23.5%) | 45 (50.0%) | |
| stage: | | | <0.001 | | | <0.001 |
| benign | 82 (100%) | 0 (0.00%) | | 34 (100%) | 0 (0.00%) | |
| I-II | 0 (0.00%) | 135 (63.4%) | | 0 (0.00%) | 57 (63.3%) | |
| III-IV | 0 (0.00%) | 59 (27.7%) | | 0 (0.00%) | 25 (27.8%) | |
| NS | 0 (0.00%) | 19 (8.92%) | | 0 (0.00%) | 8 (8.89%) | |
| smoking: | | | 0.888 | | | 1.000 |
| Ever/current | 29 (35.4%) | 79 (37.1%) | | 13 (38.2%) | 35 (38.9%) | |
| Never | 53 (64.6%) | 134 (62.9%) | | 21 (61.8%) | 55 (61.1%) | |
| size: | | | 1.000 | | | 0.620 |
| <=3cm | 46 (56.1%) | 118 (55.4%) | | 22 (64.7%) | 52 (57.8%) | |
| >3cm | 36 (43.9%) | 95 (44.6%) | | 12 (35.3%) | 38 (42.2%) | |
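The p-values in the table compare benign and malignant groups within each cohort. The source does not say which test produced them; the sketch below assumes Pearson's chi-square with Yates' continuity correction on a 2x2 table, which is only one plausible choice. The function name `yates_chi2_p` is hypothetical. Applied to the gender counts of the training cohort, it gives a value close to the reported 0.215.

```python
import math

def yates_chi2_p(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] using Pearson's
    chi-square with Yates' continuity correction (1 degree of freedom).
    Assumption: this test is chosen for illustration, not confirmed by the source."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates' correction: shrink each |O - E| by 0.5, never below zero.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

# Gender, training cohort: benign (32 female, 50 male), malignant (102 female, 111 male).
p_gender = yates_chi2_p(32, 50, 102, 111)
print(round(p_gender, 3))
```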
Figure 2. All feature importance scores for the XGBoost model (A). Feature importance scores for the final model (B). Size: the largest diameter of the nodule in millimeters detected by LDCT; bPLT: platelet count in the whole blood sample; bMPV: mean platelet volume in the whole blood sample; bPDW: platelet distribution width in the whole blood sample; bPCT: plateletcrit in the whole blood sample; pPLT: platelet count in the PRP sample; pMPV: mean platelet volume in the PRP sample; pPDW: platelet distribution width in the PRP sample; pPCT: plateletcrit in the PRP sample; GGN: the nodule is ground glass/nonsolid; smoking_y: the years of smoking; glitch: the edge of the nodule has spicules; up: the nodule is located in an upper lobe; SN: the nodule is solid; COPD: the patient has a history of COPD; family_history: the patient has a family history of lung cancer; lung cancer history: the patient has a history of extrathoracic cancer that was diagnosed 5 years ago; quit_smoking_y: years since the patient quit smoking; quit_smoking: whether the patient has quit smoking; smoking: whether the patient has smoked.
Figure 3. Performance of the SCHC model in the training and testing cohorts. Scatter plots show the predicted probabilities calculated by SCHC for the benign and malignant groups of the training cohort (A) and the testing cohort (C). Receiver operating characteristic curves show the performance of the SCHC model in the training cohort (B) and the testing cohort (D). *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Figure 4. Classification performance in patient subgroups across nodule size ranges. The ROC-AUC, specificity, and sensitivity (AUC at max, yellow, left; at 90% sensitivity, green, right) are given by age, gender, stage, and pathology. The vertical axis shows the nodule size range in millimeters. Color intensity reflects the AUC value (AUC at max, yellow) or the specificity (at 90% sensitivity, green). SE: sensitivity; SP: specificity.
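Reporting specificity "at 90% sensitivity" means fixing the decision threshold so that at least 90% of malignant cases are called positive, then reading off the specificity at that threshold. A minimal sketch of that operating-point selection follows; the function name and the toy scores are hypothetical and are not study data.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches the target. Label 1 = malignant, 0 = benign;
    a case is called positive when score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Walk thresholds from strictest to loosest; sensitivity only increases.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity not reachable")

# Toy predicted probabilities (hypothetical): 10 malignant, 5 benign.
malignant_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.3]
benign_scores = [0.6, 0.4, 0.35, 0.2, 0.1]
scores = malignant_scores + benign_scores
labels = [1] * len(malignant_scores) + [0] * len(benign_scores)

threshold, sensitivity, specificity = specificity_at_sensitivity(scores, labels)
print(threshold, sensitivity, specificity)
```

Choosing the highest qualifying threshold keeps specificity as large as possible while still meeting the sensitivity target.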
Figure 5. Performance of the SCHC model in the development cohort compared with three other models. Receiver operating characteristic curves for the models when the AUC was held at its maximum (A) and when the sensitivity was held at 90% (C). Alluvial diagrams showing misclassification by all the models when the AUC was held at its maximum (B) and when the sensitivity was held at 90% (D). VA_P: the probabilities calculated using the VA model; MC_P: the probabilities calculated using the MC model; BU_P: the probabilities calculated using the BU model; XGB_P: the probabilities calculated using the SCHC model. Integrated discrimination improvement (IDI) and net reclassification improvement (NRI) in the development cohort for the SCHC model versus the three other models (E). Decision curves for all models across all threshold probabilities (F). Receiver operating characteristic curves for the models in the validation cohort (G).
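Two of the comparison metrics named in this caption have compact standard definitions. The sketch below uses the usual textbook forms of IDI and of decision-curve net benefit; whether the study computed them in exactly this way is not stated in the source, and the function names and toy probabilities are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def idi(p_new, p_old, labels):
    """Integrated discrimination improvement of a new model over an old one:
    (mean change in predicted risk among events)
    minus (mean change in predicted risk among non-events)."""
    ev_new = mean([p for p, y in zip(p_new, labels) if y == 1])
    ev_old = mean([p for p, y in zip(p_old, labels) if y == 1])
    ne_new = mean([p for p, y in zip(p_new, labels) if y == 0])
    ne_old = mean([p for p, y in zip(p_old, labels) if y == 0])
    return (ev_new - ev_old) - (ne_new - ne_old)

def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Toy data (hypothetical): 3 malignant, 2 benign nodules scored by two models.
labels = [1, 1, 1, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5, 0.4]
p_new = [0.8, 0.7, 0.6, 0.3, 0.2]

print(idi(p_new, p_old, labels))
print(net_benefit(p_new, labels, 0.5), net_benefit(p_old, labels, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and plotting one line per model.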
Figure 6. Web tool of the SCHC model for calculating the probability that a pulmonary nodule is malignant.