Wei Chen, Yajie Dong, Lu Liu, Lin Jia, Lihua Meng, Hongli Liu, Lili Wang, Ying Xu, Youzhong Zhang, Xu Qiao.
Abstract
OBJECTIVE: This study aimed to identify reliable risk factors for residual/recurrent cervical intraepithelial lesions in patients with negative margins after cold-knife conization.
Keywords: cold-knife conization; machine learning; practical model; residual/recurrent
Year: 2022; PMID: 36233503; PMCID: PMC9573483; DOI: 10.3390/jcm11195634
Source DB: PubMed; Journal: J Clin Med; ISSN: 2077-0383; Impact factor: 4.964
Figure 1. Flowchart of the study. (A) Overall dataset; (B) model development and validation; (C) model training and validation based on six machine learning methods. HSIL, high-grade squamous intraepithelial lesion; AUC, area under the curve; FNR, false-negative rate; FPR, false-positive rate; SVM, support vector machine; RF, random forest; Ada, AdaBoost; DT, decision tree; KNN, k-nearest neighbor; NB, naive Bayes.
Patients and corresponding clinical features.
| Patient characteristic | Development cohort (n = 1411): no residual/recurrent CIN (n = 1355) | Development cohort: residual/recurrent CIN (n = 56) | p-value | Validation cohort (n = 941): no residual/recurrent CIN (n = 904) | Validation cohort: residual/recurrent CIN (n = 37) | p-value |
|---|---|---|---|---|---|---|
| Age (years) | | | | | | |
| <45 | 1004 | 44 | 0.552 | 659 | 26 | 0.87 |
| ≥45 | 351 | 12 | | 245 | 11 | |
| Pregnancy | | | | | | |
| <3 | 904 | 27 | 0.007 | 609 | 13 | <0.001 |
| ≥3 | 451 | 29 | | 295 | 24 | |
| Parity | | | | | | |
| <2 | 808 | 32 | 0.823 | 539 | 19 | 0.408 |
| ≥2 | 547 | 24 | | 365 | 18 | |
| Menopause | | | | | | |
| Yes | 1288 | 50 | 0.109 | 858 | 35 | 1 |
| No | 67 | 6 | | 46 | 2 | |
| TCT | | | | | | |
| < ASCUS | 307 | 11 | 0.544 | 226 | 14 | 0.222 |
| ≥ ASCUS | 922 | 43 | | 580 | 22 | |
| Unknown | 126 | 2 | | 98 | 1 | |
| HPV16/18 or RLUs > 1000 | | | | | | |
| Yes | 625 | 38 | 0.024 | 341 | 13 | 0.477 |
| No | 512 | 15 | | 421 | 22 | |
| Unknown | 218 | 3 | | 142 | 2 | |
| Transformation zone | | | | | | |
| Type I/II | 632 | 29 | 0.492 | 409 | 22 | 0.882 |
| Type III | 102 | 7 | | 75 | 5 | |
| Unknown | 621 | 20 | | 420 | 10 | |
| ECC | | | | | | |
| Positive | 36 | 5 | 0.02 | 27 | 5 | 0.003 |
| Negative | 1319 | 51 | | 877 | 32 | |
| Improved | | | | | | |
| Yes | 565 | 14 | 0.019 | 369 | 14 | 0.849 |
| No | 790 | 42 | | 535 | 23 | |
| Severe | | | | | | |
| No | 1056 | 39 | 0.195 | 706 | 29 | 1 |
| Yes | 299 | 17 | | 198 | 8 | |
| First follow-up after conization | | | | | | |
| HR-HPV negative | 585 | 7 | <0.001 | 382 | 7 | <0.001 |
| HR-HPV positive | 166 | 36 | | 103 | 25 | |
| Unknown | 604 | 13 | | 419 | 5 | |
| TCT < ASCUS | 611 | 27 | <0.001 | 378 | 20 | <0.001 |
| TCT ≥ ASCUS | 16 | 13 | | 6 | 9 | |
| Unknown | 728 | 16 | | 520 | 8 | |
| Second follow-up after conization | | | | | | |
| HR-HPV negative | 384 | 4 | <0.001 | 242 | 5 | <0.001 |
| HR-HPV positive | 114 | 32 | | 53 | 20 | |
| Unknown | 857 | 20 | | | | |
| TCT < ASCUS | 435 | 23 | <0.001 | 261 | 15 | <0.001 |
| TCT ≥ ASCUS | 13 | 7 | | 7 | 10 | |
| Unknown | 907 | 26 | | 636 | 12 | |
ASCUS, atypical squamous cells of undetermined significance; CIN, cervical intraepithelial neoplasia; ECC, endocervical curettage; HR-HPV, high-risk human papillomavirus; RLUs, relative light units.
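The p-values in the table above compare the distribution of each characteristic between patients with and without residual/recurrent CIN. This excerpt does not state which test the authors used, so the following is only a minimal sketch under an assumption: a Pearson chi-square test on a 2x2 table, with an optional Yates continuity correction. The function name `chi2_2x2` is hypothetical; the counts are the Pregnancy row of the development cohort.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction. Whether the authors applied it is an assumption.
        diff = max(diff - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Pregnancy row, development cohort:
# <3 pregnancies: 904 without and 27 with residual/recurrent CIN
# >=3 pregnancies: 451 without and 29 with residual/recurrent CIN
chi2, p_value = chi2_2x2(904, 27, 451, 29)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Rows with an "Unknown" category form larger tables and would need a general chi-square or exact test, which this sketch does not cover.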
Figure 2. ROC curves of the four models. (A) ROC curves of 5-fold cross-validation of the development cohort; (B) ROC curves of the validation cohort.
Diagnostic performance of the development and validation cohorts.
| Model | Threshold | AUC | Sensitivity | Specificity | FPR | FNR | Accuracy |
|---|---|---|---|---|---|---|---|
| Development-Cohort Cross-Validation | |||||||
| Model A | 0.51 ± 0.12 | 0.58 ± 0.13 | 0.73 ± 0.24 | 0.58 ± 0.15 | 0.4 ± 0.15 | 0.2 ± 0.24 | 0.58 ± 0.13 |
| Model B | 0.65 ± 0.15 | 0.85 ± 0.07 | 0.88 ± 0.17 | 0.86 ± 0.09 | 0.14 ± 0.09 | 0.1 ± 0.17 | 0.86 ± 0.08 |
| Model C | 0.51 ± 0.24 | 0.89 ± 0.07 | 0.97 ± 0.07 | 0.80 ± 0.09 | 0.20 ± 0.09 | 0.0 ± 0.07 | 0.81 ± 0.08 |
| Model D | 0.59 ± 0.15 | 0.92 ± 0.01 | 0.94 ± 0.12 | 0.84 ± 0.06 | 0.1 ± 0.06 | 0.0 ± 0.12 | 0.85 ± 0.05 |
| Validation cohort | |||||||
| Model A | 0.53 | 0.69 (0.59–0.78) | 0.60 | 0.73 | 0.27 | 0.4 | 0.72 |
| Model B | 0.44 | 0.88 (0.80–0.95) | 0.83 | 0.80 | 0.20 | 0.17 | 0.81 |
| Model C | 0.43 | 0.89 (0.81–0.97) | 0.86 | 0.81 | 0.19 | 0.14 | 0.82 |
| Model D | 0.42 | 0.91 (0.87–0.96) | 0.94 | 0.77 | 0.23 | 0.06 | 0.78 |
AUC, area under the curve; FPR, false-positive rate; FNR, false-negative rate.
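The columns of the table above are linked by simple identities: FPR = 1 − specificity and FNR = 1 − sensitivity, while accuracy depends on the class balance. The sketch below shows these relations from a confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical; the counts were chosen only to match the size of the validation cohort (37 positives, 904 negatives) and are not taken from the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute the table's metrics from confusion-matrix counts at a fixed threshold."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1 - specificity,   # false-positive rate
        "fnr": 1 - sensitivity,   # false-negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for a cohort with 37 positives and 904 negatives.
metrics = diagnostic_metrics(tp=35, fn=2, tn=696, fp=208)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because negatives far outnumber positives in such a cohort, accuracy tracks specificity closely, which is why a model with high sensitivity can still show a modest accuracy.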
Figure 3. Nomograms of the four models: (A) Model A; (B) Model B; (C) Model C; (D) Model D.
Figure 4. Calibration curves for the development cohort (A) and the validation cohort (B); decision curves for the development cohort (C) and the validation cohort (D).
The predictive performance of different ML methods in the validation cohort.
| Method | AUC | Sensitivity | Specificity | FPR | FNR | Accuracy |
|---|---|---|---|---|---|---|
| Model A | | | | | | |
| SVM | 0.66 (0.56–0.76) | 0.51 | 0.80 | 0.20 | 0.49 | 0.79 |
| RF | 0.68 (0.58–0.77) | 0.57 | 0.74 | 0.26 | 0.43 | 0.73 |
| DT | 0.60 (0.51–0.68) | 0.69 | 0.52 | 0.48 | 0.31 | 0.53 |
| KNN | 0.51 (0.42–0.59) | 0.46 | 0.56 | 0.44 | 0.54 | 0.56 |
| NB | 0.62 (0.51–0.73) | 0.4 | 0.86 | 0.14 | 0.6 | 0.84 |
| Ada | 0.69 (0.59–0.78) | 0.6 | 0.73 | 0.27 | 0.4 | 0.73 |
| Model B | | | | | | |
| SVM | 0.85 (0.79–0.91) | 0.79 | 0.82 | 0.18 | 0.21 | 0.82 |
| RF | 0.83 (0.73–0.92) | 0.79 | 0.82 | 0.18 | 0.21 | 0.82 |
| DT | 0.82 (0.72–0.91) | 0.79 | 0.82 | 0.18 | 0.21 | 0.82 |
| KNN | 0.73 (0.62–0.83) | 0.62 | 0.76 | 0.24 | 0.38 | 0.75 |
| NB | 0.88 (0.81–0.94) | 0.83 | 0.80 | 0.21 | 0.17 | 0.80 |
| Ada | 0.88 (0.80–0.95) | 0.83 | 0.80 | 0.20 | 0.17 | 0.81 |
| Model C | | | | | | |
| SVM | 0.89 (0.81–0.96) | 0.86 | 0.80 | 0.20 | 0.14 | 0.81 |
| RF | 0.88 (0.80–0.95) | 0.86 | 0.82 | 0.18 | 0.14 | 0.83 |
| DT | 0.83 (0.75–0.92) | 0.86 | 0.80 | 0.20 | 0.14 | 0.81 |
| KNN | 0.81 (0.71–0.91) | 0.72 | 0.88 | 0.12 | 0.27 | 0.86 |
| NB | 0.87 (0.80–0.95) | 0.86 | 0.80 | 0.2 | 0.14 | 0.81 |
| Ada | 0.90 (0.82–0.97) | 0.86 | 0.81 | 0.19 | 0.14 | 0.82 |
| Model D | | | | | | |
| SVM | 0.91 (0.87–0.96) | 0.94 | 0.77 | 0.24 | 0.06 | 0.78 |
| RF | 0.92 (0.87–0.96) | 0.94 | 0.77 | 0.23 | 0.06 | 0.78 |
| DT | 0.90 (0.85–0.95) | 0.94 | 0.76 | 0.24 | 0.06 | 0.78 |
| KNN | 0.86 (0.79–0.93) | 0.8 | 0.86 | 0.14 | 0.2 | 0.86 |
| NB | 0.92 (0.88–0.97) | 0.94 | 0.77 | 0.23 | 0.06 | 0.78 |
| Ada | 0.91 (0.86–0.96) | 0.94 | 0.77 | 0.23 | 0.06 | 0.78 |
The p-value was calculated using the DeLong test by comparing the AUC values between the LR model and the other ML models. AUC, area under the curve; SVM, support vector machine; RF, random forest; DT, decision tree; KNN, k-nearest neighbor; NB, naive Bayes; Ada, AdaBoost.
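As a reminder of what the AUC values being compared measure, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney formulation). It is not an implementation of the DeLong test, which additionally estimates the variance of the difference between two AUCs. The function name `auc_from_scores` and the toy scores are hypothetical.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy predicted risks, for illustration only (not data from the study).
pos = [0.9, 0.8, 0.4]       # patients with residual/recurrent disease
neg = [0.7, 0.3, 0.2, 0.1]  # patients without
auc = auc_from_scores(pos, neg)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in cohort size; it is used here for clarity, and a rank-based computation would be preferable on larger data.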
Figure 5. ROC curves of the six ML methods. (A) ROC curves of the development cohort; (B) ROC curves of the validation cohort.
Figure 6. Kaplan-Meier (KM) survival curves of the different models in the validation cohort. (A) Model A; (B) Model B; (C) Model C; (D) Model D.