Yuan-Yuan Guo, Zhi-Jie Li, Chao Du, Jun Gong, Pu Liao, Jia-Xing Zhang, Cong Shao.
Abstract
Thyroid tumors are among the most common tumors of the endocrine system, yet the discrimination between benign and malignant thyroid tumors remains insufficient. The aim of this study was to construct a diagnostic model for benign and malignant thyroid tumors in order to provide an auxiliary diagnostic method for patients with thyroid tumors. Patients were selected from the Chongqing General Hospital (Chongqing, China) from July 2020 to September 2021. Demographic, BRAFV600E gene, and peripheral blood indicators were collected: sex, age, BRAFV600E gene status, lymphocyte count (Lymph#), neutrophil count (Neu#), neutrophil/lymphocyte ratio (NLR), platelet/lymphocyte ratio (PLR), red blood cell distribution width (RDW), platelet count (PLT), red blood cell distribution width-coefficient of variation (RDW-CV), alkaline phosphatase (ALP), and parathyroid hormone (PTH). First, feature selection was performed by univariate analysis combined with least absolute shrinkage and selection operator (LASSO) analysis. We then used machine learning algorithms to establish three types of models: the first contains all predictors, the second contains the indicators retained after feature selection, and the third contains the patients' peripheral blood indicators. Four machine learning algorithms were used to build the predictive models: extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost). A grid search was used to find the optimal parameters of each algorithm. Model performance was evaluated with a series of indicators, including the area under the curve (AUC). A total of 2,042 patients met the criteria and were enrolled in this study, and 12 variables were included. Sex, age, Lymph#, PLR, RDW, and BRAFV600E were identified as statistically significant indicators by univariate and LASSO analysis.
In the first model, RF, XGBoost, LightGBM, and AdaBoost achieved AUCs of 0.874 (95% CI, 0.841–0.906), 0.868 (95% CI, 0.834–0.901), 0.861 (95% CI, 0.826–0.895), and 0.837 (95% CI, 0.802–0.873), respectively. In the second model, the corresponding AUCs were 0.853 (95% CI, 0.818–0.888), 0.853 (95% CI, 0.818–0.889), 0.837 (95% CI, 0.800–0.873), and 0.832 (95% CI, 0.797–0.867). In the third model, they were 0.698 (95% CI, 0.651–0.745), 0.688 (95% CI, 0.639–0.736), 0.693 (95% CI, 0.645–0.741), and 0.666 (95% CI, 0.618–0.714). Compared with existing models, our study proposes a model incorporating novel biomarkers, which could be a powerful and promising tool for distinguishing benign from malignant thyroid tumors.
Keywords: BRAFV600E gene mutation; machine learning; predictive model; risk-factors; thyroid tumor
Year: 2022 PMID: 36187616 PMCID: PMC9515945 DOI: 10.3389/fpubh.2022.960740
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
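NLR and PLR in the abstract are ratios derived from routine blood counts. As a minimal sketch (the function names and the example patient values are hypothetical and not taken from the study), they can be computed as follows. Because the neutrophil, lymphocyte, and platelet counts are all reported in ×10⁹/L, both ratios are dimensionless.

```python
def neutrophil_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """NLR: neutrophil count divided by lymphocyte count (same units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def platelet_lymphocyte_ratio(platelets: float, lymphocytes: float) -> float:
    """PLR: platelet count divided by lymphocyte count (same units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets / lymphocytes


# Hypothetical patient, all counts in x10^9/L (illustrative values only).
example_neu, example_lymph, example_plt = 3.60, 1.50, 225.0
example_nlr = neutrophil_lymphocyte_ratio(example_neu, example_lymph)
example_plr = platelet_lymphocyte_ratio(example_plt, example_lymph)
```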
Clinical characteristics and variables of patients in all cohorts.
| Variable | Benign (n = 561) | Malignant (n = 1,481) | P value |
|---|---|---|---|
| Sex, n (%) | | | |
| Male | 105 (18.7) | 357 (24.1) | 0.011 |
| Female | 456 (81.3) | 1,124 (75.9) | |
| BRAFV600E, n (%) | | | |
| Mutation | 76 (13.5) | 1,170 (79.0) | <0.001 |
| Wild | 485 (86.5) | 311 (21.0) | |
| Age (years) | 45.00 [35.00, 52.00] | 39.00 [32.00, 50.00] | <0.001 |
| Lymph# (×10⁹/L) | 1.64 [1.37, 2.01] | 1.58 [1.29, 1.94] | <0.001 |
| Neu# (×10⁹/L) | 3.64 [2.85, 4.65] | 3.60 [2.84, 4.57] | 0.991 |
| NLR | 2.13 [1.69, 2.85] | 2.20 [1.70, 2.96] | 0.061 |
| PLR | 130.06 [103.38, 157.24] | 140.00 [110.36, 172.27] | <0.001 |
| RDW (%) | 42.30 [40.60, 43.90] | 41.90 [40.50, 43.40] | 0.002 |
| PLT (×10⁹/L) | 215.00 [184.00, 251.00] | 222.00 [187.00, 260.00] | 0.061 |
| RDW-CV | 12.90 [12.50, 13.40] | 12.80 [12.50, 13.30] | 0.594 |
| ALP (U/L) | 67.00 [59.00, 78.14] | 67.00 [56.00, 81.00] | 0.395 |
| PTH (ng/ml) | 49.20 [43.90, 53.75] | 48.50 [37.80, 58.90] | 0.786 |
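The P values for the categorical rows can be checked from the counts alone. The source does not state which test was used. The sketch below assumes a Pearson chi-square test with Yates' continuity correction on the 2×2 table of counts (the two groups of 561 and 1,481 patients), and under that assumption the Male/Female counts give a P value that rounds to the reported 0.011. The function name is illustrative.

```python
import math


def chi2_2x2_yates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function is erfc(sqrt(x / 2)), so no external library is needed.
    """
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: Male, Female; columns: the two patient groups in the table.
sex_stat, sex_p = chi2_2x2_yates(105, 357, 456, 1124)
# Rows: Mutation, Wild; columns: the same two groups.
braf_stat, braf_p = chi2_2x2_yates(76, 1170, 485, 311)
```

That the corrected test matches the reported value is only circumstantial evidence for the method; the authors' statistical software may have used a different default.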
Figure 1. Flowchart of research subjects.
Clinical characteristics and variables of patients in training cohort and test cohort.
| Variable | Training cohort: benign (n = 395) | Training cohort: malignant (n = 1,034) | P value (training) | Test cohort: benign (n = 166) | Test cohort: malignant (n = 447) | P value (test) |
|---|---|---|---|---|---|---|
| Sex, n (%) | | | | | | |
| Male | 70 (17.7) | 247 (23.9) | 0.015 | 35 (21.1) | 110 (24.6) | 0.421 |
| Female | 325 (82.3) | 787 (76.1) | | 131 (78.9) | 337 (75.4) | |
| BRAFV600E, n (%) | | | | | | |
| Mutation | 55 (13.9) | 822 (79.5) | <0.001 | 21 (12.7) | 348 (77.9) | <0.001 |
| Wild | 340 (86.1) | 212 (20.5) | | 145 (87.3) | 99 (22.1) | |
| Age (years) | 45.00 [36.00, 52.00] | 39.00 [33.00, 50.00] | <0.001 | 44.00 [34.00, 52.00] | 38.00 [32.00, 49.00] | 0.008 |
| Lymph# (×10⁹/L) | 1.64 [1.39, 2.00] | 1.56 [1.28, 1.92] | 0.001 | 1.65 [1.35, 2.05] | 1.61 [1.31, 1.96] | 0.189 |
| Neu# (×10⁹/L) | 3.62 [2.83, 4.65] | 3.58 [2.83, 4.54] | 0.925 | 3.66 [2.93, 4.66] | 3.64 [2.88, 4.63] | 0.877 |
| NLR | 2.14 [1.69, 2.91] | 2.21 [1.71, 2.98] | 0.074 | 2.11 [1.70, 2.73] | 2.18 [1.70, 2.95] | 0.48 |
| PLR | 131.40 [103.99, 160.30] | 140.70 [110.93, 173.62] | <0.001 | 127.47 [101.35, 155.48] | 138.33 [109.73, 170.43] | 0.013 |
| RDW (%) | 42.40 [40.90, 44.00] | 41.90 [40.40, 43.48] | <0.001 | 41.95 [40.30, 43.58] | 41.90 [40.50, 43.20] | 0.816 |
| PLT (×10⁹/L) | 215.00 [185.00, 253.00] | 221.00 [186.00, 259.00] | 0.222 | 215.00 [183.00, 248.75] | 225.00 [190.00, 261.00] | 0.121 |
| RDW-CV | 12.90 [12.50, 13.40] | 12.80 [12.50, 13.30] | 0.387 | 12.80 [12.40, 13.20] | 12.80 [12.50, 13.30] | 0.709 |
| ALP (U/L) | 67.00 [59.26, 78.28] | 66.80 [56.00, 80.89] | 0.23 | 66.44 [58.77, 78.00] | 68.00 [56.00, 82.00] | 0.791 |
| PTH (ng/ml) | 49.03 [43.50, 53.73] | 48.70 [37.80, 58.80] | 0.925 | 49.44 [44.19, 53.82] | 47.68 [37.90, 59.45] | 0.498 |
Figure 2. LASSO analysis of indicators after univariate analysis.
The optimal parameters of the three models.
| Model | Algorithm | Optimal parameters |
|---|---|---|
| The first model | RF | mtry = 1, ntree = 60, |
| | XGBoost | max_depth = 3, eta = 0.6, |
| | LightGBM | nrounds = 20, min_data = 1, |
| | AdaBoost | mfinal = 170 |
| The second model | RF | mtry = 6, ntree = 140, |
| | XGBoost | max_depth = 4, eta = 0.3, |
| | LightGBM | nrounds = 10, min_data = 3, |
| | AdaBoost | mfinal = 20 |
| The third model | RF | mtry = 1, ntree = 90, |
| | XGBoost | max_depth = 6, eta = 0.7, |
| | LightGBM | nrounds = 10, min_data = 3, |
| | AdaBoost | mfinal = 5 |
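The parameters above were found by grid search, that is, by scoring every combination in a predefined grid and keeping the best one. The source gives neither the grids nor the scoring procedure, so the following is only a minimal sketch of the technique: the grid values, the `grid_search` helper, and the placeholder `toy_score` function are all hypothetical. In practice the score would be something like a cross-validated AUC of a model fitted with those parameters.

```python
from itertools import product
from typing import Callable


def grid_search(param_grid: dict[str, list], score_fn: Callable[[dict], float]):
    """Exhaustively score every parameter combination and return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # keep the first combination with the top score
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a random forest (not the grid used in the study).
rf_grid = {"mtry": [1, 2, 3, 4, 5, 6], "ntree": list(range(10, 201, 10))}


def toy_score(params: dict) -> float:
    # Placeholder standing in for a cross-validated AUC; it simply peaks
    # at one arbitrary point so the search has something to find.
    return -abs(params["mtry"] - 1) - abs(params["ntree"] - 60) / 10


best_params, best_score = grid_search(rf_grid, toy_score)
```

Exhaustive search costs one model fit per combination (here 6 × 20 = 120), which is why grids are usually kept coarse.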
Performance evaluation table of three models.
| Model | Algorithm | Sensitivity | Specificity | Precision | Recall | F1 score | AUC (95% CI) |
|---|---|---|---|---|---|---|---|
| The first model | RF | 0.790 | 0.886 | 0.949 | 0.790 | 0.862 | 0.874 (0.841–0.906) |
| | XGBoost | 0.790 | 0.873 | 0.944 | 0.790 | 0.860 | 0.868 (0.834–0.901) |
| | LightGBM | 0.734 | 0.892 | 0.948 | 0.734 | 0.827 | 0.861 (0.826–0.895) |
| | AdaBoost | 0.723 | 0.855 | 0.931 | 0.720 | 0.812 | 0.837 (0.802–0.873) |
| The second model | RF | 0.781 | 0.873 | 0.943 | 0.781 | 0.854 | 0.853 (0.818–0.888) |
| | XGBoost | 0.754 | 0.873 | 0.941 | 0.754 | 0.837 | 0.853 (0.818–0.889) |
| | LightGBM | 0.765 | 0.873 | 0.942 | 0.765 | 0.844 | 0.837 (0.800–0.873) |
| | AdaBoost | 0.779 | 0.880 | 0.946 | 0.779 | 0.854 | 0.832 (0.797–0.867) |
| The third model | RF | 0.671 | 0.645 | 0.836 | 0.671 | 0.744 | 0.698 (0.651–0.745) |
| | XGBoost | 0.781 | 0.548 | 0.823 | 0.781 | 0.801 | 0.688 (0.639–0.736) |
| | LightGBM | 0.624 | 0.705 | 0.849 | 0.626 | 0.721 | 0.693 (0.645–0.741) |
| | AdaBoost | 0.626 | 0.651 | 0.828 | 0.626 | 0.713 | 0.666 (0.618–0.714) |
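The values in each row are tied together by the usual confusion-matrix definitions. As a hedged consistency check, the sketch below takes the RF row of the first model and assumes that malignant is the positive class, that the metrics were computed on the test cohort of 447 malignant and 166 benign patients (the Male + Female test-cohort counts), and that the first two values are sensitivity (0.790) and specificity (0.886). Under those assumptions, the implied counts give values that match the 0.949 and 0.862 listed in the same row, which is what one would expect if those are precision and F1 score. The function name is illustrative.

```python
def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Assumed test-cohort sizes (positive class = malignant).
N_MALIGNANT, N_BENIGN = 447, 166

# Counts implied by the reported sensitivity and specificity of RF, first model.
tp = round(0.790 * N_MALIGNANT)
tn = round(0.886 * N_BENIGN)
rf_first_model = metrics_from_counts(
    tp=tp, fp=N_BENIGN - tn, tn=tn, fn=N_MALIGNANT - tp
)
```

Because sensitivity and recall are the same quantity, the small differences between those two columns in some rows (for example 0.723 versus 0.720) presumably reflect rounding or transcription in the source rather than distinct metrics.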
Figure 3. ROC curves of the four algorithms in the three model categories.
Figure 4. Importance ranking of prediction indicators after feature selection.
Comparison of the newly created model with existing models.
| Study | Authors | Method | Predictors | AUC |
|---|---|---|---|---|
| Machine Learning for Identifying Benign and Malignant of Thyroid Tumors: A Retrospective Study of 2,423 Patients (final model) | Yuan-Yuan Guo et al. | Machine learning (Random forest) | Sex, age, Lymph#, | 0.853 (95% CI, 0.818–0.888) |
| Deep learning-based artificial intelligence model to assist thyroid nodule diagnosis and management: a multicentre diagnostic study | Sui Peng et al. | Deep learning (ResNet, ResNeXt, DenseNet) | Ultrasound images | 0.875 (95% CI, 0.871–0.880) |
| Machine learning to identify lymph node metastasis from thyroid cancer in patients undergoing contrast-enhanced CT studies | Masuda et al. | Machine learning (Support Vector Machines) | CT images | 0.86 |
| Deep convolutional neural network for classification of thyroid nodules on ultrasound: Comparison of the diagnostic performance with that of radiologists | Yeonjae et al. | Deep learning | Images of nodules that underwent US-guided fine-needle aspiration biopsy | 0.83–0.86 |
| Deep convolutional neural network for the diagnosis of thyroid nodules on ultrasound | Yeon et al. | Deep learning (Convolutional Neural Network) | Ultrasound images | 0.845, 0.835, and 0.850 |
| A comparison between deep learning convolutional neural networks and radiologists in the differentiation of benign and malignant thyroid nodules on CT images | Hong-Bo Zhao et al. | Deep learning (Convolutional Neural Network) | CT images | 0.901–0.947 |