Wenfei Liu, Shoufei Wang, Ziheng Ye, Peipei Xu, Xiaotian Xia, Minggao Guo.
Abstract
PURPOSE: Lung metastasis (LM) is one of the most frequent distant metastases of thyroid cancer (TC). This study aimed to develop a machine learning model that predicts lung metastasis of thyroid cancer and provides relevant information for clinical decision-making.
Keywords: lung metastasis; machine learning; partial dependency plot; prediction; thyroid cancer
Year: 2022 PMID: 35191613 PMCID: PMC9189456 DOI: 10.1002/cam4.4617
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.711
FIGURE 1 Research flow chart
FIGURE 2 Detailed screening process of data collection
FIGURE 3 Target variable distribution of original data (A), under‐sampling data (B), and over‐sampling data (C)
FIGURE 4 Correlation heatmaps of patient characteristics in original data (A), under‐sampling data (B), and over‐sampling data (C)
Detailed information about the packages used in the development of machine learning models
| Package names | Version | Description |
|---|---|---|
| Numpy | 1.19.5 | Numpy is the fundamental package for array computing with python |
| Pandas | 1.0.4 | Powerful data structures for data analysis, time series, and statistics |
| Matplotlib | 3.3.2 | Python plotting package |
| Sklearn | 0.0 | A set of python modules for machine learning and data mining |
| XGBoost | 1.2.0 | XGBoost python package |
| Imblearn | 0.0 | Toolbox for imbalanced dataset in machine learning |
| PDPbox | 0.2.1 | Python partial dependence plot toolbox |
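The under-sampling and over-sampling data sets named in Figure 3 can be illustrated with a minimal sketch. It assumes plain random resampling; the record does not say which Imblearn sampler the authors used, so this is not a reproduction of their pipeline. The function name `random_resample` is hypothetical, and the class sizes are those reported in the demographic table (9738 NLM, 212 LM).

```python
import random
from collections import Counter


def random_resample(labels, mode, seed=0):
    """Return row indices after random under- or over-sampling.

    Hypothetical helper for illustration; not the Imblearn API.
    mode="under" shrinks every class to the smallest class size,
    mode="over" grows every class to the largest class size.
    """
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    sizes = [len(ix) for ix in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)

    picked = []
    for ix in by_class.values():
        if len(ix) > target:
            # Under-sampling: keep a random subset, without replacement.
            picked.extend(rng.sample(ix, target))
        elif len(ix) < target:
            # Over-sampling: keep all rows and add random duplicates.
            picked.extend(ix + rng.choices(ix, k=target - len(ix)))
        else:
            picked.extend(ix)
    return picked


# 0 = no lung metastasis (NLM), 1 = lung metastasis (LM).
labels = [0] * 9738 + [1] * 212
under = Counter(labels[i] for i in random_resample(labels, "under"))
over = Counter(labels[i] for i in random_resample(labels, "over"))
print("under-sampled:", dict(under))
print("over-sampled:", dict(over))
```

Either variant yields a balanced target variable; under-sampling discards most NLM rows, while over-sampling repeats LM rows.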
The detailed demographic information of the patients with thyroid cancer
| Categories | NLM [n (%)] | LM [n (%)] | p value |
|---|---|---|---|
| Total | 9738 (97.9) | 212 (2.1) | |
| Year of diagnosis | | | 0.894 |
| 2010–2012 | 5122 (52.6) | 113 (53.3) | |
| 2013–2015 | 4616 (47.4) | 99 (46.7) | |
| Age (mean ± SD) | 46.87 ± 15.54 | 64.52 ± 14.73 | <0.001 |
| Sex | | | <0.001 |
| Male | 2438 (25.0) | 92 (43.4) | |
| Female | 7300 (75.0) | 120 (56.6) | |
| Race | | | 0.149 |
| White | 7765 (79.7) | 158 (74.5) | |
| Black | 651 (6.7) | 16 (7.5) | |
| Others | 1322 (13.6) | 38 (17.9) | |
Abbreviations: LM, lung metastasis; NLM, no lung metastasis.
Mean values ± Standard Deviation.
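The categorical p values in this table can be sanity-checked from the counts alone. The sketch below assumes a Pearson chi-square test with Yates' continuity correction on a 2x2 table; the record does not name the test the authors used, so that choice is an assumption, and `chi2_2x2_yates` is a hypothetical helper name. It is applied to the year-of-diagnosis counts.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Yates-corrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each deviation by 0.5, floored at 0.
            deviation = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += deviation ** 2 / expected

    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: 2010-2012, 2013-2015. Columns: NLM, LM.
stat, p = chi2_2x2_yates(5122, 113, 4616, 99)
print(f"chi2 = {stat:.4f}, p = {p:.3f}")
```

Under these assumptions the result should land close to the 0.894 reported for year of diagnosis, which is consistent with, but does not prove, a continuity-corrected test.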
The detailed pathological characteristics of the patients with thyroid cancer
| Categories | NLM [n (%)] | LM [n (%)] | p value |
|---|---|---|---|
| Total | 9738 (97.9) | 212 (2.1) | |
| Laterality | |||
| Solitary | 5929 (60.9) | 138 (65.1) | 0.241 |
| Multifocal | 3809 (39.1) | 74 (34.9) | |
| Grade | |||
| Grade I | 7769 (79.8) | 35 (16.5) | <0.001 |
| Grade II | 1395 (14.3) | 15 (7.1) | |
| Grade III | 322 (3.3) | 34 (16.0) | |
| Grade IV | 252 (2.6) | 128 (60.4) | |
| T stage | |||
| T1a | 3014 (31.0) | 2 (0.9) | <0.001 |
| T1b | 2286 (23.5) | 7 (3.3) | |
| T2 | 1695 (17.4) | 8 (3.8) | |
| T3 | 2186 (22.4) | 32 (15.1) | |
| T4a | 306 (3.1) | 43 (20.3) | |
| T4b | 251 (2.6) | 120 (56.6) | |
| N stage | |||
| N0 | 6819 (70.0) | 66 (31.1) | <0.001 |
| N1a | 1677 (17.2) | 32 (15.1) | |
| N1b | 1242 (12.8) | 114 (53.8) | |
| Histological type | |||
| PTC | 8474 (87.0) | 68 (32.1) | <0.001 |
| FTC | 975 (10.0) | 33 (15.6) | |
| MTC | 99 (1.0) | 5 (2.4) | |
| ATC | 190 (2.0) | 106 (50.0) | |
Abbreviations: ATC, anaplastic thyroid cancer; FTC, follicular thyroid cancer; LM, lung metastasis; MTC, medullary thyroid cancer; NLM, no lung metastasis; PTC, papillary thyroid cancer.
Univariate analysis of variables related to lung metastasis (LM)
| Variables | OR | 95% CI | p value |
|---|---|---|---|
| Year of diagnosis | |||
| 2010–2012 | Reference | ||
| 2013–2015 | 1.029 | 0.783–1.351 | 0.8391 |
| Age | 1.076 | 1.066–1.086 | <0.001 |
| Sex | |||
| Male | 2.296 | 1.743–3.024 | <0.001 |
| Female | Reference | ||
| Laterality | |||
| Solitary | 1.198 | 0.901–1.594 | 0.2145 |
| Multifocal | Reference | ||
| Race | |||
| White | 1.17 | 0.647–2.113 | 0.6039 |
| Others | 0.828 | 0.492–1.393 | 0.4769 |
| Black | Reference | ||
| Grade | |||
| Grade I | Reference | ||
| Grade II | 2.387 | 1.3–4.382 | 0.005 |
| Grade III | 23.438 | 14.431–38.065 | <0.001 |
| Grade IV | 112.747 | 76.005–167.252 | <0.001 |
| T stage | |||
| T1a | Reference | ||
| T1b | 4.615 | 0.959–22.21 | 0.0565 |
| T2 | 7.113 | 1.51–33.495 | 0.0131 |
| T3 | 22.06 | 5.288–92.037 | <0.001 |
| T4a | 211.768 | 51.114–877.355 | <0.001 |
| T4b | 720.477 | 177.296–2927.8 | <0.001 |
| N stage | |||
| N0 | Reference | ||
| N1a | 1.971 | 1.288–3.017 | 0.0018 |
| N1b | 9.483 | 6.962–12.918 | <0.001 |
| Histological type | |||
| PTC | 0.061 | 0.04–0.092 | <0.001 |
| FTC | 0.072 | 0.026–0.202 | <0.001 |
| MTC | 0.015 | 0.01–0.02 | <0.001 |
| ATC | Reference | | |
Abbreviations: ATC, anaplastic thyroid cancer; CI, confidence interval; FTC, follicular thyroid cancer; LM, lung metastasis; MTC, medullary thyroid cancer; NLM, no lung metastasis; OR, odds ratio; PTC, papillary thyroid cancer.
Age was analyzed as a continuous variable.
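For a single binary predictor, the univariate odds ratio equals the cross-product ratio of the 2x2 counts, and a Wald interval follows from the standard error of the log odds ratio. The sketch below is a check built on that identity, not the authors' code; `odds_ratio_wald` is a hypothetical helper name, and the counts are the sex rows of the demographic table.

```python
import math


def odds_ratio_wald(exp_cases, exp_noncases, ref_cases, ref_noncases, z=1.96):
    """Odds ratio and Wald confidence interval from 2x2 counts.

    exp_*: counts in the group of interest, ref_*: counts in the reference group.
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (exp_cases * ref_noncases) / (exp_noncases * ref_cases)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = math.sqrt(
        1 / exp_cases + 1 / exp_noncases + 1 / ref_cases + 1 / ref_noncases
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Male: 92 LM, 2438 NLM. Female (reference): 120 LM, 7300 NLM.
or_male, lower, upper = odds_ratio_wald(92, 2438, 120, 7300)
print(f"OR = {or_male:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With the sex counts this should agree, up to rounding, with the tabulated row for male sex (OR 2.296, 95% CI 1.743–3.024). The same check can be run on the other categorical rows against their reference levels.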
Multivariate analysis of variables related to lung metastasis (LM)
| Factors | OR | 95% CI | p value |
|---|---|---|---|
| Age | 1.027 | 1.015–1.038 | <0.001 |
| Sex | |||
| Male | 1.214 | 0.871–1.692 | 0.2514 |
| Female | Reference | ||
| Grade | |||
| Grade I | Reference | ||
| Grade II | 1.48 | 0.792–2.766 | 0.2185 |
| Grade III | 4.523 | 2.49–8.214 | <0.001 |
| Grade IV | 5.797 | 2.691–12.488 | <0.001 |
| T stage | |||
| T1a | Reference | ||
| T1b | 3.865 | 0.8–18.677 | 0.0925 |
| T2 | 4.076 | 0.85–19.54 | 0.0789 |
| T3 | 8.459 | 1.974–36.242 | 0.004 |
| T4a | 28.037 | 6.305–124.668 | <0.001 |
| T4b | 41.528 | 9.052–190.527 | <0.001 |
| N stage | |||
| N0 | Reference | ||
| N1a | 1.846 | 1.12–3.043 | 0.0163 |
| N1b | 3.95 | 2.66–5.865 | <0.001 |
| Histological type | |||
| PTC | 2.306 | 1.108–4.8 | 0.0254 |
| FTC | 0.492 | 0.147–1.645 | 0.2495 |
| MTC | 0.681 | 0.378–1.227 | 0.2011 |
| ATC | Reference | | |
Abbreviations: ATC, anaplastic thyroid cancer; CI, confidence interval; FTC, follicular thyroid cancer; LM, lung metastasis; MTC, medullary thyroid cancer; NLM, no lung metastasis; OR, odds ratio; PTC, papillary thyroid cancer.
Age was analyzed as a continuous variable.
FIGURE 5 Learning curves of models with under‐sampling data (A) and over‐sampling data (B)
FIGURE 6 ROC curves of models developed by over‐sampling (A); ROC curves of models developed by under‐sampling (B); PR curves of models developed by over‐sampling (C); PR curves of models developed by under‐sampling (D); calibration curves of models developed by over‐sampling (E); calibration curves of models developed by under‐sampling (F)
FIGURE 7 Importance ranking of feature variables
FIGURE 8 Actual risk of LM related to clinical characteristics (A) and partial dependence plots of clinical characteristics (B). The shaded area represents the confidence interval.
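A partial dependence curve such as those in Figure 8B is obtained by forcing one feature to a fixed value in every row, averaging the model's predictions, and repeating over a grid of values. The sketch below shows that computation from scratch. It does not use the PDPbox API, and the model `toy_predict`, its coefficients, and the three example rows are hypothetical and unrelated to the study's fitted models.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` set to each grid value.

    predict: callable taking one row (a dict) and returning a number.
    rows: list of dicts, one per patient.
    """
    curve = []
    for value in grid:
        # Overwrite the feature in a copy of every row; other features stay as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(row):
    """Hypothetical logistic model; coefficients are made up for illustration."""
    score = -3.0 + 0.03 * row["age"] + 1.5 * row["n1b"]
    return 1.0 / (1.0 + math.exp(-score))


rows = [
    {"age": 30, "n1b": 0},
    {"age": 50, "n1b": 1},
    {"age": 70, "n1b": 0},
]
grid = [30, 50, 70]
curve = partial_dependence(toy_predict, rows, "age", grid)
for age, risk in zip(grid, curve):
    print(f"age {age}: mean predicted risk {risk:.3f}")
```

Because the toy model's risk increases with age, the resulting curve rises across the grid. With a fitted model, the same loop would show how the predicted LM risk varies with one characteristic while the others keep their observed values.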