Wei Yan, Hua Shi, Tao He, Jian Chen, Chen Wang, Aijun Liao, Wei Yang, Huihan Wang.
Abstract
OBJECTIVE: To improve the detection rate of multiple myeloma and enable earlier, more precise disease management, we developed an artificial intelligence-assisted diagnosis system.
Keywords: artificial intelligence; early diagnosis; gradient boosting decision tree; machine learning; multiple myeloma
Year: 2021 PMID: 33854961 PMCID: PMC8039367 DOI: 10.3389/fonc.2021.608191
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. New sample generation with the SMOTE algorithm. A new sample is generated from two existing samples; the newly generated samples are denoted by stars. SMOTE, Synthetic Minority Oversampling Technique.
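The interpolation step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `smote_sample`, the two-feature vectors, and the seed are hypothetical, and the nearest-neighbor search that SMOTE normally uses to pick the second sample is omitted.

```python
import random


def smote_sample(x, neighbor, rng=random):
    """Return one synthetic sample on the line segment between x and neighbor."""
    gap = rng.random()  # uniform draw in [0, 1): position along the segment
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


# Hypothetical minority-class samples with two features (illustrative values only).
a = [1.0, 2.0]
b = [3.0, 6.0]

synthetic = smote_sample(a, b, random.Random(0))
print(synthetic)  # a point lying between a and b
```

Because every feature uses the same random fraction, the synthetic point always lies on the straight line joining the two existing samples, which matches the picture the caption describes.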
Figure 2. Flowchart and complete training pipeline of the GBDT model.
Subject characteristics.
| Variable | Multiple myeloma dataset, mean (SD) | Control dataset, mean (SD) |
|---|---|---|
| Creatinine (μmol/L) | 137.97 (4.62) | 119.85 (2.83) |
| Serum β2 microglobulin (mg/L) | 7.51 (0.30) | 6.12 (0.66) |
| Urine β2 microglobulin (mg/L) | 22.35 (1.24) | 16.75 (5.16) |
| IgA (g/L) | 4.43 (0.37) | 2.89 (0.04) |
| IgG (g/L) | 14.45 (0.48) | 12.11 (0.14) |
| IgM (g/L) | 0.67 (0.80) | 1.23 (0.02) |
| Albumin (g/L) | 35.91 (0.26) | 29.70 (0.30) |
| Total protein (g/L) | 68.60 (0.63) | 58.93 (0.46) |
| Serum calcium (mmol/L) | 2.20 (0.08) | 2.04 (0.00) |
| Hemoglobin (g/L) | 107.38 (0.70) | 114.59 (0.47) |
Results of Testing Group based on 9 variables.
| Method | Class | Precision (P) | Recall (R) | F1 |
|---|---|---|---|---|
| GBDT | Non-myeloma | 0.899 | 0.928 | 0.913 |
| GBDT | Myeloma | 0.929 | 0.900 | 0.915 |
| RF | Non-myeloma | 0.884 | 0.903 | 0.908 |
| RF | Myeloma | 0.901 | 0.90 | 0.906 |
| SVM | Non-myeloma | 0.830 | 0.827 | 0.829 |
| SVM | Myeloma | 0.836 | 0.839 | 0.837 |
| DNN | Non-myeloma | 0.850 | 0.783 | 0.815 |
| DNN | Myeloma | 0.772 | 0.842 | 0.805 |
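To show how the columns in the table above relate, the sketch below recomputes F1 as the harmonic mean of P and R, assuming these denote precision and recall. The helper name `f1_score` is hypothetical. Because the published values were presumably rounded from unrounded counts, a recomputation from the rounded P and R may differ from the table in the last digit for some rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# GBDT, non-myeloma class, using the rounded values reported in the table.
f1 = f1_score(0.899, 0.928)
print(round(f1, 3))  # 0.913
```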
Figure 3. ROC comparison of the four algorithms based on nine variables. The GBDT classifier obtains an AUC of 0.975 [95% confidence interval (CI): 0.963–0.986], the best performance among the four algorithms. ROC, Receiver Operating Characteristic; AUC, area under the curve; GBDT, Gradient Boosting Decision Tree; RF, Random Forest; SVM, Support Vector Machine; DNN, Deep Neural Networks. The nine variables are hemoglobin, serum creatinine, serum calcium, immunoglobulins (A, G, and M), albumin, total protein, and the albumin-to-globulin ratio.