Wentai Zhang1, Dongfang Li2, Ming Feng1, Baotian Hu2, Yanghua Fan3, Qingcai Chen2,4, Renzhi Wang1.
Abstract
BACKGROUND: No existing machine learning (ML)-based models use free text from electronic medical records (EMR) as input to predict immediate remission (IR) of Cushing's disease (CD) after transsphenoidal surgery.
Keywords: Cushing’s disease; immediate remission; machine learning; natural language processing; preoperative prediction
Year: 2021 PMID: 34722308 PMCID: PMC8548651 DOI: 10.3389/fonc.2021.754882
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Participants’ characteristics in training and test datasets.
| Characteristic | Total | Training dataset | Test dataset | P value |
|---|---|---|---|---|
| | 419 | 335 | 84 | 0.991 |
| Male | 80 (19.09) | 64 (19.10) | 16 (19.05) | |
| Female | 339 (80.91) | 271 (80.90) | 68 (80.95) | |
| | 37.86 ± 13.06 | 37.94 ± 13.04 | 37.55 ± 13.06 | 0.808 |
| | | | | 0.492 |
| Yes | 359 (85.68) | 289 (86.27) | 70 (83.33) | |
| No | 60 (14.32) | 46 (13.73) | 14 (16.67) | |
| | | | | 0.684 |
| Macroadenoma | 40 (9.55) | 31 (9.25) | 9 (10.71) | |
| Microadenoma | 379 (90.45) | 304 (90.75) | 75 (89.29) | |
| | | | | 0.098 |
| Invasion | 26 (6.21) | 19 (5.67) | 9 (10.71) | |
| Non-invasion | 393 (93.79) | 316 (94.33) | 75 (89.29) | |
| | | | | 0.700 |
| Infiltrated | 45 (10.74) | 35 (10.45) | 10 (11.90) | |
| Normal | 374 (89.26) | 300 (89.55) | 74 (88.10) | |
| | 36 (18–72) | 36 (18–72) | 43.5 (24–84) | 0.121 |
| | 26.14 (24.04–28.93) | 26.15 (24.12–29.03) | 26.13 (23.51–28.21) | 0.476 |
| | 426.40 (268.65–716.40) | 441.80 (270.63–745.66) | 411.68 (243.58–586.41) | 0.139 |
| | 27.15 (22.39–33.01) | 27.02 (22.48–32.49) | 27.89 (21.59–33.67) | 0.715 |
| | 72.6 (49.6–105) | 72.9 (50.7–106.5) | 66.3 (44.23–95.75) | 0.446 |
Patients’ characteristics in remission and non-remission groups.
| Characteristic | Remission | Non-remission | P value |
|---|---|---|---|
| | 317 | 102 | 0.195 |
| Male | 65 (20.50) | 15 (14.71) | |
| Female | 252 (79.50) | 87 (85.29) | |
| | 37.74 ± 13.01 | 38.25 ± 13.25 | 0.730 |
| | | | |
| Yes | 287 (90.54) | 72 (70.59) | |
| No | 30 (9.46) | 30 (29.41) | |
| | | | 0.206 |
| Macroadenoma | 27 (8.52) | 13 (12.75) | |
| Microadenoma | 290 (91.48) | 89 (87.25) | |
| | | | |
| Invasion | 14 (4.42) | 12 (11.76) | |
| Non-invasion | 303 (95.58) | 90 (88.24) | |
| | | | |
| Infiltrated | 27 (8.52) | 18 (17.65) | |
| Normal | 290 (91.48) | 84 (82.35) | |
| | 36 (18–72) | 48 (20.25–84) | 0.256 |
| | 26 (24.03–28.87) | 26.56 (24.30–29.27) | 0.448 |
| | 412.56 (266.7–675.24) | 452.22 (283.92–821.78) | 0.337 |
| | 27.5 (22.03–32.61) | 26.72 (23.03–33.87) | 0.954 |
| | 68.3 (45.3–104) | 86.45 (55.40–114.50) | |
Bold values in this table represent statistical significance (P < 0.05).
Logistic univariate and multivariate analysis of the relationship between risk factors and IR.
| Characteristics | Univariate OR | Univariate 95% CI | Univariate P value | Multivariate OR | Multivariate 95% CI | Multivariate P value |
|---|---|---|---|---|---|---|
| | 1.496 | 0.811–2.759 | 0.197 | | | |
| | 0.997 | 0.980–1.014 | 0.726 | | | |
| | 3.986 | 2.258–7.036 | | 3.641 | 1.996–6.641 | |
| | 0.637 | 0.316–1.687 | 0.209 | | | |
| | 0.347 | 0.155–0.776 | | 0.413 | 0.17.–0.985 | |
| | 0.434 | 0.228–0.827 | | 0.818 | 0.393–1.703 | 0.591 |
| | 0.998 | 0.994–1.002 | 0.436 | | | |
| | 0.970 | 0.923–1.021 | 0.246 | | | |
| | 1.000 | 1.000–1.000 | 0.166 | | | |
| | 1.001 | 0.980–1.023 | 0.898 | | | |
| | 0.994 | 0.990–0.999 | | 0.995 | 0.991–0.999 | |
Bold values in this table represent statistical significance (P < 0.05).
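The univariate odds ratios can be cross-checked against the counts in the remission and non-remission table, because a logistic regression with a single binary predictor yields the crude cross-product odds ratio. The sketch below is not the authors' code: the function name `crude_odds_ratio` and the choice of a Wald interval are assumptions made for illustration. With the Yes/No counts (287 and 72 versus 30 and 30) it gives an OR of about 3.986 with an interval of roughly 2.258 to 7.036, in line with the tabulated univariate values.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI for a 2x2 table (hypothetical helper).

    a: category present, remission      b: category present, non-remission
    c: category absent, remission       d: category absent, non-remission
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts copied from the remission vs. non-remission table.
tables = {
    "Yes vs. No": (287, 72, 30, 30),
    "Invasion vs. Non-invasion": (14, 12, 303, 90),
    "Infiltrated vs. Normal": (27, 18, 290, 84),
}

for name, counts in tables.items():
    or_value, ci_low, ci_high = crude_odds_ratio(*counts)
    print(f"{name}: OR = {or_value:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

This check only applies to binary characteristics; the continuous variables would need the patient-level data, which is not available here.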
Figure 1. AUC values of the four models for different numbers of selected structured features. The highest AUC (0.759) was obtained by the MLP with 11 variables.
Figure 2. Performance of the models with the optimal number of structured features. MLP performed best.
AUC values and 95% confidence intervals of different models with different features.
| | MLP | SVM | RF | LR |
|---|---|---|---|---|
| | | | 0.678 | 0.699 |
| | 0.729 | 0.661 | | 0.777 |
| | 0.670 | 0.652 | 0.642 | 0.737 |
| | 0.606 | 0.610 | 0.556 | 0.577 |
| | 0.506 | 0.692 | 0.533 | 0.573 |
| | 0.619 | 0.718 | 0.468 | 0.674 |
| | 0.473 | 0.614 | 0.613 | 0.556 |
| | 0.656 | 0.625 | 0.602 | 0.679 |
| | 0.571 | 0.575 | 0.474 | 0.676 |
| | 0.582 | 0.628 | 0.505 | 0.622 |
| | 0.723 | 0.682 | | |
| | 0.669 | 0.722 | 0.442 | 0.678 |
| | 0.737 | 0.691 | 0.499 | 0.678 |
| | 0.680 | 0.669 | 0.429 | 0.721 |
Bold values in this table represent highest AUC value in each ML based model.
Example of Occlusion Test Results.
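| | Original HPI | HPI after Deletion of Symptomatic Entities by CMeKG |
|---|---|---|
| | 患者诉于2年前无明显诱因出现双下肢水肿,乏力,无皮肤菲薄、紫纹。 (English: The patient reported edema of both lower limbs and fatigue beginning 2 years ago without obvious cause, with no skin thinning or purple striae.) | 患者诉于2年前无明显诱因出现, 无皮肤菲薄、紫纹。 (English: the same sentence with the entities "edema of both lower limbs" and "fatigue" removed; no skin thinning or purple striae.) |
| | 0.737 | 0.629 |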
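The deletion step of such an occlusion test can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the entity list is inferred by comparing the two HPI columns, `occlude` and `occlusion_drop` are hypothetical helper names, and `toy_score` is a stand-in keyword scorer used only so the example runs without the trained model or CMeKG.

```python
import re

def occlude(text, entities):
    """Delete each entity string from the text, then tidy leftover commas."""
    for entity in entities:
        text = text.replace(entity, "")
    # Collapse runs of ASCII or full-width commas left behind by the deletions.
    return re.sub(r"[,,]{2,}", ",", text)

def occlusion_drop(score_fn, text, entities):
    """Score before occlusion minus score after occlusion."""
    return score_fn(text) - score_fn(occlude(text, entities))

# Hypothetical keyword list for the stand-in scorer (not from the paper).
SYMPTOM_TERMS = ("水肿", "乏力")

def toy_score(text):
    """Stand-in for a trained model: 0.5 plus 0.1 per symptom term present."""
    return 0.5 + 0.1 * sum(term in text for term in SYMPTOM_TERMS)

hpi = "患者诉于2年前无明显诱因出现双下肢水肿,乏力,无皮肤菲薄、紫纹。"
symptom_entities = ["双下肢水肿", "乏力"]  # assumed to be the entities flagged for deletion

print(occlude(hpi, symptom_entities))
print(occlusion_drop(toy_score, hpi, symptom_entities))
```

In a real test, `toy_score` would be replaced by the fitted model's scoring function, and the comparison would be made over the whole evaluation set rather than a single sentence.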