Herdiantri Sufriyana, Yu-Wei Wu, Emily Chia-Yu Su.
Abstract
BACKGROUND: We developed and validated an artificial intelligence (AI)-assisted prediction of preeclampsia applied to a nationwide health insurance dataset in Indonesia.
Keywords: Artificial intelligence; Clinical prediction rule; Health insurance dataset; Machine learning; Natural language processing; Preeclampsia
Year: 2020 PMID: 32283530 PMCID: PMC7152721 DOI: 10.1016/j.ebiom.2020.102710
Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143
Diagnosis codes for nested case-control sampling.
| ICD10 codes | Description |
|---|---|
| O | Pregnancy, childbirth, and puerperium |
| O10–16 | Oedema, proteinuria, and hypertensive disorders in pregnancy, childbirth, and puerperium |
| O14–15 | Preeclampsia and eclampsia |
| O80–82 | Encounter for delivery |
| Z33–37 | Pregnant state, encounter for supervision of normal pregnancy, encounter for antenatal screening of mother, and outcome of delivery |
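As a rough illustration of how the code ranges in this table could be applied to claims data, the sketch below tests whether a dot-free ICD-10 code (the format used elsewhere in this record, e.g. A010) falls inside a range such as O14–15. The names `in_icd10_range`, `RANGES` and `matching_groups` are hypothetical; this record does not describe how the authors actually filtered diagnoses.

```python
def in_icd10_range(code, letter, first, last):
    """Return True if a dot-free ICD-10 code falls in letter+first..letter+last.

    Only the chapter letter and the two-digit category are compared, so
    "O149" (category O14) matches the range O14-15.
    """
    code = code.strip().upper()
    if len(code) < 3 or code[0] != letter or not code[1:3].isdigit():
        return False
    return first <= int(code[1:3]) <= last


# Hypothetical encoding of the ranges listed in the table above.
RANGES = {
    "hypertensive_disorders": ("O", 10, 16),
    "preeclampsia_eclampsia": ("O", 14, 15),
    "delivery": ("O", 80, 82),
    "pregnancy_supervision": ("Z", 33, 37),
}


def matching_groups(code):
    """List every group in RANGES whose range contains the given code."""
    return [name for name, (letter, lo, hi) in RANGES.items()
            if in_icd10_range(code, letter, lo, hi)]
```

A code can fall into more than one group because O14–15 is nested inside O10–16, so `matching_groups` returns a list rather than a single label.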
Fig. 1. Dataset constructed for model development. The original dataset was constructed with a nested case-control design. Controls were sampled within the same age range of case groups (12–55 years old). NHID-BPJSKes, nationwide health insurance dataset of BPJS Kesehatan; PIH, pregnancy-induced hypertension; IV, internal validation; GEV, geographical split for external validation; TEV, temporal split for external validation.
Feature candidates selected by the multivariate logistic regression model with forward selection from original candidates and principal components.
| # | Feature | Cases | Controls | p-value |
|---|---|---|---|---|
| 1 | Time-to-event (months) ± SD | 4.56 ± 5.19 | 4.16 ± 4.43 | 0.08 |
| 2 | Age (years) ± SD | 32 ± 12 | 30 ± 12 | <0.0001 |
| 3 | Family role, n (%) | | | |
| | Wife | 1895 (62.05) | 10,953 (61.12) | – |
| | Primary member | 849 (27.80) | 4381 (24.45) | 0.06 |
| | Child | 214 (7.01) | 2161 (12.05) | 0.01 |
| | Additional member | 96 (3.14) | 426 (2.38) | <0.0001 |
| 4 | Member stratum, n (%) | | | |
| | First | 459 (15.03) | 2494 (13.92) | <0.0001 |
| | Second | 1306 (42.76) | 8114 (45.28) | 0.45 |
| | Third | 1289 (42.21) | 7313 (40.80) | – |
| 5 | Member type, n (%) | | | |
| | Company-paid labor | 1517 (49.67) | 8720 (48.66) | <0.0001 |
| | Government-paid labor | 769 (25.18) | 4997 (27.88) | <0.0001 |
| | Self-paid labor | 747 (24.46) | 4173 (23.29) | – |
| | Non-labor | 21 (0.69) | 31 (0.17) | <0.0001 |
| 6 | A codes - Certain infectious and parasitic diseases, visits ± SD; n (%) | 2.72 ± 1.79; 248 (8.12) | 1.54 ± 1.06; 1412 (7.88) | 0.02 |
| 7 | E codes - Endocrine, nutritional and metabolic diseases, visits ± SD; n (%) | 5.00 ± 5.41; 187 (6.12) | 2.65 ± 2.33; 310 (1.73) | <0.0001 |
| 8 | I codes - Diseases of the circulatory system, visits ± SD; n (%) | 4.05 ± 3.75; 570 (18.66) | 2.63 ± 2.33; 609 (3.40) | <0.0001 |
| 9 | Immune-related codes, visits ± SD; n (%) | 2.97 ± 2.15; 308 (10.09) | 1.77 ± 1.39; 1142 (6.37) | <0.0001 |
| 10 | Eye-related codes, visits ± SD; n (%) | 2.81 ± 1.62; 57 (1.87) | 1.78 ± 1.10; 444 (2.48) | <0.0001 |
| 11 | N codes - Diseases of the genitourinary system, visits ± SD; n (%) | 3.94 ± 3.37; 172 (5.63) | 1.95 ± 1.99; 856 (4.78) | <0.0001 |
| 12 | Eye-related codes, visits ± SD; n (%) | 2.28 ± 1.37; 248 (8.12) | 1.71 ± 0.99; 1412 (7.88) | <0.0001 |
| 13 | Breast-related codes, visits ± SD; n (%) | 5.85 ± 2.58; 13 (0.43) | 1.00 ± 0.00; 6 (0.03) | <0.0001 |
| 14 | Digestive system-related codes, visits ± SD; n (%) | 2.48 ± 2.35; 186 (6.09) | 1.85 ± 1.60; 768 (4.29) | <0.0001 |
| 15 | Skin and subcutaneous-related codes, visits ± SD; n (%) | 1.81 ± 0.71; 36 (1.18) | 1.52 ± 1.14; 287 (1.60) | <0.0001 |
| 16 | Principal components 8 (see | 2.72 ± 1.79; 248 (8.12) | 1.54 ± 1.06; 1412 (7.88) | <0.0001 |
| 17 | Principal components 10 (see | −0.09 ± 0.03 | 0.09 ± 0.01 | <0.0001 |
Forced into the multivariate logistic regression model.
Comparator.
Non-zero visits.
Calibration and discrimination tests of six machine learning models by both internal and external validations.
| Validation | Algorithm | Calibration slope (95% CI) | Calibration intercept (95% CI) | AUROC (95% CI) | Prec. (95% CI) |
|---|---|---|---|---|---|
| Internal | LR | 1.08 (1.08, 1.09) | −0.04 (−0.04, −0.03) | 0.70 (0.69, 0.70) | 0.78 (0.78, 0.78) |
| | DT | 0.99 (0.99, 1.00) | 0.01 (0.01, 0.01) | 0.66 (0.66, 0.67) | 0.73 (0.72, 0.74) |
| | ANN | 0.64 (0.63, 0.64) | 0.14 (0.14, 0.15) | 0.65 (0.64, 0.67) | 0.74 (0.73, 0.75) |
| | RF | 1.54 (1.54, 1.54) | −0.27 (−0.27, −0.26) | 0.86 (0.85, 0.86) | 0.86 (0.85, 0.86) |
| | SVM | 2.68 (2.66, 2.70) | −0.89 (−0.90, −0.88) | 0.68 (0.67, 0.68) | 0.78 (0.76, 0.79) |
| | Ens. | 1.21 (1.21, 1.22) | −0.13 (−0.13, −0.12) | 0.70 (0.70, 0.71) | 0.78 (0.77, 0.78) |
| External, geographical split | LR | 1.80 (1.76, 1.83) | −0.34 (−0.35, −0.32) | 0.74 (0.73, 0.76) | 0.68 (0.67, 0.70) |
| | DT | 0.69 (0.67, 0.71) | 0.15 (0.14, 0.16) | 0.60 (0.59, 0.61) | 0.80 (0.79, 0.81) |
| | ANN | 0.75 (0.73, 0.77) | 0.08 (0.07, 0.09) | 0.67 (0.64, 0.70) | 0.55 (0.52, 0.58) |
| | RF | 1.47 (1.45, 1.50) | −0.19 (−0.21, −0.18) | 0.76 (0.76, 0.77) | 0.82 (0.81, 0.83) |
| | SVM | 3.12 (3.02, 3.21) | −1.07 (−1.12, −1.02) | 0.62 (0.61, 0.62) | 0.54 (0.52, 0.57) |
| | Ens. | 1.52 (1.49, 1.55) | −0.28 (−0.30, −0.26) | 0.72 (0.71, 0.73) | 0.70 (0.68, 0.72) |
| External, temporal split | LR | 0.74 (0.72, 0.76) | 0.16 (0.15, 0.17) | 0.62 (0.62, 0.63) | 0.77 (0.76, 0.77) |
| | DT | 0.92 (0.90, 0.93) | 0.08 (0.08, 0.09) | 0.63 (0.62, 0.63) | 0.69 (0.68, 0.70) |
| | ANN | 0.30 (0.29, 0.31) | 0.34 (0.33, 0.35) | 0.58 (0.58, 0.59) | 0.71 (0.70, 0.72) |
| | RF | 1.09 (1.08, 1.11) | 0.02 (0.02, 0.03) | 0.70 (0.70, 0.70) | 0.78 (0.78, 0.79) |
| | SVM | 2.25 (2.20, 2.30) | −0.65 (−0.67, −0.62) | 0.63 (0.63, 0.63) | 0.72 (0.71, 0.73) |
| | Ens. | 0.74 (0.72, 0.76) | 0.15 (0.14, 0.16) | 0.61 (0.61, 0.62) | 0.74 (0.73, 0.74) |
AUROC, area under the receiver operating characteristic curve; LR, machine learning-optimized logistic regression; DT, decision tree; ANN, artificial neural network; RF, random forest; SVM, support vector machine; Ens., ensemble algorithm.
For a specificity of ∼90%.
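The calibration slope and intercept in the table above can be read against the ideal of a slope near 1 and an intercept near 0. The sketch below shows one common way to obtain such numbers: bin the predicted probabilities, compute the observed event rate per bin, and fit a least-squares line. This is an assumption for illustration only; the record does not state which calibration procedure the authors used, and `calibration_line` is a hypothetical helper.

```python
def calibration_line(predicted, observed, n_bins=10):
    """Fit observed event rate = intercept + slope * mean predicted probability.

    predicted: probabilities in [0, 1]; observed: 0/1 outcomes.
    Predictions are grouped into equal-width bins; empty bins are skipped.
    Returns (slope, intercept) from an ordinary least-squares line.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    xs, ys = [], []
    for members in bins:
        if members:
            xs.append(sum(p for p, _ in members) / len(members))
            ys.append(sum(y for _, y in members) / len(members))

    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With this convention, a slope above 1 means the observed rates rise faster than the predicted probabilities, and a slope below 1 means the predictions are more extreme than the observed rates.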
Fig. 2. Receiver operating characteristics (ROC) curves for the random forest model. Four panels show the ROC curves with AUROCs and 95% CIs using these datasets: (a) training set; (b) internal validation set; (c) external validation set by geographical split; and (d) external validation set by temporal split. The dashed line is a reference line. AUROC, area under the receiver operating characteristics curve.
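For readers unfamiliar with the AUROC, it equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it directly from that definition; `auroc` is a hypothetical helper written for illustration and is not the authors' implementation.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random case outranks a random control.

    labels are 1 for cases and 0 for controls. A tie between a case and a
    control counts as half. Runs in O(n_cases * n_controls), which is fine
    for a small illustration.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

A value of 0.5 corresponds to the diagonal reference line in an ROC plot, and 1.0 to perfect separation of cases from controls.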
Fig. 3. Area under receiver operating characteristics curve (AUROC) of subgroups by the time-to-event from the random forest model. Four panels show the AUROCs using these datasets: (a) training set; (b) internal validation set; (c) external validation set by geographical split; and (d) external validation set by temporal split. Error bars and 95% confidence intervals are shown. To improve readability, the y-axis starts at 0.45; all of the data are still shown. The dashed line shows the minimum AUROC among those using training and IV sets. AUROC, area under the receiver operating characteristics curve.
Text profile for ICD10 codes of diagnosis predictors in the true-predicted case group by the random forest model.
| Time-to-event | Diagnosis predictor | ICD10 codes and description |
|---|---|---|
| Diagnoses within the last 2 years to the event (partially censored) | A codes - Certain infectious and parasitic diseases | A010 (Typhoid fever) |
| | | A09 (Infectious gastroenteritis and colitis, unspecified) |
| | | A182 (Tuberculous peripheral lymphadenopathy) |
| | | A231 (Brucellosis due to Brucella abortus) |
| | | A78 (Q fever) |
| | | A91 (Dengue haemorrhagic fever) |
| | E codes - Endocrine, nutritional, and metabolic diseases | E059 (Thyrotoxicosis, unspecified) |
| | | E118 (Type 2 diabetes mellitus with unspecified complications) |
| | | E119 (Type 2 diabetes mellitus without complications) |
| | | E780 (Pure hypercholesterolemia) |
| | | E785 (Hyperlipidaemia, unspecified) |
| | | E86 (Volume depletion) |
| | I codes - Diseases of the circulatory system | I10 (Essential [primary] hypertension) |
| | | I159 (Secondary hypertension, unspecified) |
| | | I500 (Congestive heart failure) |
| | Immune-related codes | J304 (Allergic rhinitis, unspecified) |
| | | J329 (Chronic sinusitis, unspecified) |
| | | J459 (Asthma, unspecified) |
| | | L208 (Other atopic dermatitis) |
| | | L209 (Atopic dermatitis, unspecified) |
| | | M154 (Erosive [osteo]arthrosis) |
| | Eye-related codes | H000 (Hordeolum and other deep inflammation of eyelid) |
| | | H055 (Retained [old] foreign body following penetrating wound of orbit) |
| | | H109 (Conjunctivitis, unspecified) |
| | | H521 (Myopia) |
| | | H527 (Disorder of refraction, unspecified) |
| Diagnoses within the last year to the event | Diseases of the genitourinary system | N300 (Acute cystitis) |
| | | N309 (Cystitis, unspecified) |
| | | N601 (Diffuse cystic mastopathy) |
| | | N608 (Other benign mammary dysplasias) |
| | | N609 (Benign mammary dysplasia, unspecified) |
| | | N61 (Inflammatory disorders of breast) |
| | Eye-related codes | H000 (Hordeolum and other deep inflammation of eyelid) |
| | | H055 (Retained [old] foreign body following penetrating wound of orbit) |
| | | H109 (Conjunctivitis, unspecified) |
| | | H521 (Myopia) |
| | | H527 (Disorder of refraction, unspecified) |
| Diagnoses within the pregnancy period to the event | Breast-related codes | N61 (Inflammatory disorders of breast) |
| | Digestive system-related codes | A09 (Infectious gastroenteritis and colitis, unspecified) |
| | | K029 (Dental caries, unspecified) |
| | | K040 (Pulpitis) |
| | | K045 (Chronic apical periodontitis) |
| | | K047 (Periapical abscess without sinus) |
| | | K053 (Chronic periodontitis) |
| | | K30 (Excessive attrition of teeth) |
| | Skin and subcutaneous-related codes | L209 (Atopic dermatitis, unspecified) |
| Principal components | Principal components 8 | H000 (Hordeolum and other deep inflammation of eyelid) |
| | | H109 (Conjunctivitis, unspecified) |
| | | H521 (Myopia) |
| | | H527 (Disorder of refraction, unspecified) |
| | | H608 (Other otitis externa) |
| | | H609 (Otitis externa, unspecified) |
| | | H811 (Benign paroxysmal vertigo) |
| | | H814 (Vertigo of central origin) |
| | Principal components 10 | D509 (Iron deficiency anemia, unspecified) |
| | | D648 (Other specified anaemias) |
| | | D649 (Anemia, unspecified) |
Predictive performance of the random forest model in the subgroup of 9–12 months to the event, with cut-off values chosen on internal validation to give sensitivity or specificity similar to that of previous studies, and compared with those studies.
| Algorithm | Validation | AUROC (95% CI) | Prec. (95% CI) | Sens. (95% CI) | Spec. (95% CI) |
|---|---|---|---|---|---|
| RF 9–<12 mo.; cut off value of 0.34 | 10-fold CV | 0.90 (0.88, 0.91) | 0.71 (0.68, 0.73) | 0.98 (0.97, 0.99) | 0.52 (0.49, 0.55) |
| MacDonald-Wallis et al. (2015) | Bootstrapping | 0.88 (0.86, 0.90) | 0.04 (0.03, 0.04) | 0.95 | 0.37 (0.31, 0.42) |
| RF 9–<12 mo.; cut off value of 0.54 | 10-fold CV | 0.90 (0.88, 0.91) | 0.88 (0.87, 0.90) | 0.70 (0.67, 0.73) | 0.89 (0.87, 0.91) |
| Guy et al. (2017) | No IV | 0.80 (0.75, 0.85) | 0.09 (0.07, 0.12) | 0.41 (0.29, 0.54) | 0.90 |
| Viguiliouk et al. (2017) | No IV | 0.76 (0.72, 0.81) | NA | NA | NA |
| Wright et al. (2015) | 5-fold CV | 0.76 | 0.08 | 0.40 (0.39, 0.42) | 0.89 |
| Rocha et al. (2017) | No IV | 0.75 (0.72, 0.79) | 0.18 | 0.44 | 0.90 |
| RF 9–<12 mo.; cut off value of 0.34 | Bootstrapped GEV | 0.88 (0.88, 0.89) | 0.59 (0.58, 0.60) | 1.00 (1.00, 1.00) | 0.47 (0.45, 0.49) |
| MacDonald-Wallis et al. (2015) | Bootstrapping | 0.88 (0.84, 0.93) | 0.05 (0.04, 0.06) | 0.95 | 0.47 (0.40, 0.55) |
| RF 9–<12 mo.; cut off value of 0.34 | Bootstrapped TEV | 0.86 (0.85, 0.86) | 0.72 (0.72, 0.72) | 0.90 (0.90, 0.90) | 0.44 (0.43, 0.45) |
| ACOG (2017) | Bootstrapping | 0.57 (0.54, 0.61) | 0.17 | 0.87 | 0.27 |
| RF 9–<12 mo.; cut off value of 0.54 | Bootstrapped GEV | 0.88 (0.88, 0.89) | 0.82 (0.80, 0.85) | 0.52 (0.52, 0.52) | 0.91 (0.90, 0.93) |
| RF 9–<12 mo.; cut off value of 0.54 | Bootstrapped TEV | 0.86 (0.85, 0.86) | 0.89 (0.89, 0.89) | 0.70 (0.70, 0.70) | 0.86 (0.86, 0.86) |
| NICE (2015) | Bootstrapping | 0.76 | 0.07 | 0.39 (0.33, 0.37) | 0.89 |
| NICE (2017) | Bootstrapping | 0.61 (0.58, 0.65) | 0.09 | 0.38 | 0.85 |
AUROC, area under the receiver operating characteristic curve; Prec., precision; Sens., sensitivity; Spec., specificity; RF, random forest; NA, not available; NICE, National Institute for Health and Care Excellence; ACOG, American College of Obstetricians and Gynecologists.
Fixed specificity.
Interval estimate was not reported.
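The table above reports the same random forest model at two cut-off values (0.34 and 0.54), which trade sensitivity against specificity. The sketch below shows how sensitivity, specificity and precision follow from a cut-off applied to predicted scores. The helper `threshold_metrics` is hypothetical, and treating a score equal to the cut-off as a predicted case is an assumption, not something stated in this record.

```python
def threshold_metrics(scores, labels, cutoff):
    """Sensitivity, specificity and precision at a given cut-off.

    Assumption: score >= cutoff is treated as a predicted case.
    labels are 1 for cases and 0 for controls.
    A metric with a zero denominator is returned as NaN.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_case = s >= cutoff
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case and y == 0:
            fp += 1
        elif not predicted_case and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }
```

On any fixed set of scores, raising the cut-off can only keep or lower sensitivity and keep or raise specificity, which matches the direction of the change between the 0.34 and 0.54 rows.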