Jae Yoon Na1, Dongkyun Kim2, Amy M Kwon3, Jin Yong Jeon4, Hyuck Kim5, Chang-Ryul Kim1, Hyun Ju Lee1, Joohyun Lee6, Hyun-Kyung Park7.
Abstract
Despite the many comorbidities and high mortality rate in preterm infants with patent ductus arteriosus (PDA), therapeutic strategies vary depending on the clinical setting, and most studies of the related risk factors are based on small sample populations. We aimed to compare the performance of artificial intelligence (AI) analysis with that of conventional analysis to identify risk factors associated with symptomatic PDA (sPDA) in very low birth weight infants. This nationwide cohort study included 8369 very low birth weight (VLBW) infants. The participants were divided into an sPDA group and an asymptomatic PDA or spontaneously closed PDA (nPDA) group. The sPDA group was further divided into treated and untreated subgroups. A total of 47 perinatal risk factors were collected and analyzed. Multiple logistic regression was used as a standard analytic tool, and five AI algorithms were used to identify the factors associated with sPDA. Combining a large database of risk factors from nationwide registries with AI techniques achieved higher accuracy and better performance in the PDA prediction tasks, and the ensemble methods showed the best performance.
Year: 2021 PMID: 34785709 PMCID: PMC8595677 DOI: 10.1038/s41598-021-01640-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Flowchart showing how the study population was identified. VLBW infants, very low birth weight infants; KNN, Korean Neonatal Network; PDA, patent ductus arteriosus.
Demographic Characteristics of the Study Population (N = 8369).
| Characteristic | N (%) | Mean ± SD |
|---|---|---|
| Gestational age (weeks) | | 29.1 ± 2.9 |
| < 26 | 1258 (15.0) | |
| 26–29 | 2870 (34.3) | |
| 30–33 | 2381 (28.5) | |
| 34–37 | 530 (6.3) | |
| ≥ 37 | 1330 (15.9) | |
| Birth weight (g) | | 1105.1 ± 276.6 |
| < 500 g | 131 (1.6) | |
| 500–999 g | 2736 (32.7) | |
| 1000–1500 g | 5502 (65.7) | |
| Birth height (cm) | | 36.7 ± 3.6 |
| Birth head circumference (cm) | | 26.1 ± 2.4 |
| Male sex | 4232 (50.6) | |
| Multiple births (≥ 2) | 2935 (35.1) | |
| Cesarean section | 1798 (21.5) | |
| Grouping by PDA status | | |
| Symptomatic PDA (sPDA) | 2982 (35.6) | |
| With any treatment^a (sPDA_tx) | 2465 (82.7) | |
| Without treatment (sPDA_nontx) | 517 (17.3) | |
| Asymptomatic PDA or spontaneously closed PDA (nPDA) | 5387 (64.4) | |
SD, standard deviation; PDA, patent ductus arteriosus.
a Treatments for PDA included medications, such as indomethacin and ibuprofen, as well as ligation surgery.
Performance Metrics of the Algorithms for Predicting sPDA and sPDA_tx, mean values (95% CI).
| Algorithm | sPDA Accuracy | sPDA AUC | sPDA Sensitivity | sPDA Specificity | sPDA_tx Accuracy | sPDA_tx AUC | sPDA_tx Sensitivity | sPDA_tx Specificity |
|---|---|---|---|---|---|---|---|---|
| MLR | 0.76 (0.74–0.78) | 0.81 (0.79–0.83) | 0.85 (0.83–0.87) | 0.60 (0.58–0.62) | 0.85 (0.82–0.87) | 0.78 (0.74–0.81) | 0.85 (0.28–0.32) | 0.98 (0.97–0.99) |
| RF | 0.76 (0.74–0.78) | 0.81 (0.79–0.84) | 0.64 (0.60–0.68) | 0.83 (0.81–0.85) | | | 0.97 (0.96–0.99) | 0.36 (0.28–0.45) |
| L-GBM | | | 0.65 (0.61–0.69) | 0.84 (0.81–0.86) | 0.85 (0.82–0.87) | 0.80 (0.76–0.85) | 0.93 (0.90–0.95) | 0.34 (0.26–0.41) |
| MLP | 0.75 (0.73–0.77) | 0.81 (0.79–0.83) | 0.75 (0.72–0.78) | 0.74 (0.72–0.77) | 0.77 (0.73–0.80) | 0.72 (0.66–0.77) | 0.83 (0.80–0.86) | 0.52 (0.44–0.61) |
| SVM | 0.75 (0.73–0.78) | 0.81 (0.79–0.84) | 0.76 (0.73–0.79) | 0.75 (0.73–0.78) | 0.77 (0.74–0.81) | 0.77 (0.72–0.82) | 0.82 (0.79–0.86) | 0.57 (0.48–0.66) |
| k-NN | 0.66 (0.64–0.69) | 0.74 (0.72–0.77) | 0.73 (0.70–0.76) | 0.63 (0.60–0.66) | 0.67 (0.63–0.71) | 0.67 (0.61–0.72) | 0.71 (0.67–0.75) | 0.49 (0.40–0.58) |
sPDA, symptomatic patent ductus arteriosus; sPDA_tx, symptomatic patent ductus arteriosus with any treatment; CI, confidence interval; AUC, area under the receiver operating characteristic curve; MLR, multiple logistic regression; RF, random forest; L-GBM, light gradient boosting machine; MLP, multilayer perceptron; SVM, support vector machine; k-NN, k-nearest neighbors.
The underlined values denote the highest accuracy and AUC results.
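The four metrics reported in the table can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not study data, and the 0.5 decision threshold is an assumption; the source does not state how predicted probabilities were thresholded or how the confidence intervals were obtained. The AUC is computed here with the pairwise (Mann-Whitney) formulation, which is one standard way to obtain it.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sPDA, 0 = nPDA.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC does not, which is one reason a model can rank well by AUC while showing very unbalanced sensitivity and specificity.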
Figure 2. Top 10 factor contributions for sPDA and sPDA_tx prediction derived from each AI algorithm (AA) and the MLR. (a) Risk factors for sPDA and sPDA_tx prediction according to the RF. (b) Risk factors for sPDA and sPDA_tx prediction according to the L-GBM. (c) Risk factors for sPDA and sPDA_tx prediction according to the MLP. (d) Risk factors for sPDA and sPDA_tx prediction according to the SVM. (e) Risk factors for sPDA and sPDA_tx prediction according to k-NN. For the artificial intelligence analysis, the risk factors are listed in order of the average absolute SHAP values yielded by each algorithm. For the MLR, the factors were selected based on a p-value of 0.05 during the testing procedure and are sorted in descending order according to the absolute values of the corresponding regression coefficients. Abbreviations: sPDA, symptomatic patent ductus arteriosus; nPDA, asymptomatic PDA or spontaneously closed PDA; sPDA_tx, symptomatic patent ductus arteriosus with any treatment; sPDA_nontx, symptomatic patent ductus arteriosus without treatment; RF, random forest; L-GBM, light gradient boosting machine; MLP, multilayer perceptron; SVM, support vector machine; k-NN, k-nearest neighbors; MLR, multiple logistic regression. The abbreviations for all the factors are shown in Supplementary Table 1.
Top significant variables for sPDA and sPDA_tx Prediction.
| Standard | Artificial intelligence algorithms | |||||
|---|---|---|---|---|---|---|
| MLRa | RF | L-GBM | MLP | SVM | k-NN | |
| sPDA vs. nPDA | GA | GA | GA | |||
| pH | GA | GA | WT | |||
| SEPS | SEPS | WT | WT | |||
| WT | WT | SEPS | ||||
| SEPS | HT | |||||
| HT | HT | PROM | HT | ANS | ||
| EPI_R | PARITY | PROM | HC | |||
| GA | HT | |||||
| PROM | ||||||
| sPDA_tx vs. sPDA_nontx | pH | SEPS | SEPS | SEPS | SEPS | |
| SEPS | SFT (n) | |||||
| CPR_R | ANS | |||||
| MULTI | ||||||
| BPL | GRAV | GRAV | GRAV | |||
| MULTI | F_EDU | SEPS | ||||
| HC | MULTI (th) | |||||
| MULTI (th) | SFT (n) | |||||
| OLIGO | 5_AS | 5_AS | ||||
sPDA, symptomatic patent ductus arteriosus; nPDA, asymptomatic PDA or spontaneously closed PDA; sPDA_tx, symptomatic patent ductus arteriosus with any treatment; sPDA_nontx, symptomatic patent ductus arteriosus without treatment; RF, random forest; L-GBM, light gradient boosting machine; MLP, multilayer perceptron; SVM, support vector machine; k-NN, k-nearest neighbors. The abbreviations for all factors are shown in Supplementary Table 1.
Feature importance describes how relevant a factor is to the model's predictions. In the MLR, factors were selected according to a p-value of 0.05 during the testing procedure. The factors are listed in descending order of the absolute values of the coefficients for the MLR and of the average absolute SHAP values for the AAs. The variables in italics indicate positive associations between the selected factors and sPDA or sPDA_tx.
a Factor analysis with MLR served as the standard reference method.
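The two ranking rules described in the table notes can be sketched as follows. This is only an illustration under stated assumptions: the SHAP values, coefficients and p-values are invented numbers, the function names are hypothetical, and the selection rule is assumed to be p < 0.05. The factor names GA, WT and SEPS are borrowed from the table purely as labels.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by the mean absolute SHAP value across samples.

    shap_values: list of rows, one per sample, one SHAP value per feature.
    """
    n_samples = len(shap_values)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(mean_abs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def rank_mlr_factors(coefs, p_values, alpha=0.05, top_k=10):
    """Keep factors with p < alpha (assumed rule), sorted by |coefficient|."""
    selected = [(name, c) for name, c in coefs.items() if p_values[name] < alpha]
    return sorted(selected, key=lambda kv: abs(kv[1]), reverse=True)[:top_k]


# Hypothetical per-sample SHAP values for three factors (not study data).
feature_names = ["GA", "WT", "SEPS"]
shap_values = [
    [-0.4, 0.1, 0.2],
    [0.6, -0.3, 0.0],
    [-0.2, 0.2, -0.1],
]
shap_ranking = rank_by_mean_abs_shap(shap_values, feature_names)

# Hypothetical logistic regression coefficients and p-values (not study data).
coefs = {"GA": -0.8, "WT": 0.3, "SEPS": 1.2}
p_values = {"GA": 0.001, "WT": 0.2, "SEPS": 0.01}
mlr_ranking = rank_mlr_factors(coefs, p_values)

print(shap_ranking)
print(mlr_ranking)
```

Because both rules sort on absolute values, the ranking alone does not convey the direction of an association; the sign of the coefficient or of the SHAP values has to be reported separately, as the table does with italics.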
Figure 3. Relationships among the risk factors. (a) Dendrogram visualizing hierarchical clustering based on the obtained correlation coefficients. The dendrogram's x-axis comprises sPDA, sPDA_tx and all risk factors, and highly correlated factors are forced to be adjacent through hierarchical clustering. Each horizontal line indicates that the two associated subclusters are merged into one cluster, and the y-height indicates the distance between the two subclusters. We divided the factors into 9 clusters with a threshold of 1.15 and marked each cluster by color. (b) Heatmap of the correlation matrix. The x-axis and y-axis of the heatmap follow the arrangement of factors generated by hierarchical clustering, and the correlation coefficients are depicted in red or blue at the intersection of the factors. According to the color bar on the right, red represents a positive correlation, and blue represents a negative correlation. A darker color indicates a higher correlation, while a lighter color indicates a lower correlation. (c) Schematic diagram of the relationships among the factors. The circles (nodes) represent the risk factors, connected by the absolute value of the correlation coefficients (edges). In this network, the edges act as attraction forces, bringing highly correlated nodes closer together and pushing less-correlated nodes away from each other. The color of each cluster is the same as that in the dendrogram in (a). Abbreviations: sPDA, symptomatic patent ductus arteriosus; sPDA_tx, symptomatic patent ductus arteriosus with any treatment. The abbreviations for all the factors are shown in Supplementary Table 1.
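The idea behind panel (a), merging highly correlated factors first and cutting the tree at a distance threshold, can be sketched in a few lines. The caption does not state the distance metric or linkage method, so this sketch assumes a distance of 1 − |r| with average linkage. That scale tops out at 1, so the caption's threshold of 1.15 evidently belongs to a different metric, and the 0.5 used here is an arbitrary illustrative value. The data are invented and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_factors(columns, threshold):
    """Average-linkage agglomerative clustering cut at a distance threshold.

    Assumed distance: 1 - |r|, so strongly correlated factors are close.
    """
    names = list(columns)
    dist = {
        (a, b): 1 - abs(pearson(columns[a], columns[b]))
        for a in names for b in names if a != b
    }

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:  # remaining clusters are too far apart to merge
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical factor columns (not study data).
columns = {
    "GA": [24, 26, 28, 30, 32],
    "WT": [600, 800, 1000, 1200, 1500],
    "HT": [30, 33, 35, 38, 40],
    "SEPS": [1, 0, 1, 0, 1],
}
clusters = cluster_factors(columns, threshold=0.5)
print(clusters)
```

In this toy example the three size-related factors merge into one cluster while the uncorrelated binary factor stays on its own, which mirrors how the dendrogram places highly correlated factors next to each other.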