Thomas Li1, Cheng Gao1, Chao Yan2, Sarah Osmundson1, Bradley A Malin1, You Chen1.
Abstract
Neonatal encephalopathy (NE) is a leading cause of neonatal mortality and lifetime neurological disability. The earlier the risk of NE can be assessed, the more effective interventions can be in preventing adverse outcomes. Existing studies that focus on intrapartum risk factors do not provide the early prognostic forecasting that healthcare professionals need to intervene early in a high-risk NE case. This work used maternal data in a supervised machine learning framework to predict NE events. Specifically, we 1) collected the electronic medical records (EMRs) for 104 NE newborns and 31,054 non-NE newborns and their mothers, 2) trained and tested a regularized logistic regression on imbalanced and high-dimensional EMR data, and 3) discerned important features that could be possible risk factors. The learned model offers prenatal predictions of NE cases with an average area under the receiver operating characteristic curve (AUC) of 87% and identifies the most important predictors.
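The modeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the regularization penalty, the solver, or how class imbalance was handled, so everything below is an assumption made for illustration: an L2 penalty, inverse-frequency class weights, plain batch gradient descent, and a small synthetic dataset of binary features. The function names (`fit_weighted_logreg`, `predict_proba`, `auc`) are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logreg(X, y, l2=0.01, lr=0.5, epochs=500):
    """Class-weighted, L2-regularized logistic regression (batch gradient descent).

    Assumption: inverse-frequency class weights stand in for whatever
    imbalance handling the original study used.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    w_pos, w_neg = n / (2.0 * n_pos), n / (2.0 * n_neg)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = (w_pos if yi == 1 else w_neg) * (p - yi)
            for j, xj in enumerate(xi):
                if xj:  # binary features: skip zeros
                    grad_w[j] += err * xj
            grad_b += err
        # The L2 penalty shrinks the weights; the bias is left unpenalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, deterministic toy data (NOT from the study): 20 positives, 380 negatives.
# Feature 0 is informative; features 1 and 2 are uninformative.
X, y = [], []
for i in range(400):
    label = 1 if i < 20 else 0
    f0 = 1 if (i < 16 if label else i < 58) else 0
    X.append([f0, i % 2, (i // 2) % 2])
    y.append(label)

w, b = fit_weighted_logreg(X, y)
scores = [predict_proba(w, b, x) for x in X]
toy_auc = auc(scores, y)
```

On this toy data the informative feature receives a positive weight and the uninformative ones stay near zero. The pairwise AUC computation is quadratic in the number of samples, which is fine for a sketch but would call for a rank-based formula on a cohort of this study's size.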
Year: 2018 PMID: 29888094 PMCID: PMC5961831
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Summary of the VUMC Maternal Dataset. (W = White; B = Black; O = Other)
| #Mother-Infant Pairs | Mother Age | Mother Race | #NE Cases | #Controls | #Unique ICD-9 | #Unique CPT |
|---|---|---|---|---|---|---|
| 31,158 | 23 [12,50] | W: 66%, B: 17.2%, O: 16.8% | 104 | 31,054 | 7,860 | 6,095 |
Figure 1. A framework for NE prediction and risk factor analysis based on EMR data.
Summary of Features
| Feature Type | ICD-9 codes | CPT codes | Age | Race |
|---|---|---|---|---|
| | {0,1} | {0,1} | | |
Summary of Cohort Features
| # Sparse Features Eliminated | # ICD-9 codes | # CPT codes | # Selected Features |
|---|---|---|---|
| 796.6 | 213.8 | 336.8 | 45.06 |
| [782.1, 811.0] | [212.7, 214.9] | [335.1, 338.5] | [41.34, 48.78] |
The number of positive cases with an assigned code from CPT group 829.
| CPT code | Cases with Code | Description |
|---|---|---|
| | 16 | Glucose; quantitative, blood (except reagent strip) |
| | 69 | Glucose; post glucose dose (includes glucose) |
| | 14 | Glucose; tolerance test, 3 specimens (includes glucose) |
| | 15 | Glucose; tolerance test, each additional beyond 3 specimens |
| | 22 | Blood glucose by glucose monitoring devices cleared by the FDA for home use |
| | 4 | Gamma Glutamyl Transferase |
Figure 2. Histogram of AUC scores of 300 models. The red curve is the normal distribution generated from the obtained mean and standard deviation.
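As a rough illustration of how a set of per-model AUC scores can be summarized and overlaid with a normal curve, the sketch below computes the sample mean, the sample standard deviation, and a normal-approximation interval for the mean. The scores are hypothetical placeholders, not values from the study, and treating the bracketed intervals in the tables as 95% confidence intervals of the mean is an assumption.

```python
import math
import statistics

def summarize_scores(scores, z=1.96):
    """Mean, sample standard deviation, and normal-approximation interval for the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    half_width = z * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half_width, mean + half_width)

def normal_pdf(x, mean, sd):
    """Density of the normal curve drawn from the fitted mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical AUC scores standing in for the 300 models in the figure.
scores = [0.80, 0.85, 0.90, 0.85, 0.88, 0.82]
mean, sd, ci = summarize_scores(scores)

# Points of the overlay curve, evaluated over mean +/- 3 standard deviations.
curve = [(mean + k * sd / 10.0, normal_pdf(mean + k * sd / 10.0, mean, sd))
         for k in range(-30, 31)]
```

The interval narrows with the square root of the number of models, so 300 repetitions give a much tighter interval than the spread of the histogram itself would suggest.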
Figure 3. The 15 features with average frequency ≥ 0.2 and average importance ≤ 0.05.
Figure 4. Average importance and frequency for Age Range and Race Category.
Figure 5. A plot of the correlation matrix for CPT codes in the 829 group. Crosses indicate that the relationship between the corresponding pair of codes was not confirmed as significant. Blue and red represent positive and negative relationships, respectively. The larger the circle, the greater the correlation.
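The caption does not say which correlation measure or significance test was used. For binary code indicators, one common choice is the phi coefficient, which equals the Pearson correlation of two 0/1 vectors. The sketch below assumes that choice and uses made-up indicator vectors; the significance test is left out because the source does not identify it.

```python
import math

def phi_coefficient(a, b):
    """Phi coefficient between two equal-length 0/1 indicator vectors."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either vector is constant; return 0.0 by convention here.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical indicators: does each mother's record contain code A / code B?
code_a = [1, 1, 0, 0, 1, 0]
code_b = [1, 1, 0, 0, 0, 1]
phi_ab = phi_coefficient(code_a, code_b)
```

A positive value corresponds to a blue circle in the plot and a negative value to a red one, with the magnitude setting the circle size.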
Figure 6. Distribution of the length of observation (in years) for mothers whose babies (a) did not have NE and (b) did have NE. Only features recorded within one year prior to delivery were incorporated in our models.
Performance of the NE models.
| AUC | Precision | Sensitivity | Specificity |
|---|---|---|---|
| 0.8685 | 0.8080 | 0.8079 | 0.8104 |
| [0.8625, 0.8745] | [0.8000, 0.8159] | [0.7982, 0.8177] | [0.8014, 0.8194] |
| Normal | Normal | Normal | Normal |
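Precision depends on the proportion of positives in the evaluation set as well as on sensitivity and specificity. The short calculation below applies the standard identity to the sensitivity and specificity in the table at two prevalences: an equal class split and the raw cohort ratio of 104 cases among 31,158 pairs. The record does not state the class balance of the test sets, so this is only arithmetic on the reported figures, not a description of the evaluation protocol. The function name is hypothetical.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision (positive predictive value) implied by sensitivity, specificity, prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity and specificity taken from the performance table.
sens, spec = 0.8079, 0.8104

balanced_precision = precision_from_rates(sens, spec, 0.5)          # about 0.81
cohort_precision = precision_from_rates(sens, spec, 104 / 31158)    # about 0.014
```

The value at an equal class split is close to the reported precision of 0.8080, whereas the value at the raw cohort prevalence is far lower, so the reported precision should be interpreted with the evaluation set's class balance in mind.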