Eric Potash, Rayid Ghani, Joe Walsh, Emile Jorgensen, Cortland Lohff, Nik Prachand, Raed Mansour.
Abstract
Importance: Childhood lead poisoning causes irreversible neurobehavioral deficits, but current practice is secondary prevention.

Objective: To validate a machine learning (random forest) prediction model of elevated blood lead levels (EBLLs) by comparison with a parsimonious logistic regression.

Design, Setting, and Participants: This prognostic study for temporal validation of multivariable prediction models used data from the Women, Infants, and Children (WIC) program of the Chicago Department of Public Health. Participants included a development cohort of children born from January 1, 2007, to December 31, 2012, and a validation WIC cohort born from January 1 to December 31, 2013. Blood lead levels were measured until December 31, 2018. Data were analyzed from January 1 to October 31, 2019.

Exposures: Blood lead level test results; lead investigation findings; housing characteristics, permits, and violations; and demographic variables.

Main Outcomes and Measures: Incident EBLL (≥6 μg/dL). Models were assessed using the area under the receiver operating characteristic curve (AUC) and confusion matrix metrics (positive predictive value, sensitivity, and specificity) at various thresholds.
Year: 2020 PMID: 32936296 PMCID: PMC7495240 DOI: 10.1001/jamanetworkopen.2020.12734
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure 1. Temporal Validation Flowchart
BLL indicates blood lead level; EBLL, elevated BLL; and WIC, Women, Infants, and Children.
Most Important Predictors by Category for the Random Forest Model
| Data source | Variable | Aggregation: space | Aggregation: years | Aggregation: function | Importance | No EBLL, mean (SD) | EBLL, mean (SD) |
|---|---|---|---|---|---|---|---|
| Blood lead levels | Child mean BLL, μg/dL | Tract | 3 | Median | 1.00 | 1.4 (0.4) | 1.7 (0.4) |
| | Child mean BLL, μg/dL | Tract | 3 | Mean | 0.91 | 1.9 (0.6) | 2.3 (0.6) |
| | Child maximum BLL, μg/dL | Tract | 3 | Mean | 0.84 | 3.8 (1.1) | 4.5 (1.1) |
| | Child EBLL ≥6 μg/dL | Tract | 3 | Count | 0.81 | 16.0 (9.1) | 22.0 (9.3) |
| | Child mean BLL, μg/dL | Tract | 2 | Mean | 0.81 | 1.8 (0.5) | 2.2 (0.5) |
| Building characteristics | Residential value, $10^5 | Block | NA | Mean | 0.52 | 0.6 (3.5) | 0.3 (1.6) |
| | Latitude, ° | Address | NA | NA | 0.47 | 41.9 (0.1) | 41.8 (0.1) |
| | Housing age, y | Block | NA | Mean | 0.47 | 80.6 (24.1) | 89.1 (19.9) |
| | Residential value, $10^5 | Block | NA | Sum | 0.47 | 4.8 (11.1) | 3.2 (5.0) |
| | Rooms per unit, No. | Block | NA | Mean | 0.46 | 5.3 (1.1) | 5.2 (1.0) |
| American Community Survey | Medicaid insurance, No. | Tract | 5 | Percentage | 0.32 | 28.6 (14.1) | 34.6 (12.2) |
| | High school graduate, No. | Tract | 5 | Percentage | 0.30 | 14.0 (7.3) | 16.9 (6.9) |
| | Associate’s degree, No. | Tract | 5 | Percentage | 0.30 | 5.0 (2.8) | 4.8 (2.7) |
| | Employer insurance, No. | Tract | 5 | Percentage | 0.30 | 41.2 (17.0) | 33.9 (13.4) |
| | Bachelor’s degree, No. | Tract | 5 | Percentage | 0.30 | 13.7 (11.6) | 9.4 (8.1) |
| Investigations | Compliance, No. | Tract | 3 | Percentage | 0.27 | 40.0 (22.6) | 33.5 (18.4) |
| | Inspection, No. | Tract | 3 | Percentage | 0.27 | 58.4 (19.6) | 54.1 (16.9) |
| | Inspection, No. | Tract | 2 | Percentage | 0.25 | 58.4 (22.0) | 53.2 (19.4) |
| | Compliance, No. | Tract | 2 | Percentage | 0.24 | 37.8 (24.8) | 30.2 (20.4) |
| | Inspection interior hazard, No. | Tract | 3 | Percentage | 0.22 | 53.8 (32.8) | 62.6 (28.1) |
| Building permits and violations | Violations, No. | Address | All | Count | 0.09 | 2.7 (9.2) | 2.8 (8.8) |
| | Violations, No. | Address | 5 | Count | 0.09 | 2.4 (8.3) | 2.7 (8.4) |
| | Wall violations, No. | Address | All | Percentage | 0.08 | 8.6 (13.4) | 8.8 (11.8) |
| | Wall violations, No. | Address | 5 | Percentage | 0.08 | 8.5 (13.5) | 8.8 (11.9) |
| | Window violations, No. | Address | All | Percentage | 0.08 | 6.2 (11.1) | 8.0 (12.7) |
Abbreviations: BLL, blood lead level; EBLL, elevated BLL; NA, not applicable.
Importance of a feature in the random forest model is measured as the mean reduction in error after a tree in the forest splits the data on that variable. Here it is rescaled to have a maximum of 1.00.
Excludes missing predictors.
Elevated levels were at least 6 μg/dL, venous or capillary samples.
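The rescaling mentioned in the importance footnote above (maximum set to 1.00) can be illustrated with a minimal Python sketch. The function name, feature names, and raw values below are hypothetical and are not taken from the study; the sketch shows only the final division by the largest importance, not how the raw importances themselves were computed.

```python
def rescale_importances(raw):
    """Divide every raw importance by the largest one so the maximum becomes 1.00."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: value / top for name, value in raw.items()}


# Hypothetical raw importances (illustrative numbers, not study data).
raw = {
    "tract_median_bll": 0.0500,
    "block_housing_age": 0.0235,
    "address_violations": 0.0045,
}
scaled = rescale_importances(raw)
for name, value in scaled.items():
    print(f"{name}: {value:.2f}")
```

After rescaling, the values are only comparable relative to the top predictor; they no longer carry the units of the original error reduction.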
Figure 2. Receiver Operating Characteristic Curves for Random Forest and Logistic Regression Models
The difference between the areas under the two receiver operating characteristic curves was 0.05 (95% CI, 0.02-0.08).
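For readers unfamiliar with the AUC, the sketch below computes it in Python as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, then takes the difference between two models. It is a sketch under stated assumptions rather than the authors' code: the function name, the toy scores, and the half-credit treatment of ties are all assumptions introduced here.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (an assumption; the source does not discuss ties).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = EBLL, 0 = no EBLL.
labels = [0, 0, 1, 0, 1, 1]
model_a = [0.1, 0.2, 0.8, 0.3, 0.7, 0.9]  # ranks every positive above every negative
model_b = [0.1, 0.6, 0.8, 0.3, 0.5, 0.9]  # misranks one positive-negative pair

auc_a = auc(model_a, labels)
auc_b = auc(model_b, labels)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, difference = {auc_a - auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be preferable for a cohort-sized dataset.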
Confusion Matrix Metrics for Random Forest and Logistic Regression Models
| Population at highest risk, % | Specificity, %: random forest | Specificity, %: logistic regression | Specificity, %: difference (95% CI) | Sensitivity, %: random forest | Sensitivity, %: logistic regression | Sensitivity, %: difference (95% CI) | PPV, %: random forest | PPV, %: logistic regression | PPV, %: difference (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 95.5 | 95.1 | 0.4 (0.0 to 0.7) | 16.2 | 8.1 | 8.1 (3.9 to 11.7) | 15.5 | 7.8 | 7.7 (3.7 to 11.3) |
| 10 | 90.4 | 90.1 | 0.2 (−0.2 to 0.7) | 27.3 | 19.9 | 7.4 (3.0 to 14.6) | 12.7 | 9.4 | 3.3 (1.3 to 6.7) |
| 20 | 80.3 | 79.9 | 0.3 (−0.1 to 1.4) | 42.4 | 38.4 | 4.1 (−1.1 to 12.5) | 9.9 | 8.9 | 1.0 (−0.1 to 3.0) |
Abbreviation: PPV, positive predictive value.
Binary predictions are obtained from continuous risk scores by classifying this highest-risk percentage as positive.
The 95% CIs were estimated using 10 000 bootstrap replications.
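The two footnotes above describe a procedure that can be sketched in Python: flag the highest-risk percentage of children as positive, compute the confusion matrix metrics, and bootstrap a 95% CI for the difference between two models. This is a minimal sketch, not the authors' implementation. All function names and the toy data are hypothetical, and the percentile interval is an assumption, since the source states only that 10 000 bootstrap replications were used.

```python
import random


def classify_top_fraction(scores, fraction):
    """Label the highest-scoring `fraction` of the population as positive (1)."""
    n_flag = round(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [1 if i in flagged else 0 for i in range(len(scores))]


def confusion_metrics(pred, labels):
    """Sensitivity, specificity, and PPV; NaN when a denominator is zero."""
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


def bootstrap_diff_ci(scores_a, scores_b, labels, fraction, metric,
                      n_boot=10_000, seed=0):
    """Percentile 95% CI for metric(model A) - metric(model B) (assumed interval type)."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample children with replacement
        y = [labels[i] for i in idx]
        pred_a = classify_top_fraction([scores_a[i] for i in idx], fraction)
        pred_b = classify_top_fraction([scores_b[i] for i in idx], fraction)
        a = confusion_metrics(pred_a, y)[metric]
        b = confusion_metrics(pred_b, y)[metric]
        if a == a and b == b:  # NaN != NaN, so undefined replicates are dropped
            diffs.append(a - b)
    if not diffs:
        raise ValueError("no usable bootstrap replicates")
    diffs.sort()
    lo = diffs[int(0.025 * len(diffs))]
    hi = diffs[min(len(diffs) - 1, int(0.975 * len(diffs)))]
    return lo, hi


# Toy data (hypothetical): 10 children, 3 with EBLL; flag the top 20%.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores_a = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.2, 0.4, 0.3, 0.1]
scores_b = [0.9, 0.8, 0.2, 0.3, 0.3, 0.1, 0.2, 0.4, 0.3, 0.1]

metrics_a = confusion_metrics(classify_top_fraction(scores_a, 0.20), labels)
metrics_b = confusion_metrics(classify_top_fraction(scores_b, 0.20), labels)
ci = bootstrap_diff_ci(scores_a, scores_b, labels, 0.20, "ppv", n_boot=500)
print(metrics_a, metrics_b, ci)
```

Because the flagged fraction is fixed, the three metrics are tied together: at a given percentage, a model can only raise sensitivity by replacing false positives with true positives, which also raises PPV and specificity. The toy run uses 500 replicates for speed; the default of 10 000 matches the number stated in the footnote.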