Prediction Models for Bronchopulmonary Dysplasia in Preterm Infants: A Systematic Review.

Hai-Bo Peng, Yuan-Li Zhan, You Chen, Zhen-Chao Jin, Fang Liu, Bo Wang, Zhang-Bin Yu.

Abstract

Objective: To provide an overview and critical appraisal of prediction models for bronchopulmonary dysplasia (BPD) in preterm infants.
Methods: We searched PubMed, Embase, and the Cochrane Library to identify relevant studies (up to November 2021). We included studies that reported prediction model development and/or validation of BPD in preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight. We extracted the data independently based on the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). We assessed risk of bias and applicability independently using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).
Results: Twenty-one prediction models from 13 studies reporting on model development and 21 models from 10 studies reporting on external validation were included. Oxygen dependency at 36 weeks' postmenstrual age was the most frequently reported outcome in both development studies (71%) and validation studies (81%). The most frequently used predictors in the models were birth weight (67%), gestational age (62%), and sex (52%). Nearly all included studies had high risk of bias, most often due to inadequate analysis. Small sample sizes and insufficient event patients were common in both study types. Missing data were often not reported or were discarded. Most studies reported on the models' discrimination, while calibration was seldom assessed (development, 19%; validation, 10%). Internal validation was lacking in 69% of development studies.
Conclusion: The included studies had many methodological shortcomings. Future work should focus on following the recommended approaches for developing and validating BPD prediction models.
Copyright © 2022 Peng, Zhan, Chen, Jin, Liu, Wang and Yu.

Keywords:  bronchopulmonary dysplasia; model; prediction; preterm infants; systematic review

Year:  2022        PMID: 35633976      PMCID: PMC9133667          DOI: 10.3389/fped.2022.856159

Source DB:  PubMed          Journal:  Front Pediatr        ISSN: 2296-2360            Impact factor:   3.569


Introduction

Preterm infant survival has increased in the last three decades (1–3), while bronchopulmonary dysplasia (BPD) remains the most prevalent serious complication of prematurity, affecting 10.8–37.1% of preterm neonates born at 24 0/7 to 31 6/7 weeks’ gestational age with a birth weight <1,500 g (4). As survivors with BPD have a high risk of poor long-term pulmonary and neurodevelopmental outcomes in childhood and even adulthood (5–8), it is imperative to optimize BPD prevention and treatment strategies. Early identification of infants at risk of developing BPD would allow preventive interventions to be started while airway injury is still functional and reversible. To aid health care providers in estimating the probability of future BPD and to inform decision-making, many models for predicting BPD have been established in recent years. Nevertheless, such models are often of variable quality and yield inconsistent findings, leading to confusion or uncertainty among health care providers regarding which model to use or recommend. In a 2013 systematic review, Onland et al. reported 26 prediction models for assessing the probability of BPD or death in all preterm infants born at <37 weeks’ gestation and found that most existing clinical prediction models were poor to moderate predictors of BPD (9). Furthermore, at the time of that review, neither guidance for systematic reviews of prediction modeling studies nor standardized tools for assessing the risk of bias (ROB) of prediction models were available. Since then, more BPD prediction modeling studies have been published, but no updated systematic review of such studies has appeared in the last 9 years. The guideline CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) has been available since 2014 (10), and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) for assessing the ROB and applicability of prediction model studies has been available since 2019 (11).
Accordingly, the present systematic review was aimed at updating the systematic review of BPD prediction models and critically evaluating the methods and reporting of studies that developed or externally validated prediction models for BPD in preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight based on the CHARMS checklist and PROBAST.

Methods

This systematic review of all studies on prediction models for BPD in preterm infants is reported according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (12).

Search Strategy

PubMed (MEDLINE), Embase, and the Cochrane Library were systematically searched from inception through to 12 November 2021, for studies reporting prediction models of BPD in preterm infants. We identified relevant studies and maximized search accuracy using the following terms: BPD, chronic lung disease, preterm infants, and prediction. The online Supplementary Material 1 shows the electronic search strategies. The search was not limited by language.

Eligibility Criteria

Articles were included if: (1) the target population was preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight; (2) the study detailed prediction model development and/or external validation; (3) the main prediction outcome was BPD, defined as oxygen requirement at 28 days of life (BPD28) and/or oxygen requirement at 36 weeks’ postmenstrual age (PMA) (BPD36); (4) the model was constructed with at least two predictors; and (5) the model was intended for use within the first 2 weeks of life. Articles were ineligible if the study used data from infants born before 1990, as surfactant was not routinely used before then (pre-surfactant era); the outcome to be predicted was the composite outcome “BPD or death”; the prognostic use of lung ultrasound scores (LUS) was investigated; the study was conducted at high altitude; the study was purely methodological; the article was not published in English; or the article was a conference abstract, review, or letter.

Study Selection and Data Extraction

Two reviewers independently screened the titles, abstracts, and full texts in duplicate for eligibility. In case of discrepancies, a third reviewer was involved to establish consensus. The reviewers used a standardized data extraction form based on the CHARMS checklist (10). The following items were extracted from the studies on prediction model development: study design, study population, predicted outcome and time horizon, intended moment of model use, number of candidate predictors, sample size, number of events, missing data approach, variables selection method, modeling method, model presentation, predictors included in the final model, internal validation method, and assessment of model performance (i.e., discrimination and calibration). The following items were extracted from the prediction model external validation studies: study design, study population, predicted outcome and time horizon, intended moment of model use, sample size, number of events, missing data approach, and assessment of model performance (i.e., discrimination and calibration). The events per variable (EPV) was defined as the number of events divided by the number of candidate predictor variables used. The outcome BPD28 was defined as oxygen dependency at 28 days of life; BPD36 was defined as oxygen dependency at 36 weeks PMA.
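The EPV definition above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """EPV = number of outcome events / number of candidate predictors."""
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    if n_events < 0:
        raise ValueError("number of events cannot be negative")
    return n_events / n_candidate_predictors


# Hypothetical cohort: 150 infants with BPD and 31 candidate predictors.
example_epv = events_per_variable(150, 31)
print(f"EPV = {example_epv:.1f}")
```

Note that the denominator is the number of candidate predictors considered, not the number of predictors retained in the final model, so a model with few final predictors can still have a low EPV.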

Assessment of Bias

We assessed the ROB and applicability of each article with PROBAST. PROBAST consists of 20 signaling questions across four domains (participants, predictors, outcome, and analysis). The ROB and applicability of original studies were classified as high, low, or unclear for each domain via comprehensive evaluation. Only if each domain had low ROB would a study be classified as overall low ROB.
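The overall judgment rule can be sketched as follows. Only the rule for overall low ROB is stated above; the handling of "high" and "unclear" (any high domain gives overall high, otherwise unclear) is an assumption based on our reading of the PROBAST guidance, and the function and variable names are hypothetical.

```python
DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = {"low", "high", "unclear"}


def overall_rob(ratings: dict) -> str:
    """Combine four domain-level ROB ratings into one overall rating."""
    values = []
    for domain in DOMAINS:
        rating = ratings[domain]  # raises KeyError if a domain is missing
        if rating not in VALID_RATINGS:
            raise ValueError(f"invalid rating for {domain}: {rating!r}")
        values.append(rating)
    # Stated in the text: overall low only if every domain is low.
    if all(v == "low" for v in values):
        return "low"
    # Assumption: any high-ROB domain makes the overall rating high.
    if "high" in values:
        return "high"
    # Assumption: otherwise at least one domain is unclear and none is high.
    return "unclear"


example = {"participants": "low", "predictors": "low",
           "outcome": "low", "analysis": "high"}
print(overall_rob(example))
```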

Model Performance

The results of the development and external validation studies were summarized by using descriptive statistics. If an article described the development or external validation of multiple (existing) models, data were extracted separately for each model. Each model’s predictive performance, including discrimination and calibration measures, was extracted. For binary outcomes, discrimination is most commonly quantified by the C statistic. Generally, a C statistic <0.6 is considered poor, a C statistic between 0.6 and 0.75 is considered possibly helpful, and a C statistic >0.75 is considered clearly useful (13). Calibration is often quantified by the calibration intercept and calibration slope.
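As an illustration of how discrimination is quantified, the sketch below computes the C statistic for a binary outcome as the proportion of event/non-event pairs in which the infant with the event received the higher predicted risk (ties count as one half), and then applies the interpretation thresholds given above. The function names and the toy data are hypothetical, and the pairwise loop is intended for small examples rather than large registries.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome (1 = event, 0 = no event)."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0   # event ranked above non-event
        elif e == n:
            concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


def interpret_c(c):
    """Thresholds as described in the text: <0.6, 0.6-0.75, >0.75."""
    if c < 0.6:
        return "poor"
    if c <= 0.75:
        return "possibly helpful"
    return "clearly useful"


# Toy data (hypothetical): 3 infants with BPD, 4 without.
outcome = [1, 1, 1, 0, 0, 0, 0]
predicted_risk = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
c = c_statistic(outcome, predicted_risk)
print(round(c, 3), interpret_c(c))
```

A high C statistic says nothing about whether the predicted risks agree with observed frequencies, which is why calibration is assessed separately.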

Results

After excluding duplicates, the initial search returned 5,749 articles. After title and abstract screening, 106 articles were provisionally selected for full-text screening. Subsequently, 88 articles were excluded, among which 11 articles used the composite outcome “BPD or death.” In total, 18 studies (14–31) were included in this systematic review (Figure 1). Eight studies (14, 16, 19, 21, 22, 25–27) described model development without external validation, five studies (15, 17, 24, 29, 30) described model development with external validation in independent data, and five studies (18, 20, 23, 28, 31) described external validation with or without model updating.
FIGURE 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.

Characteristics of Studies Describing Bronchopulmonary Dysplasia Prediction Model Development

Thirteen studies described BPD prediction model development, in which 21 models were developed. Table 1 shows the key characteristics of study design, study population, outcome, and intended moment of model use in the included model development studies. Table 2 shows the study and performance characteristics of the developed models.
TABLE 1

Design characteristics of the 13 studies describing the development of BPD prediction models.

| Study | Country | Study design | Years of data | Study population | Timing of BPD | Intended moment of model use | Models, n | Source of differences between models |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| El Faleh et al. (17) | Switzerland | Registry | 2009–2010 | BW < 1,501 g and/or GA between 23 0/7 and 31 6/7 weeks | DOL28 and PMA36 | 1 day | 2 | Timing of BPD |
| Shim et al. (15) | Korea | Registry | 2013–2016 | BW < 1,500 g and GA ≥ 22 weeks | PMA36 | 1 h | 3 | Severity of BPD |
| Ushida et al. (14) | Japan | Registry | 2006–2015 | GA < 32 weeks and BW ≤ 1,500 g | PMA36 | At birth | 1 | NA |
| Verder et al. (16) | Denmark | Prospective cohort | 2019 | GA 24–31 weeks | DOL28 | At birth | 1 | NA |
| Valenzuela-Stutman et al. (19) | Argentina, Chile, Paraguay, Peru, Uruguay | Registry | 2001–2015 | BW 500–1,500 g | PMA36 | At birth, 3 day, 7 day, 14 day | 4 | Intended moment of using the model |
| Sullivan et al. (21) | United States | Prospective cohort | 2009–2015 | BW < 1,500 g | PMA36 | 7 day | 3 | Predictors |
| Bentsen et al. (22) | United States | Prospective cohort | 2014–2016 | GA < 28 weeks | PMA36 | 2 day | 1 | NA |
| Gursoy et al. (24) | Turkey | Retrospective cohort | 2006–2009 | GA ≤ 32 weeks and BW ≤ 1,500 g | DOL28 | 3 day | 1 | NA |
| Tian et al. (26) | China | Prospective cohort | 2010–2011 | BW ≤ 1,500 g and GA ≤ 32 weeks | DOL28 | At birth | 1 | NA |
| Wang et al. (25) | China | Prospective cohort | 2011–2013 | BW ≤ 1,500 g and GA ≤ 32 weeks | DOL28 | 14 day | 1 | NA |
| May et al. (27) | United Kingdom | Prospective cohort | 2004–2007 | GA 24–32 weeks | DOL28 | 14 day | 1 | NA |
| Henderson-Smart et al. (29) | Australia, New Zealand | Registry | 1998–1999 | GA 22–31 weeks | PMA36 | At birth | 1 | NA |
| Kim et al. (30) | Korea | Retrospective cohort | 1997–1999 | BW < 1,500 g | PMA36 | 4 day, 7 day, 10 day | 1 | NA |

BPD, bronchopulmonary dysplasia; BW, birth weight; DOL, days of life; GA, gestational age; PMA, postmenstrual age; NA, not applicable.

TABLE 2

Study and performance characteristics of the developed prediction models.

| Study | Outcome | Intended moment of model use | Sample size | Events | EPV | Missing data | Univariable analysis | Modeling method | Model presentation | Predictors, n | Discrimination (C statistic) | Calibration | Internal validation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| El Faleh et al. (17) | BPD28 | 1 day | 1,232 | 266 | 22.2 | Complete case study | Yes | Logistic regression | Formula; web calculator | 7 | 0.88 | HL test | NR |
| | BPD36 | 1 day | 1,225 | 138 | 11.5 | Complete case study | Yes | Logistic regression | Formula | 5 | 0.84 | HL test | NR |
| Shim et al. (15) | All BPD36 | 1 h | 4,600 | 2,583 | 184.5 | Complete case study | No | Logistic regression | Formula | 4 | 0.908 (0.899–0.916) | NR | NR |
| | Moderate to severe BPD36 | 1 h | 4,600 | 1,370 | 97.9 | Complete case study | No | Logistic regression | Formula | 7 | 0.815 (0.802–0.828) | NR | NR |
| | Severe BPD36 | 1 h | 4,600 | 818 | 58.4 | Complete case study | No | Logistic regression | Formula | 8 | 0.815 (0.800–0.831) | NR | NR |
| Ushida et al. (14) | BPD36 | At birth | 18,858 | 4,986 | 415.5 | Complete case study | No | Logistic regression | Formula | 8 | 0.80 (0.79–0.81) | Calibration plots | Split-sample validation |
| Verder et al. (16) | BPD28 | At birth | 61 | 26 | 2 | NR | No | SVM | NR | 4 | NR | NR | Cross validation |
| Valenzuela-Stutman et al. (19) | BPD36 | At birth | 16,407 | 2,580 | 215 | NR | No | Logistic regression | Web calculator | 5 | 0.788 | NR | Split-sample validation |
| | BPD36 | 3 day | 16,407 | 2,580 | 151.8 | NR | No | Logistic regression | Web calculator | 5 | 0.818 | NR | Split-sample validation |
| | BPD36 | 7 day | 16,407 | 2,580 | 143.3 | NR | No | Logistic regression | Web calculator | 5 | 0.827 | NR | Split-sample validation |
| | BPD36 | 14 day | 16,407 | 2,580 | 127.5 | NR | No | Logistic regression | Web calculator | 5 | 0.894 | NR | Split-sample validation |
| Sullivan et al. (21) | BPD36 | 7 day | 443 | 159 | NA | Complete case study | No | Logistic regression | NR | 6 | 0.921 (0.897–0.945) | NR | Bootstrapping |
| | BPD36 | 7 day | 443 | 159 | NA | Complete case study | No | Logistic regression | NR | 5 | 0.886 (0.854–0.913) | NR | Bootstrapping |
| | BPD36 | 7 day | 443 | 159 | NA | Complete case study | No | Logistic regression | NR | 11 | 0.935 (0.920–0.951) | NR | Bootstrapping |
| Bentsen et al. (22) | BPD36 | 2 day | 37 | 18 | 1.1 | NR | Yes | Logistic regression | NR | 3 | 0.893 (0.735–0.973) | NR | NR |
| Gursoy et al. (24) | BPD28 | 3 day | 652 | 150 | 4.8 | NR | Yes | Logistic regression | Scoring system | 7 | 0.93 | NR | NR |
| Tian et al. (26) | BPD28 | 14 day | 73 | 24 | NA | Complete case study | Yes | Logistic regression | NR | 2 | 0.974 | NR | NR |
| Wang et al. (25) | BPD28 | At birth | 134 | 35 | NA | NR | No | Logistic regression | NR | 2 | 0.849 | NR | NR |
| May et al. (27) | BPD28 | 14 day | 78 | 39 | 2.6 | NR | No | Logistic regression | NR | 2 | 0.97 | NR | NR |
| Henderson-Smart et al. (29) | BPD36 | At birth | 5,599 | 1,235 | 58.8 | Complete case study | Yes | Logistic regression | NR | 3 | 0.84 | HL test | NR |
| Kim et al. (30) | BPD36 | 4 day | 161 | 30 | 1.6 | NR | No | Logistic regression | Scoring system | 8 | 0.76 | NR | NR |

BPD, bronchopulmonary dysplasia; DOL, days of life; EPV, events per variable; HL test, Hosmer–Lemeshow test; NA, not available; NR, not reported; PMA, postmenstrual age; SVM, support vector machine.

Study Design

Eleven included studies (85%) originated from registry or prospective cohorts; two studies (15%) were derived from retrospective cohorts. The data used for developing the models were collected between 1997 and 2019. Of all 13 model development studies, four (31%) used only gestational age as the inclusion criterion, three studies (23%) used only birth weight as the inclusion criterion, and six studies (46%) used both gestational age and birth weight as inclusion criteria. All studies were developed based on statistical methods. Twelve studies (92%) used logistic regression as the prediction modeling approach; one study (8%) used machine learning.

Outcome to Be Predicted

The outcome to be predicted in all included studies was BPD, yet the definitions of BPD varied across the models. Six models (29%) used BPD28 as the primary outcome; the median incidence was 29% (range, 22–50%). Fifteen models (71%) used BPD36 as the primary outcome; the median incidence was 22% (range, 11–56%). Eighteen models (86%) were intended for use within the first 7 days of life, and three models (14%) were intended for use between 7 and 14 days of life.

Predictors

Ten of the 13 studies reported the number of candidate predictors considered for inclusion in the BPD prediction models, with 12–31 candidate predictors (median, 15). The final models included 2–11 predictors (median, 5). Five studies (38%) used univariable analysis to select predictors for the multivariable analyses. Figure 2 shows the predictors included in the final prediction models. Nineteen models (90%) used perinatal variables, seven models (33%) used antenatal variables, and 17 models (81%) used postnatal variables. The most frequently included predictor in the 21 prediction models was birth weight (n = 14, 67%), followed by gestational age (n = 13, 62%), sex (n = 11, 52%), 5-min Apgar score (n = 6, 29%), respiratory distress syndrome (n = 6, 29%), mechanical ventilation (n = 5, 24%), antenatal steroids (n = 4, 19%), maternal hypertensive disorders (n = 4, 19%), surfactant (n = 4, 19%), and patent ductus arteriosus (n = 4, 19%).
FIGURE 2

Predictors included in the final development models.

Sample Size

The models were developed with 37–18,858 participants (median, 1,225), and there were 18–4,986 events (median, 159). The EPV could be calculated in 16 models (76%) with a median of 59 and a range of 1–416. The EPV was <10 in 31% of the models in which it was calculated.
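The EPV summary can be reproduced with a short sketch. The 16 values below are transcribed from the EPV column of Table 2 (models marked NA are omitted); the variable names are hypothetical.

```python
from statistics import median

# EPV values as reported in Table 2 (16 of the 21 developed models).
reported_epv = [
    22.2, 11.5,                    # El Faleh et al.
    184.5, 97.9, 58.4,             # Shim et al.
    415.5,                         # Ushida et al.
    2,                             # Verder et al.
    215, 151.8, 143.3, 127.5,      # Valenzuela-Stutman et al.
    1.1,                           # Bentsen et al.
    4.8,                           # Gursoy et al.
    2.6,                           # May et al.
    58.8,                          # Henderson-Smart et al.
    1.6,                           # Kim et al.
]
n_models_total = 21

share_with_epv = len(reported_epv) / n_models_total
epv_median = median(reported_epv)
n_below_10 = sum(1 for v in reported_epv if v < 10)
share_below_10 = n_below_10 / len(reported_epv)

print(f"EPV available: {len(reported_epv)} models ({share_with_epv:.0%})")
print(f"median EPV: {epv_median:.1f}, range: {min(reported_epv)}-{max(reported_epv)}")
print(f"EPV < 10: {n_below_10} models ({share_below_10:.0%})")
```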

Missing Data

Seven studies (54%) did not mention missing data. The other six studies (46%) described how missing data were handled, and all of them used complete case analysis.

Model Presentation

Presentation was available for 12 models (57%). Five models were presented as regression formulae, two models were presented as scoring systems, four models were presented as web calculators, and one model was presented as both a regression formula and web calculator.

Apparent Predictive Performance

Twelve of the 13 studies (92%), covering 20 of the 21 models (95%), assessed discrimination with the C statistic, with values of 0.76–0.97. Calibration was assessed for four models (19%): three models used the Hosmer–Lemeshow goodness-of-fit test, and one model used calibration plots.
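To show what a calibration plot summarizes, the sketch below groups infants by predicted risk and compares the mean predicted risk with the observed event rate in each group; plotting one against the other gives a basic calibration plot. This is a simplified illustration with hypothetical names and toy data, not the method used by any included study.

```python
def calibration_table(y_true, y_prob, n_groups=10):
    """Return (mean predicted risk, observed event rate, group size) per risk group."""
    pairs = sorted(zip(y_prob, y_true))      # order infants by predicted risk
    n = len(pairs)
    if n == 0 or n_groups < 1:
        raise ValueError("need at least one observation and one group")
    n_groups = min(n_groups, n)              # avoid empty groups
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


# Toy data (hypothetical): predicted risks and observed BPD status.
risk = [0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.9, 0.9]
bpd = [0, 0, 0, 1, 1, 1, 1, 0]
for mean_pred, obs_rate, size in calibration_table(bpd, risk, n_groups=2):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
```

In a well-calibrated model the observed rate in each group lies close to the mean predicted risk, so the points fall near the diagonal.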

Internal Validation

Nine studies (69%) did not report internal validation of the developed models. Nine models developed in four studies were internally validated: five models (56%) with split sampling, one model (11%) with cross-validation, and three models (33%) with bootstrapping.

Risk of Bias and Applicability Assessment of the Included Model Development Studies

Figure 3 shows a summary of the ROB and applicability for all developed models. In the outcome domain, the ROB of all models was considered low, as a broad definition of BPD was accepted. In the participants domain, 29% of the models had high ROB. In the predictors domain, 33% and 67% of the models had high and low ROB, respectively. The analysis domain was assessed as having high ROB in all prediction models. No study handled missing data appropriately, as information on missing data was rarely reported or participants with missing data were omitted. Assessment of calibration was insufficient, as only one study reported calibration plots, while the other studies did not report calibration or used only the Hosmer–Lemeshow test. In summary, the overall ROB was high across all models.
FIGURE 3

Risk of bias and applicability assessment of developed models using Prediction model Risk Of Bias ASsessment Tool (PROBAST).

When the 21 models were assessed according to applicability concerns, 24% of the models were assessed as high concern due to the inclusion of participants different from those in our research question (n = 4) or inconsistency between predictors and the review question (n = 5).

Characteristics of Studies Describing External Validation of the Bronchopulmonary Dysplasia Prediction Models

We included 10 studies that externally validated 21 BPD prediction models (Table 3). Five of these studies also described prediction model development. Table 4 shows the study and performance characteristics of the validated models.
TABLE 3

Design characteristics of the 10 studies describing external validation of BPD prediction models.

| Study | Country | Study design | Years of data | Study population | Timing of BPD | Intended moment of model use | Model validated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| El Faleh et al. (17) | Switzerland | Registry | 2014–2015 | BW < 1,501 g and/or GA between 23 0/7 and 31 6/7 weeks | DOL28 or PMA36 | 1 day | El Faleh et al. (17) |
| Shim et al. (15) | Korea | Registry | 2017 | BW < 1,500 g and GA ≥ 22 weeks | PMA36 | 1 h | Shim et al. (15) |
| Bhattacharjee et al. (18) | United States | Retrospective cohort | 2012–2013 | BW < 1,500 g | PMA36 | 3 day | RSS |
| Lee et al. (20) | Korea | Registry | 2013–2016 | BW < 1,500 g and GA 22–32 weeks | PMA36 | 1 h | CRIB II; CRIB II-BE |
| Sullivan et al. (23) | United States | Retrospective cohort | 2004–2014 | BW < 1,500 g | PMA36 | 12 h, 1 day, 7 day | aHRC-24h, aHRC-7d, SNAP-II, CRIB-II |
| Gursoy et al. (24) | Turkey | Prospective cohort | 2012 | GA ≤ 32 weeks and BW ≤ 1,500 g | DOL28 | 3 day | BPD-TM score |
| May et al. (28) | United Kingdom | Retrospective cohort | 1995–1998 | BW < 1,500 g and GA < 33 weeks | DOL28 or PMA36 | 2 day | Simple pulmonary score |
| Henderson-Smart et al. (29) | Australia, New Zealand | Registry | 2000–2001 | GA 22–31 weeks | PMA36 | At birth | Henderson-Smart et al. (29) |
| Kim et al. (30) | Korea | Prospective cohort | 2000–2001 | BW < 1,500 g | PMA36 | 4 day, 7 day, 10 day | SMUMRV; Yoder model (55) |
| Chien et al. (31) | Canada | Registry | 1996–1997 | GA ≤ 32 weeks | PMA36 | 12 h | SNAP-II + GA, SGA, sex, low Apgar score, and outborn status |

BPD, bronchopulmonary dysplasia; BPD-TM, bronchopulmonary dysplasia test measure; BW, birth weight; CRIB, Clinical Risk Index for Babies; CRIB II-BE, Clinical Risk Index for Babies omitting base excess; DOL, days of life; GA, gestational age; HRC, heart rate characteristics; aHRC-24h, average first day HRC index; aHRC-7d, average first week HRC index within 7 days of birth; PMA, postmenstrual age; RSS, respiratory severity score; SGA, small for gestational age; SMUMRV, modified respiratory variables; SNAP-II, Score for Neonatal Acute Physiology-II.

TABLE 4

Study and performance characteristics of externally validated models.

| Study | Model | Outcome | Sample size | Outcome events | Missing data | Discrimination (C statistic) | Calibration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| El Faleh et al. (17) | El Faleh et al. (17) | BPD28 | 1,733 | 437 | Complete case study | 0.92 | NR |
| | El Faleh et al. (17) | BPD36 | 1,724 | 191 | Complete case study | 0.88 | NR |
| Shim et al. (15) | Shim et al. (15) | BPD36 (all grade) | 1,740 | 1,003 | Complete case study | NR | NR |
| | Shim et al. (15) | BPD36 (moderate to severe) | 1,740 | 563 | Complete case study | NR | NR |
| | Shim et al. (15) | BPD36 (severe) | 1,740 | 388 | Complete case study | NR | NR |
| Bhattacharjee et al. (18) | RSS | BPD36 | 69 | 31 | NR | 0.61 | NR |
| Lee et al. (20) | CRIB II | BPD36 | 4,694 | 1,443 | Complete case study | 0.77 (0.76–0.79) | NR |
| | CRIB II-BE | BPD36 | 6,038 | 1,916 | Complete case study | 0.77 (0.76–0.78) | NR |
| Sullivan et al. (23) | aHRC-24h | BPD36 | 566 | 98 | NR | 0.827 | NR |
| | aHRC-7d | BPD36 | 566 | 98 | NR | 0.827 | NR |
| | SNAP-II | BPD36 | 566 | 98 | NR | 0.839 | NR |
| | CRIB-II | BPD36 | 566 | 98 | NR | 0.840 | NR |
| Gursoy et al. (24) | BPD-TM score | BPD28 | 172 | 54 | NR | 0.903 | NR |
| May et al. (28) | Simple pulmonary score-day 2 | BPD28 | 75 | 32 | NR | 0.79 (cohort 1), 0.84 (cohort 2) | NR |
| | Simple pulmonary score-day 2 | BPD36 | 75 | 22 | NR | 0.86 (cohort 1), 0.76 (cohort 2) | NR |
| | Simple pulmonary score-day 7 | BPD28 | 75 | 32 | NR | 0.75 (cohort 1), 0.97 (cohort 2) | NR |
| | Simple pulmonary score-day 7 | BPD36 | 75 | 22 | NR | 0.83 (cohort 1), 0.88 (cohort 2) | NR |
| Henderson-Smart et al. (29) | Henderson-Smart et al. (29) | BPD36 | 5,854 | 1,475 | Complete case study | 0.84 | HL test |
| Kim et al. (30) | SMUMRV | BPD36 | 96 | 9 | NR | 0.90–0.94 | NR |
| | Yoder model (55, 57) | BPD36 | 96 | 9 | NR | 0.92–0.96 | NR |
| Chien et al. (31) | SNAP-II + GA, SGA, sex, low Apgar, and outborn status | BPD36 | 4,226 | NR | Complete case study | 0.83 | HL test |

BPD, bronchopulmonary dysplasia; BPD-TM, bronchopulmonary dysplasia test measure; BW, birth weight; CRIB, Clinical Risk Index for Babies; CRIB II-BE, Clinical Risk Index for Babies omitting base excess; DOL, days of life; GA, gestational age; HL test, Hosmer–Lemeshow test; HRC, heart rate characteristics; NR, not reported; PMA, postmenstrual age; RSS, respiratory severity score; SGA, small for gestational age; SMUMRV, modified respiratory variables; SNAP-II, Score for Neonatal Acute Physiology-II.

Models Validated

The most frequently validated models were CRIB-II (Clinical Risk Index for Babies II) and SNAP-II (Score for Neonatal Acute Physiology-II); both were externally validated twice. The other models were externally validated once. Eight validation studies (80%) used existing data to externally validate a BPD prediction model. Two studies (20%) collected prospective data for external validation. The data used for validating the BPD prediction models were all collected between 1995 and 2017.

Outcome

Four models (19%) used BPD28 as the outcome. The incidence of BPD28 was 25–43% (median, 37%). Seventeen models (81%) used BPD36 as the outcome. The incidence of BPD36 was 9–58% (median, 24%). All studies reported the number of patients. The number of event patients could be identified in nine studies (90%). The validation articles included 69–6,038 patients (median, 566). The median number of event patients was 98 (range, 9–1,916). Twelve models (60%) had <100 event patients. Five studies mentioned missing data (50%). These studies all used complete case analysis to address the missing data.

Predictive Performance

Nine of the 10 validation studies (90%) assessed model discrimination with the C statistic (range, 0.61–0.97). Two models (10%) reported model calibration using the Hosmer–Lemeshow test.

Risk of Bias and Applicability Assessment of the Included External Validation Studies

Figure 4 presents the summary of the ROB and applicability by domain. Outcome-related ROB was low across all models. In the analysis domain, 20 models (95%) were assessed as high ROB due to inappropriate handling of missing data and inadequate reporting of calibration, while one model was assessed as unclear. This resulted in an overall high ROB for the validation of 20 models (95%) and overall unclear ROB for one model (5%).
FIGURE 4

Risk of bias and applicability assessment of externally validated models using Prediction model Risk Of Bias ASsessment Tool (PROBAST).

The models’ applicability to our research question was high concern in 43% of the models, mainly due to the inclusion of participants different from those in our research question.

Discussion

In the present systematic review, we summarize all prognostic models for developing BPD in preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight. In total, 13 studies describing model development and 10 studies describing external validation were included. High ROB was observed across nearly all models, mostly due to inappropriate analysis, particularly for the handling of missing data, presenting insufficient performance statistics, and small sample size. Furthermore, several studies did not report full models, making external validation and implementation in clinical practice difficult. Meta-analysis was not possible because external validation studies of the same model were insufficient. Prediction models are developed to support medical decision-making. Therefore, it is vital to identify a target population in which predictions serve a clinical need. Then, a representative dataset on which the prediction model is based can be developed and validated (32). In the present review, studies involving preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight were included, while those that included more mature preterm infants were excluded. We excluded such studies because BPD incidence is very uncommon in infants born with birth weights of >1,500 g and after 32 weeks’ gestation (33). Accordingly, there is little clinical need for predicting BPD in such infants. Therefore, we recommend that future studies of BPD prediction models involve very low-birth weight infants or very preterm infants rather than all preterm infants. In the present review, the outcome to be predicted was BPD. The included studies used different definitions of BPD. Most of the included studies used BPD36 as the outcome to be predicted while a smaller proportion used BPD28. Even when the scope of BPD was the same, the definitions of BPD could still differ based on the mode of respiratory support. 
The lack of a uniform definition of BPD in the included studies reflects the changing BPD definition of these years (34–37). Among the included studies, seven studies used the outcome BPD28, four of which used the definition proposed by the NIH in 2001 (36), and the other three only stated BPD28 as oxygen was still required at 28 days of life, and no further details were elaborated. Death is a competing outcome of BPD, some studies used the composite outcome “BPD or death” when developing prediction models for BPD. This composite outcome avoided exclusion of deceased patients who might developed BPD if they survived. Nonetheless, not all patients with early death will develop BPD. When models developed for prediction of “BPD or death” are used to predict BPD risk only, the predictive power will be lower (9), leading to a reduction in the accuracy of the prediction results. Besides, many models for prediction of death have been developed and most of them show good predictive performance (38). Utilize different models to predict BPD and death in clinical practice will probably result in higher accuracy. Therefore, BPD was selected as the outcome in our review, rather than “BPD or death.” Most prediction models used clinical indicators including prenatal, perinatal, and postnatal factors to develop BPD prediction models. Though a large number of studies tried to explore the correlation between biomarkers and BPD, few biomarkers were included in prediction models. Of the studies included in this systematic review, only two studies constructed prediction models with biomarkers, including interleukin-6, clara cell protein-16, and Krebs von den Lungen-6 (25, 26). Genome-wide association studies and candidate gene studies investigating the correlation between genetic predisposition and BPD have been reported, but the results of different population studies are inconsistent (39). 
The genes specific for BPD remain to be further investigated before they could be applied to predict risk of BPD. Similar to other systematic reviews of prediction models (38, 40, 41), we too observed several methodological shortcomings in most of the included studies. First, although many of the studies used a large sample acquired from registries, around half of them used a sample that was too small. For example, six models were developed with samples of EPV <10, and 12 models were validated with samples with <100 events. When developing prediction models for binary outcomes, an EPV of at least 10 has been widely adopted as a criterion to minimize overfitting (42). For external validation studies, a minimum of 100 event patients is recommended (43). Recently, Riley et al. proposed formulae for calculating the minimum sample size required for developing regression-based prediction models (44), and Pavlou et al. have proposed equations for estimating the required sample size for external validation of risk models for binary outcomes (45), rendering sample size calculation more precise and efficient. Therefore, small samples should be avoided for prediction model development and validation, and it would be better to calculate the sample size with these recently reported formulae. Second, none of the included studies handled missing data appropriately. Most did not report missing data or only included complete cases for analysis. Missing data are a common but easily underappreciated problem in prediction studies; complete case analysis can lead to biased predictor–outcome associations and biased model performance (46–48). To avoid biased model performance as a result of the deletion or single imputation of participants’ missing data, multiple imputation is recommended (46, 49–51). Third, around 30% of the included studies describing prediction model development selected predictors via univariable analysis. 
However, univariable analysis can lead to the omission of important predictors, because selection is based on each predictor’s statistical significance in isolation rather than in the context of other predictors (52). Therefore, univariable analysis should be avoided in predictor selection. Alternative approaches include listing a limited number of candidate predictors to consider for the prediction model and using statistical selection methods such as backward elimination and forward selection (53).

Fourth, most studies did not present discrimination and calibration together. Discrimination refers to the ability of the prediction model to separate individuals with and without the outcome event, whereas calibration reflects the level of agreement between the observed outcomes and the predictions (53). Both must be evaluated to fully assess the predictive performance of a model. Most models in our review had a C statistic of >0.75. However, these models can still perform poorly in a new population because they may have been overfitted to the development data. Calibration was reported for around 20% of models. Moreover, only one study used the recommended method, a calibration plot, whereas most studies used only the Hosmer–Lemeshow test, which is considered insufficient (11). Therefore, both discrimination and calibration should be reported for a model, and a calibration plot is recommended for assessing calibration.

Finally, over half of the developed models were not validated internally. Internal validation is important for quantifying overfitting of the developed model and optimism in its predictive performance, except when the sample size and EPV are extremely large (11). In the present review, the most frequently used method of internal validation was split sampling, followed by bootstrapping.
However, split sampling is not recommended because it is statistically inefficient: not all available data are used to produce the prediction model (54). Bootstrapping is preferred, especially when the development sample is relatively small and/or a large number of candidate predictors is studied (55).

No systematic review has been published since the systematic review of BPD prediction models by Onland et al. in 2013 (9). Compared with their review, ours has several improvements. We followed the CHARMS checklist and extracted and assessed most key items within 11 domains. Furthermore, we assessed the ROB and applicability of the included models with a standard tool, PROBAST.

This review has several limitations. First, we excluded LUS-related studies. However, a meta-analysis in press has revealed that the LUS is accurate for early prediction of BPD and moderate-to-severe BPD in an average population of preterm infants of <32 weeks’ gestation (56). Second, we excluded studies that included preterm infants born at >32 weeks and >1,500 g birth weight. Therefore, some studies that included very low-birth-weight or very preterm infants were also excluded. Third, we excluded studies that aimed to predict “BPD or death.” Therefore, we were unable to assess models for that composite outcome.

Recommendations for future development studies include collecting data in prospective longitudinal cohort studies, selecting preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight as participants, using the outcome definition proposed by Jensen et al. (34), choosing appropriate clinical indicators and biomarkers as predictors, using a sufficiently large sample size (EPV ≥ 20), and handling missing data with multiple imputation.
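
The sample size thresholds discussed above can be checked with simple arithmetic. The following is a minimal sketch, not a method taken from any included study: the function names and the example cohort (120 infants with BPD, 8 candidate predictor parameters) are hypothetical, and only the rule-of-thumb thresholds quoted in the text are used (EPV of at least 10, EPV of at least 20, and at least 100 events for external validation). It does not implement the formulae of Riley et al. (44) or Pavlou et al. (45).

```python
import math

# Thresholds quoted in the text; all names below are illustrative assumptions.
MIN_EPV_CONVENTIONAL = 10    # widely adopted minimum for model development (42)
MIN_EPV_RECOMMENDED = 20     # recommended here for future development studies
MIN_EVENTS_VALIDATION = 100  # minimum event patients for external validation (43)


def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """Return EPV: outcome events divided by candidate predictor parameters."""
    if n_candidate_parameters <= 0:
        raise ValueError("n_candidate_parameters must be positive")
    return n_events / n_candidate_parameters


def min_events_for_development(n_candidate_parameters: int,
                               target_epv: float = MIN_EPV_RECOMMENDED) -> int:
    """Smallest number of events that reaches the target EPV."""
    return math.ceil(target_epv * n_candidate_parameters)


def enough_events_for_validation(n_events: int) -> bool:
    """True if an external validation sample has at least 100 events."""
    return n_events >= MIN_EVENTS_VALIDATION


# Hypothetical development cohort: 120 infants with BPD, 8 candidate parameters.
epv = events_per_variable(120, 8)
print(epv)                                # 15.0
print(epv >= MIN_EPV_CONVENTIONAL)        # True
print(epv >= MIN_EPV_RECOMMENDED)         # False
print(min_events_for_development(8))      # 160
print(enough_events_for_validation(120))  # True
```

In this hypothetical cohort the conventional EPV criterion is met but the stricter recommendation is not; reaching an EPV of 20 with 8 candidate parameters would require 160 events.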

Conclusion

In this review, we included 18 studies that developed or externally validated BPD prediction models. The included studies were assessed thoroughly using the CHARMS checklist (10) and PROBAST. There were many reporting or methodological shortcomings in the included studies. For better reporting of BPD prediction models, we recommend using sufficiently large samples for developing or validating a model, using multiple imputation to address missing data, avoiding univariable analysis for selecting predictors, assessing a model’s predictive performance with both discrimination and calibration, and using internal validation for newly developed models.
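
The recommendation to report both discrimination and calibration can be illustrated with a small example. This is a minimal sketch under stated assumptions: the outcomes and predicted risks for eight infants are invented for illustration, the function names are hypothetical, and the grouped table only approximates the points that a calibration plot would display.

```python
from typing import List, Sequence, Tuple


def c_statistic(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Share of event/non-event pairs in which the event has the higher
    predicted risk; tied pairs count as half."""
    events = [r for y, r in zip(y_true, risk) if y == 1]
    non_events = [r for y, r in zip(y_true, risk) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_table(y_true: Sequence[int], risk: Sequence[float],
                      n_groups: int = 2) -> List[Tuple[float, float]]:
    """Sort by predicted risk, split into groups of roughly equal size, and
    return (mean predicted risk, observed event rate) for each group."""
    pairs = sorted(zip(risk, y_true))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # skip empty groups when n_groups exceeds the sample size
        mean_predicted = sum(r for r, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_predicted, observed_rate))
    return table


# Hypothetical data: 1 = BPD, 0 = no BPD, with model-predicted risks.
y_observed = [0, 0, 0, 1, 0, 1, 1, 1]
predicted_risk = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

print(c_statistic(y_observed, predicted_risk))  # 0.9375
for mean_predicted, observed_rate in calibration_table(y_observed, predicted_risk):
    print(round(mean_predicted, 2), round(observed_rate, 2))
```

A high C statistic here says only that infants with BPD tend to be ranked above those without; the grouped table shows separately whether the predicted risks agree with the observed event rates.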

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author Contributions

H-BP, Y-LZ, and Z-BY designed the work. H-BP, Y-LZ, and YC extracted the data. Y-LZ, Z-CJ, FL, BW, and Z-BY analyzed the data. H-BP wrote the manuscript. YC and Z-BY supervised the work. All authors critically revised and approved the final version of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.