
Screening for post 32-week preterm birth risk: how helpful is routine perinatal data collection?

Wei Luo¹, Emily Y-S Huning², Truyen Tran³, Dinh Phung¹, Svetha Venkatesh¹.

Abstract

BACKGROUND: Preterm birth is a clinically significant event that is difficult to predict. Biomarkers such as fetal fibronectin and cervical length are effective, but they are often used only for women with clinically suspected preterm risk. It is unknown whether routinely collected data can be used in early pregnancy to stratify preterm birth risk by identifying asymptomatic women. This paper assesses the value of the Victorian Perinatal Data Collection (VPDC) dataset in predicting preterm birth and screening for invasive tests.
METHODS: De-identified VPDC report data from 2009 to 2013 were extracted for patients from Barwon Health in Victoria. Logistic regression models with elastic-net regularization were fitted to predict 37-week preterm, with the VPDC antenatal variables as predictors. The models were also extended with two additional variables not routinely noted in the VPDC: previous preterm birth and partner smoking status, testing the hypothesis that these two factors improve prediction accuracy. Prediction performance was evaluated using a number of metrics, including Brier scores, Nagelkerke's R², and the c statistic.
RESULTS: Although the predictive model utilising VPDC data had low overall prediction performance, it had reasonable discrimination (c statistic 0.646 [95% CI: 0.596-0.697] for 37-week preterm) and good calibration (goodness-of-fit p = 0.61). At a decision threshold of 0.2, a positive predictive value (PPV) of 0.333 and a negative predictive value (NPV) of 0.941 were achieved. Data on previous preterm birth and partner smoking did not significantly improve prediction.
CONCLUSIONS: For multiparous women, the routine data contains information comparable to some purposely-collected data for predicting preterm risk. But for nulliparous women, the routine data contains insufficient information related to antenatal complications.


Keywords: Medicine

Year: 2016 PMID: 27441291 PMCID: PMC4946290 DOI: 10.1016/j.heliyon.2016.e00119

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Background

Preterm birth is a significant clinical issue, both in Australia and globally [1]. Specialised management is required both to identify patients at risk and to prolong gestation where preterm birth risk is present. These measures are important to reduce the incidence of the significant adverse outcomes that can result from preterm birth. Workforce shortages, an increased birth rate and organisational and budgetary pressures have driven changes in public antenatal care such that women who are at perceived low risk of complications are directed away from specialist-led secondary or tertiary services to primary care models led by midwives or general practitioners [2]. The safety and effectiveness of these models depend on reliable and timely identification of women with high-risk pregnancies in order to deliver the right level of care to the right patient at the right time. Reliable methods of predicting preterm risk from early pregnancy are lacking, but would help to achieve better risk stratification [3]. For threatened preterm labour, clinical tests exist for effectively identifying the risk of preterm birth. In particular, fetal fibronectin and cervical length have been shown to have a good NPV with regard to preterm labour or imminent preterm birth [4, 5]. The recent SCOPE study [6] shows that with carefully collected data on clinical risk factors (including uterine artery Doppler measurements and cervical length), reasonable prediction accuracies (c statistics of 0.69 for intact-membrane preterms and 0.79 for spontaneous-rupture-of-membrane preterms) can be achieved. Other prospective studies such as the Raine study [7] also collected high quality data. However, in Australia, the data on risk factors like those in SCOPE and Raine are not always available. In particular, some screening tests are more likely to be performed in patients where there is a clinical suspicion of likely preterm birth or in areas with sufficient ultrasound resources and expertise [8].
For many regional or remote areas, the minimum data items required in routine reports to the government have the best quality and are often the only available data for risk prediction. For asymptomatic women, a reliable assessment of risk based on routinely collected information would be helpful in deciding the need for further screening tests. In practical terms, such an assessment would be most useful if the information could be collected in early pregnancy, so as to determine the most appropriate initial model of care. Minimum data collection is in place at the local and national level across Australia. The Commonwealth government collects information through the National Perinatal Data Collection (NPDC) with data specifications listed in the Perinatal National Minimum Data Set (NMDS) [9]. The national data collection is compiled from state-level data. In Victoria, the Victorian Perinatal Data Collection (VPDC) structure has been in place since 2009. The VPDC, with a format similar to other state-level data collections, contains a broad range of information about antenatal care. As similar data are routinely collected across all hospitals in Australia, a prediction model based on such data could potentially serve as a risk-screening tool for asymptomatic women. In this study, we evaluate the value of the VPDC data for predicting preterm birth. As the VPDC is not designed specifically for predicting prematurity, it does not include certain significant risk factors for preterm birth, such as previous preterm birth and partner smoking status. We hypothesized that adding these often readily available variables would improve prediction.

Methods

Ethical statement

Ethics approval was obtained from Barwon Health Human Research Ethics Committee (approval number 12/83), with whom Deakin University Human Research Ethics Committee has reciprocal ethics authorization. All study procedures were performed in accordance with the Australian National Statement on Ethical Conduct in Human Research (2007). Patient consent was deemed unnecessary as this was a retrospective secondary use of data. De-identified perinatal data from January 2009 to June 2013 were extracted from Barwon Health. The data contains both twin and singleton pregnancies; multiples were not excluded from the analysis.

Outcomes: 32–37 weeks preterm

Following the definition used by the AIHW [10], preterm birth was defined as a live birth with a gestational age before 37 weeks (excluding 37 + 0 weeks). In Victoria, babies born before 32 weeks’ gestation are normally cared for in Melbourne. Therefore, in this study, preterm births are defined as births from 32 + 0 weeks to 36 + 6 weeks.

Predictors

The VPDC antenatal data covariates listed in Table 1 were used to form the predictive models. Additional variables were derived from the original VPDC data; these are listed after Table 1.

Table 1

The Victorian Perinatal Data Collection variables used for preterm prediction.

Country of birth − mother

Indigenous status − mother

Marital status

Maternal medical conditions − ICD-10-AM code

Mother age

Height − self-reported − mother

Weight − self-reported − mother

Maternal smoking at less than 20 weeks

Gravidity

Total number of previous live births

Total number of previous abortions − spontaneous

Total number of previous abortions − induced

Total number of previous ectopic pregnancies

Total number of previous unknown outcomes of pregnancy

Date of completion of last pregnancy

Outcome of last pregnancy

Parity

Last birth − caesarean section indicator

Total number of previous caesareans

Plan for vaginal birth after caesarean

Gestational age at first antenatal visit

Was artificial reproductive technology used? (yes/no/unknown)

Birth plurality

The following variables were derived from the original VPDC data:

Maternal BMI.

Time interval between the estimated starting date of current pregnancy and the completion date of last pregnancy.

Preterm rate of the mother’s birth country (2010 estimates according to [11]).

Presence of the following common keywords in the Maternal medical conditions − free text columns: ‘asthma’; ‘depression’; ‘anaemia’.

Because patients’ residential address information (data fields 1. Residential locality, 2. Residential postcode, 3. Residential address) is not generalizable to other hospitals, it was replaced by more generalizable socio-economic indices through postal-area mapping [12]. For the data field Maternal medical conditions − free text, a binary variable was added indicating whether the field is empty. As the data contains only public patients, the data field Admitted patient election status − mother was excluded from the model. Variables not routinely collected early in pregnancy were excluded from the list of putative predictors (see Table 2).
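The derived variables described above can be illustrated with a short sketch. It is not the authors' code: the function names, the units (height in centimetres, weight in kilograms) and the example values are assumptions made for illustration only.

```python
from datetime import date

# Keywords searched for in the free-text medical conditions field (from the text above).
KEYWORDS = ("asthma", "depression", "anaemia")


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2; returns None when an input is missing.

    Units are an assumption; the source only says height and weight are self-reported.
    """
    if weight_kg is None or height_cm is None or height_cm <= 0:
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def interpregnancy_interval_days(current_start, last_completion):
    """Days between completion of the last pregnancy and the estimated start of this one."""
    if current_start is None or last_completion is None:
        return None
    return (current_start - last_completion).days


def keyword_flags(free_text):
    """Binary indicators for each keyword, plus an indicator for an empty field."""
    text = (free_text or "").lower()
    flags = {keyword: keyword in text for keyword in KEYWORDS}
    flags["free_text_empty"] = text.strip() == ""
    return flags


# Hypothetical record, for illustration only.
print(bmi(70, 165))
print(interpregnancy_interval_days(date(2012, 3, 1), date(2011, 3, 1)))
print(keyword_flags("Asthma, mild"))
```

Missing inputs are propagated as None so that a later imputation step, as described under Predictive modelling, can handle them explicitly.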

Table 2

The Victorian Perinatal Data Collection variables excluded from preterm prediction.

Number of ultrasounds 10–14 weeks

Number of ultrasounds 15–26 weeks

Maternal smoking at more than or equal to 20 weeks

Indigenous status − baby

Discipline of antenatal care provider

Setting of birth − intended, Setting of birth − actual, Setting of birth − change of intent and Setting of birth − change of intent − reason

Obstetric complications (free text or ICD-10-AM code)

Procedure (ACHI code or text)

Number of ultrasounds at or after 27 weeks

All other variables that are measured immediately before, during and post delivery.

Predictive modelling

A predictive model was constructed using predictors from Table 1; the model was later extended with two additional predictors, previous preterm birth and partner smoking status, both of which were recorded in the hospital perinatal database. The reliability of these two data fields cannot be established. Logistic regression models were fitted with elastic-net (combined ridge and lasso) regularization on the coefficients (Model 1 and Model 2 below). The regularization avoids the need for manual variable selection before fitting the model, as our purpose is to assess the predictive power of potentially all antenatal variables in the VPDC.

Model 1: preterm_indicator = glmnet_binomial(x_in_Table_1)

Model 2: preterm_indicator = glmnet_binomial(x_in_Table_1 + previous_preterm + partner_smoking)

For regularized logistic regression, the R package glmnet was used [13]. The R package caret [14] was used to select the tuning parameters and compute results. Categorical variables were first converted into dummy binary variables. Some columns contained missing values, including MotherHeight, MotherWeight, and BMI. The missing values were imputed (for continuous variables) or assigned a separate category (for nominal variables). A 25-fold bootstrap was used to select the optimal regularization parameters. From the coefficients of the fitted model (defined by the optimal regularization parameters), the relevance of the various data fields was inferred. In particular, the 5 variables with the largest positive coefficients and the 5 variables with the largest negative coefficients were identified, because of their influence on increasing or decreasing the predicted probability of preterm birth.
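To make the role of the regularization concrete, the following is a minimal sketch of an elastic-net penalised logistic regression fitted by proximal gradient descent. It is not the glmnet algorithm used in the study (glmnet uses coordinate descent), and the data are synthetic; the function names, the penalty settings `lam` and `alpha`, and the step size are illustrative assumptions. The penalty has the same form as in glmnet: `lam * (alpha * L1 + (1 - alpha) / 2 * L2)`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, t):
    """Proximal operator of the L1 (lasso) penalty: shrink towards zero by t."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.5, n_iter=300):
    """Minimise mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    The intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term) ...
            smooth_step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            # ... followed by the proximal step for the lasso term.
            w[j] = soft_threshold(smooth_step, lr * lam * alpha)
    return w, b


# Synthetic data (an assumption): only the first of three predictors is informative.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [1 if rng.random() < sigmoid(-2.0 + 2.0 * row[0]) else 0 for row in X]

weights, intercept = fit_elastic_net_logistic(X, y)
print(weights, intercept)
```

The lasso part of the penalty pushes coefficients of uninformative predictors towards zero, which is why no manual variable selection is needed before fitting; in practice the penalty strength would be chosen by resampling rather than fixed as here.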

Validation of the prediction models

All records were divided into a training set and a validation set in a 3 to 1 ratio. The training set was used to fit the predictive models, and the validation set was used to evaluate the predictive performance of the models. Following previously well described strategies [15], the prediction models were evaluated for overall performance, discrimination, and calibration. Overall performance on the validation set was measured using Brier scores (original and scaled) and Nagelkerke’s R². Discrimination was measured through the c statistic and the discrimination slope. Calibration was measured through calibration-in-the-large, calibration slope, and Hosmer–Lemeshow tests. Descriptions of these performance measures can be found in the existing literature [15].
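As an illustration of the overall-performance and discrimination measures named above, the sketch below computes the Brier score, the scaled Brier score, the c statistic and the discrimination slope from a vector of outcomes and predicted probabilities. The definitions follow the standard ones; the toy outcomes and probabilities are invented for illustration and are not study data.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y, p):
    """1 - Brier / Brier_max, where Brier_max is the score of predicting the prevalence for everyone."""
    prevalence = sum(y) / len(y)
    brier_max = prevalence * (1 - prevalence)
    return 1 - brier_score(y, p) / brier_max


def c_statistic(y, p):
    """Probability that a random event has a higher predicted risk than a random non-event.

    Ties count as one half. Equivalent to the area under the ROC curve.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def discrimination_slope(y, p):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


# Toy validation set (invented values).
y_toy = [0, 0, 0, 0, 1, 0, 1, 1]
p_toy = [0.05, 0.10, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70]

print(brier_score(y_toy, p_toy))
print(scaled_brier(y_toy, p_toy))
print(c_statistic(y_toy, p_toy))
print(discrimination_slope(y_toy, p_toy))
```

The pairwise loop in `c_statistic` is quadratic in the sample size, which is acceptable for a validation set of a few thousand births; a rank-based formula would be preferable for larger data.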

Results

Data regarding a total of 9573 births over 3.5 years were extracted from the data collection. Among them, 719 were preterm births, making the rate of preterm birth 7.5%, slightly lower than the 8.2% reported national average for 2009 [16]. As a regional health provider, Barwon Health normally transfers babies born before 32 weeks to the state-level hospitals in Melbourne. Therefore the data consist mostly of births after 32 weeks. Although the VPDC requires mandatory data collection for most of the items in Table 1, answers such as “Question unable to be asked” or “Not stated/inadequately described” are still allowed. Records with missing data were excluded from the analysis, resulting in 8100 births, including 93 twin pregnancies (see Table 3).

Table 3

Cohort characteristics.

		Number of patients (percentage)	Percentage of preterm
			< 34 weeks	< 37 weeks
Age distribution
	15–20	468 (5.8%)	2.6%	7.3%
	20–30	4009 (49.5%)	1.4%	6.4%
	30–40	3417 (42.2%)	1.6%	6.7%
	40–55	200 (2.5%)	1.5%	7.5%
Indigenous status
	Indigenous Australians	104 (1.3%)	1.9%	6.7%
	Nonindigenous	7996 (98.7%)	1.6%	6.6%
Plurality
	1	8007 (98.9%)	1.4%	6.0%
	>=2	93 (1.1%)	20.4%	58.1%
BMI
	<18.5	176 (2.2%)	1.1%	9.1%
	18.5–25	3759 (46.4%)	1.5%	6.5%
	25–30	2239 (27.6%)	1.8%	5.9%
	>30	1926 (23.8%)	1.7%	7.4%
Care Provider
	Obstetrician	1870 (23%)	3.6%	12.8%
	Midwife	5405 (67%)	0.8%	4.6%
	GP	807 (10%)	1.5%	5.3%

The records were divided into a training set of 6075 births (402 cases of 37-week preterm and 89 cases of 34-week preterm) and a validation set of 2025 births (134 cases of 37-week preterm and 40 cases of 34-week preterm). For 37-week preterm, the predictive performance of the two logistic regression models is shown in Table 4. The model based on the VPDC data alone has a c statistic of 0.646 (95% CI: 0.596–0.697).

Table 4

Prediction performance of two models for 37-week preterm.

Performance measure	VPDC alone	With information on previous preterm and partner smoking
Overall prediction performance
Brier	0.059	0.060
Scaled Brier	4.1%	3.6%
Nagelkerke’s R²	6.7%	6.4%
Discrimination
c statistic [95% CI]	0.646 [0.596–0.697]	0.645 [0.595–0.697]
Discrimination slope	0.060	0.060
Calibration
Calibration in the large	-0.027	-0.020
Calibration slope	0.804	0.753
Hosmer–Lemeshow test	Chi-square 7.3, p = 0.61	Chi-square 12.0, p = 0.21
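For readers unfamiliar with the Hosmer–Lemeshow statistic reported in Table 4, the sketch below shows one common way to compute it: patients are sorted by predicted risk, split into equal-sized groups, and observed and expected event counts are compared within each group. The grouping rule and the toy data are assumptions for illustration; the study's exact grouping is not stated. Converting the statistic to a p value additionally requires a chi-square distribution function, which is omitted here.

```python
def hosmer_lemeshow_statistic(y, p, groups=10):
    """Chi-square-type statistic comparing observed and expected events per risk group."""
    pairs = sorted(zip(p, y))  # sort patients by predicted probability
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        # Skip degenerate groups where the variance term would be zero.
        if 0 < expected < n_g:
            statistic += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return statistic


# Toy data (invented): two risk groups of ten patients each.
p_toy = [0.1] * 10 + [0.5] * 10
y_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5      # observed counts match expected
y_miscalibrated = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5  # low-risk group has too many events

print(hosmer_lemeshow_statistic(y_calibrated, p_toy, groups=2))
print(hosmer_lemeshow_statistic(y_miscalibrated, p_toy, groups=2))
```

A small statistic (and hence a large p value, as in Table 4) indicates no evidence of miscalibration rather than proof of good calibration.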

By coefficients of the fitted model, the top 5 variables that increased the predicted probability for 37-week preterm were: plurality (coefficient 0.17), mental and behavioural disorders due to use of cannabinoids (coefficient 0.14), previous preterm birth (coefficient 0.12), acute hepatitis C (coefficient 0.05), and stillbirth in last pregnancy (coefficient 0.04). Cannabinoid use and acute hepatitis C were encoded as ICD-10-AM codes (International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification) in the “Maternal medical conditions” data field. The top 5 variables that decreased the predicted probability for 37-week preterm were: singleton birth (coefficient −0.13), absence of maternal medical conditions (coefficient −0.08), previous term birth (coefficient −0.05), living in areas with better education/occupation index (coefficient −0.03), and non-smoking partner (coefficient −0.02). These predictive factors are consistent with known hypotheses, indicating reasonable quality of the VPDC data. Among the top 5 predictive variables for 37-week preterm, two (previous preterm birth and stillbirth in last pregnancy) are relevant only to multiparous women. This suggests that the model may perform differently for nulliparous women. For the model based on the VPDC data alone, the distribution of the predicted risk of women in the validation set is shown in Fig. 1.

Fig. 1

Boxplots of the predicted preterm probabilities of the validation set grouped by the true outcome.

The predicted probability provides a potential basis for risk screening. If screening decisions are to be made based on different levels of predicted probability of 37-week preterm, the prediction accuracies at several thresholds are shown in Table 5. The NPV confirms the intrinsic difficulty in preterm birth prediction and that the routine data collection does not capture all risk factors for preterm births. Rather than driving a binary decision, the prediction is more appropriately used to flag elevated risk. In particular, it may be used to identify high-risk patients among patients already under obstetrician care (normally because of identified risk factors). Patients flagged as high risk on both counts could be considered for further invasive testing. Of the 2025 births in the validation set, 458 patients (22.6%) were under obstetrician care.

Table 5

Prediction accuracy for 37-week preterm at three decision thresholds, assuming a person is considered “high risk” when the predicted probability exceeds the set threshold.

Decision threshold based on predicted probability p̂	Sensitivity	Specificity	PPV	NPV
0.2	12.7%	98.2%	33.3%	94.1%
0.16	16.4%	96.9%	27.8%	94.2%
0.13	20.1%	95.3%	23.4%	94.4%
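The relationship between the columns of Table 5 can be checked with a short calculation. Given sensitivity, specificity and the outcome prevalence in the validation set (134 of 2025 births, from the Results), PPV and NPV follow from Bayes' rule. The sketch below uses the rounded values from Table 5, so the recomputed PPV and NPV agree with the table only approximately; the function name is an illustrative assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence of 37-week preterm in the validation set, as reported in the Results.
prevalence = 134 / 2025

# (threshold, sensitivity, specificity) as reported in Table 5.
table5 = [(0.2, 0.127, 0.982), (0.16, 0.164, 0.969), (0.13, 0.201, 0.953)]

for threshold, sens, spec in table5:
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"threshold {threshold}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Because the prevalence is low, the NPV stays high at every threshold, while the PPV is sensitive to small changes in specificity.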

No added prediction accuracy with previous preterm birth and partner smoking status

The two models have similar prediction performance. The model with previous-preterm and partner-smoking information has a slightly higher c statistic (0.614 vs 0.605), but a slightly poorer calibration slope (0.687 vs 0.726). The integrated discrimination improvement (IDI [17]) was 0.0004 (standard error 0.002) and the relative IDI was 0.7%, indicating little improvement with the additional variables. The receiver operating characteristic curves for the validation set are shown in Fig. 2. The curves show that including the two predetermined additional variables does not improve the model’s prediction performance.
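The IDI can be read as the change in discrimination slope between two models, and the relative IDI as that change divided by the discrimination slope of the baseline model. With the values reported here, 0.0004 / 0.060 is approximately 0.7%, which matches the stated relative IDI. The sketch below implements these definitions on invented toy predictions; the function names and data are illustrative assumptions.

```python
def discrimination_slope(y, p):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def idi(y, p_old, p_new):
    """Integrated discrimination improvement: difference in discrimination slopes."""
    return discrimination_slope(y, p_new) - discrimination_slope(y, p_old)


def relative_idi(y, p_old, p_new):
    """IDI expressed as a fraction of the baseline model's discrimination slope."""
    return idi(y, p_old, p_new) / discrimination_slope(y, p_old)


# Toy predictions (invented): the new model raises predicted risk for the two events.
y_toy = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.3, 0.1]
p_new = [0.7, 0.5, 0.3, 0.1]

print(idi(y_toy, p_old, p_new))
print(relative_idi(y_toy, p_old, p_new))

# Arithmetic check against the reported values: IDI 0.0004, baseline slope 0.060.
print(0.0004 / 0.060)
```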

Fig. 2

Receiver operating characteristic (ROC) curves for the two logistic regression models on the validation set. Additional data on previous preterm birth and partner smoking did not generate model improvement.

Discussion

We found that the VPDC data serves as a reasonable basis for development of an alerting tool for potential risk of preterm birth in early pregnancy. Although the data is not rich enough to capture most risk factors, the predicted preterm probability provides a numerical value that could be used to identify the cases at highest risk of preterm birth. For example, among the 17 patients with the highest predicted risk in the validation cohort, 8 actually had preterm births. Among the top 26 patients, half had preterm births. The VPDC has a data format that complies with the national perinatal minimum data collection requirements in Australia, and all Australian states and territories have equivalent data collection processes for health surveillance. Therefore our results are likely to be generalizable at the national level. However, our dataset has a clear under-representation of indigenous mothers (1.4%). As maternal indigenous status is known to have a strong association with preterm birth, the prediction performance of this model on another data set with a higher indigenous population may be different [18]. Two previous studies have reported the performance of models for predicting preterm birth. Nicholson et al. reported a simple model with only three variables (prior preterm delivery, substance abuse and initiation of care in the third trimester), which achieved a specificity of 98% and an NPV of 73% [19]. Using the VPDC data in a logistic regression model, a specificity of 98.2% and an NPV of 94.1% can be achieved if a decision threshold of 0.2 is applied (Table 5). Mercer et al. [20] used more variables, including medical history, test results, and anthropometric and cervical examinations, which are not available in the VPDC. For multiparous women, their model had a sensitivity of 18.2% and a PPV of 33.3%. Our model had a sensitivity of 12.7% and a PPV of 33.3% at a decision threshold of 0.2 (Table 5).
These comparisons show that a reasonable predictive model can be obtained with the VPDC data alone, even early in pregnancy, when direct clinical measurements are not available (for example, cervical length screening is widely believed to be performed less often among patients from lower socio-economic backgrounds). Among all VPDC data items, cannabis use was identified as predictive of preterm births. This is consistent with the findings in a recent study [6]. Our data confirms that prediction of nulliparous preterm is more difficult than prediction of multiparous preterm. The prediction accuracy was lower than that of the SCOPE models, which incorporate clinical risk factors such as cervical length and uterine artery Doppler ultrasound measurements [6]. The low prediction accuracy can also be explained by the paucity of information on nulliparous mothers. Among the 23 antenatal variables measured in the VPDC, as many as 10 are relevant mostly to multiparous mothers (e.g., total number of previous live births). For routine antenatal data to be clinically useful for predicting nulliparous preterms, more information pertinent to nulliparous mothers needs to be collected. The data contained 93 twin cases. Multiple pregnancies are often considered to have higher preterm risk. But because of the small number of cases (compared to the total number of 8100 births), they are not likely to introduce significant bias in the results. Although previous preterm birth is believed to be highly correlated with current preterm birth risk, its effect was masked by other factors in the model. Although the masking effect is quite common when a large number of highly correlated factors are used in a lasso model, it is still worth investigating whether the same effect would persist in a larger data collection. As a minimum reporting data collection, the VPDC has many limitations.
One limitation is the inability to distinguish between spontaneous preterm birth, preterm birth following preterm premature rupture of membranes, and iatrogenic preterm birth. Data better distinguishing the different aetiologies of preterm birth would likely improve the risk prediction algorithm. Also, due to the limitation of available data, this study focuses on preterm births between 32 weeks and 37 weeks. Although these cases have relatively low severity compared to sub-32 week preterms, they represent the majority of preterm births. As many researchers recognize, even late preterm births are associated with short-term and long-term health and educational disadvantages [21]. This study is limited by its retrospective and observational nature. The article also addresses only the technical aspect of using routine data for potential screening. An ideal screening program involves broader considerations, such as those specified in the Wilson and Jungner criteria. In 2010, more than 1 in 10 of all babies worldwide were born prematurely [22]. The significant morbidity and mortality associated with preterm birth have made prevention of preterm birth a global priority for medical research and innovation. Reducing the rate of preterm birth will require accurate identification of those at highest risk, combined with early and effective treatment to prolong gestation. This study proposes a model that may readily be developed with existing data to identify those at high risk, but it is based on retrospective data from a single site. External validation with data from other hospitals is desirable. Because of the uniform data format, this study can be reproduced and its results can be validated in a straightforward manner.

Conclusions

A typical antenatal data collection contains broad socio-demographic and adequate clinical information and forms the basis of reasonable risk stratification for preterm birth. Risk prediction using such routinely collected data is as good as prediction based on some purposely collected data. The prediction may provide the earliest objective risk measure at the first antenatal visit.

Declarations

Author contribution statement

Wei Luo: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Emily Huning: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. Dinh Phung: Performed the experiments. Truyen Tran: Performed the experiments; Wrote the paper. Svetha Venkatesh: Conceived and designed the experiments; Wrote the paper.

Competing interest statement

The authors declare no conflict of interest.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Additional information

No additional information is available for this paper.

15 in total

1. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond.

Authors: Michael J Pencina; Ralph B D'Agostino; Ralph B D'Agostino; Ramachandran S Vasan
Journal: Stat Med Date: 2008-01-30 Impact factor: 2.373

Review 2. Sonographic measurement of cervical length as a predictor of preterm delivery: a systematic review.

Authors: Joana Barros-Silva; Ana Catarina Pedrosa; Alexandra Matias
Journal: J Perinat Med Date: 2014-05 Impact factor: 1.901

3. Preterm delivery in patients admitted with preterm labor: a prediction study.

Authors: W Nicholson; M Croughan-Minihane; S Posner; A E Washington; S K Kilpatrick
Journal: J Matern Fetal Med Date: 2001-04

4. The preterm prediction study: a clinical risk assessment system.

Authors: B M Mercer; R L Goldenberg; A Das; A H Moawad; J D Iams; P J Meis; R L Copper; F Johnson; E Thom; D McNellis; M Miodovnik; M K Menard; S N Caritis; G R Thurnau; S F Bottoms; J Roberts
Journal: Am J Obstet Gynecol Date: 1996-06 Impact factor: 8.661

5. Assessing the performance of prediction models: a framework for traditional and novel measures.

Authors: Ewout W Steyerberg; Andrew J Vickers; Nancy R Cook; Thomas Gerds; Mithat Gonen; Nancy Obuchowski; Michael J Pencina; Michael W Kattan
Journal: Epidemiology Date: 2010-01 Impact factor: 4.822

6. Changing models of public antenatal care in Australia: is current practice meeting the needs of vulnerable populations?

Authors: Stephanie J Brown; Georgina A Sutherland; Jane M Gunn; Jane S Yelland
Journal: Midwifery Date: 2013-10-28 Impact factor: 2.372

7. Spontaneous preterm birth of liveborn infants in women at low risk in Australia over 10 years: a population-based study.

Authors: S K Tracy; M B Tracy; J Dean; P Laws; E Sullivan
Journal: BJOG Date: 2007-06 Impact factor: 6.531

8. Risk factors for preterm birth in an international prospective cohort of nulliparous women.

Authors: Gustaaf Albert Dekker; Shalem Y Lee; Robyn A North; Lesley M McCowan; Nigel A B Simpson; Claire T Roberts
Journal: PLoS One Date: 2012-07-16 Impact factor: 3.240

9. Class prediction for high-dimensional class-imbalanced data.

Authors: Rok Blagus; Lara Lusa
Journal: BMC Bioinformatics Date: 2010-10-20 Impact factor: 3.169

Review 10. Strategies to prevent preterm birth.

Authors: John P Newnham; Jan E Dickinson; Roger J Hart; Craig E Pennell; Catherine A Arrese; Jeffrey A Keelan
Journal: Front Immunol Date: 2014-11-19 Impact factor: 7.561
