Effects of Selective Exclusion of Patients on Preterm Birth Test Performance.

J Jay Boniface, Julja Burchard, George R Saade.

Abstract

The need to reduce the rate of preterm delivery and the recent emergence of technologies that measure hundreds of biological analytes (eg, genomics, transcriptomics, metabolomics, proteomics; collectively referred to as "omics approaches") have led to a proliferation of potential diagnostic biomarkers. On review of the literature, a concern must be raised regarding experimental design and data analysis reporting. Specifically, inaccurate performance has often been reported after selective exclusion of patients around the definition boundary of preterm birth. For example, authors may report the performance of a preterm delivery predictor by comparing patients who delivered early preterm with those who delivered at 37 weeks of gestation or greater. A key principle that must be maintained during the development of any predictive test is to communicate performance for all patients for whom the test will be applicable clinically (ie, the intended-use population). For prediction of preterm birth, this population includes patients delivering throughout the spectrum of gestational ages, because gestational age at delivery is what is to be predicted and is not known at the time of testing. Using biomarker data collected from the U.S.-based Proteomic Assessment of Preterm Risk clinical trial, we provide examples where the area under the receiver operating characteristic curve for the same test artifactually improves from 0.68 (for preterm delivery at less than 37 weeks of gestation) or 0.76 (for preterm delivery at less than 32 weeks of gestation) to 0.91 when patients who deliver late preterm are excluded. We review this phenomenon in this commentary and offer recommendations for clinicians and investigators going forward. FUNDING SOURCE: Sera Prognostics.

Year: 2019 PMID: 31764747 PMCID: PMC6882533 DOI: 10.1097/AOG.0000000000003511

Source DB: PubMed Journal: Obstet Gynecol ISSN: 0029-7844 Impact factor: 7.661

Preterm delivery, which refers to delivery before 37 weeks of gestation, affects 15 million neonates born each year and ranges from approximately 5% to 18% of all births across different geographies worldwide.[1] In the United States, it is the leading cause of neonatal death and the second leading cause of death in children before age 5 years. Preterm delivery is also a major source of long-term health consequences, including chronic lung disease, hearing and visual impairments, and neurodevelopmental disabilities such as cerebral palsy. The health-economic effect of preterm delivery in the United States was estimated to be between $26 billion and $31.5 billion,[2,3] and costs continue to rise in most countries.[4]

Obstetric care providers routinely evaluate risk of preterm delivery using prior pregnancy history and cervical length, the two strongest traditional predictors of subsequent spontaneous singleton preterm delivery. Unfortunately, calculations based on published data[5,6] reveal that the risk factor of prior spontaneous preterm delivery is present in only 11% of all singleton pregnancies that result in spontaneous preterm delivery. Furthermore, calculations based on data from Hassan et al[7] indicate that cervical length, as an independent predictor for spontaneous preterm delivery, provides an additional attributable risk of only 6%. Although racial disparities and risk factors, such as low socioeconomic status, maternal age, and low maternal body mass index (BMI), have been identified,[8,9] up to 50% of all preterm deliveries occur in women without any evident risk factors.[10] Clearly, there is a need for improved prediction of this serious health condition.

Interest in assessing the risk of preterm delivery and the development of technologies that measure hundreds of biological analytes (eg, genomics, transcriptomics, metabolomics, proteomics; collectively referred to as “omics approaches”) have greatly increased the discovery of potential predictive biomarkers. However, biomarker predictive performance is sometimes determined after selective exclusion of cases adjacent to the clinical definition boundary for preterm delivery. For example, authors may report the performance of a preterm delivery predictor by comparing patients who delivered early preterm with those who delivered at 37 weeks of gestation or greater. Best practices for development of omics tests were formatted into guidelines by the National Academy of Medicine's Committee on the Review of Omics-based tests in 2012.[11] Among the many elements described in this authoritative publication was the requirement to demonstrate test performance in the intended-use population, which in the context of preterm delivery prediction covers patients destined for delivery at all gestational ages after screening.

CHARACTERISTICS OF PREDICTIVE TESTS

The receiver operating characteristic (ROC) curve formed from the sensitivity and specificity along the continuum of possible test scores provides a good representation of the predictive characteristics of a test (Fig. 1C, F, and I).[12] The area under the ROC curve (AUC) represents the overall predictive ability of the test, with an AUC of 0.5 indicating no predictive ability and an AUC of 1.0 representing perfect predictive ability. When developing such a ROC curve for pregnancy outcomes such as preterm delivery, investigators perform the test on a cohort of women and then follow them until delivery. Once all the enrolled patients deliver, the investigators divide the patients into those with the outcome (case participants) and those without the outcome (noncase participants). All of the enrolled patients who were not lost to follow-up should be categorized as either case or noncase participants. When developing predictive tests for preterm birth earlier than 37 0/7 weeks of gestation, some studies have reported biomarker and algorithm performance after selective exclusion of patients adjacent to the definition boundary of 37 weeks of gestation. For example, some investigators have published[13-18] or presented (Weiner et al. FuturebirthTM—prediction of future preterm birth <33w and preeclampsia/eclampsia <34w by 16w using a novel test in asymptomatic women. Am J Obstet Gynecol 2017;216:S196 [abstract]) test performance by comparing term delivery with preterm deliveries before an early gestational age cutoff (eg, less than 32, 34 or 35 weeks of gestation), or by omitting early term deliveries (eg, 37 and 38 weeks of gestation).[16] Another more subtle form of gapping can also be found in the recent report by Jelliffe-Pawlowski et al,[16] where very early preterm deliveries (less than 32 weeks of gestation) were included at an unnatural equivalent proportion relative to late preterm deliveries (32–36 weeks of gestation). 
The resulting study distribution of preterm births by gestational age week shows an unnatural flattening of the birth rate, creating in essence a partial gap. The key issue, however, is that such restrictions on gestational age within the study population (eg, less than 32, 34, or 35 vs greater than 37 weeks of gestation) necessarily exclude a whole group of patients and their outcomes, which artificially inflates the apparent test performance metrics (eg, AUC). To illustrate the effect of this selective exclusion ("gapping") on test performance, we used actual biomarker data from a previous study[19] and simulated diagnostic performance with and without gapping.
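The mechanism described above can be sketched with a small simulation. This is not the authors' analysis (which used real biomarker data in R with the pROC package); it is a minimal illustration under assumed, hypothetical inputs: the mix of early preterm, late preterm, and term births, the linear relationship between test score and gestational age, and the noise level are all invented for the example. The `auc` function uses the standard interpretation of the AUC as the probability that a randomly chosen case participant scores higher than a randomly chosen noncase participant.

```python
import random


def auc(case_scores, noncase_scores):
    """AUC as P(case score > noncase score); ties count one half."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def simulate_cohort(n=2000, seed=1):
    """Hypothetical cohort of (gestational age at birth, test score) pairs.

    Assumed mix: 5% early preterm, 23% late preterm, 72% term. These
    proportions and the score model are illustrative, not from the article.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        r = rng.random()
        if r < 0.05:
            ga = rng.uniform(24.0, 32.0)   # early preterm
        elif r < 0.28:
            ga = rng.uniform(32.0, 37.0)   # late preterm
        else:
            ga = rng.uniform(37.0, 42.0)   # term
        # Assumed score model: earlier delivery gives a higher score, plus noise.
        score = (40.0 - ga) + rng.gauss(0.0, 4.0)
        cohort.append((ga, score))
    return cohort


def compare_designs(cohort):
    """AUC for two ungapped definitions and one gapped comparison."""
    early = [s for ga, s in cohort if ga < 32.0]
    late = [s for ga, s in cohort if 32.0 <= ga < 37.0]
    term = [s for ga, s in cohort if ga >= 37.0]
    return {
        "ungapped_lt37": auc(early + late, term),    # all patients kept
        "ungapped_lt32": auc(early, late + term),    # all patients kept
        "gapped_lt32_vs_term": auc(early, term),     # late preterm dropped
        "n_excluded_by_gap": len(late),
    }


results = compare_designs(simulate_cohort())
for name, value in results.items():
    print(name, round(value, 3))
```

Because the case group is identical in the gapped and ungapped less-than-32-week analyses, the ungapped AUC is a weighted average of "early vs late preterm" and "early vs term" discrimination. Dropping the late preterm patients removes the harder comparison, so the gapped AUC rises even though the test itself is unchanged.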

Fig. 1.

Magnitude of erroneous estimation of test performance as a result of exclusion of patients. Shown are the distributions of gestational age at birth (A, D, and G), distributions of test scores by case–control status (B, E, and H), and corresponding actual, ungapped (C, F) or erroneous, gapped (I) test performance as estimated by area under the curve (AUC). A–C. All patients are included; case group, preterm birth at less than 37 weeks of gestation; control group, term birth at 37 weeks of gestation or greater. D–F. All patients are included; case group, preterm birth at less than 32 weeks of gestation; control group, births at 32 weeks of gestation or greater. G–I. Patients with gestational age at birth 32 weeks of gestation or greater through less than 37 weeks of gestation are excluded; case group, preterm birth at less than 32 weeks of gestation; control group, term births at 37 weeks of gestation or greater.

Boniface. Selective Exclusion in Preterm Birth Test Performance. Obstet Gynecol 2019.

ROLE OF THE FUNDING SOURCE

Sera Prognostics data and analytical support were used to bring attention to this important issue of gapping. Each author participated in conceptualizing the ideas in this manuscript, designing the analysis, drafting the manuscript, editing, and approving the final, submitted version. Ms. Burchard performed the statistical analyses. Each author declares that Good Publication Practice (GPP3) guidelines have been maintained. Specifically, the authors had access to relevant aggregated study data and other information (such as study protocol, analytic plan and report, validated data table, and clinical study report) required to understand and report research findings. The authors take responsibility for the presentation and publication of the research findings, have been fully involved at all stages of publication and presentation development, and are willing to take public responsibility for all aspects of the work. All individuals included as authors and contributors who made substantial intellectual contributions to the research, data analysis, and publication or presentation development are listed appropriately. The role of the sponsor in the design, execution, analysis, reporting, and funding is fully disclosed. The authors' personal interests, financial or nonfinancial, relating to this research and its publication have been disclosed.

METHODS

Simulation of Test Performance

In a real-world unselected general population, the number of births increases with gestational age and peaks at full term (∼40 weeks of gestation).[20] According to current practice and definitions, less than 37 0/7 weeks of gestation vs 37 0/7 weeks of gestation or greater is the dividing point by which preterm vs term births are defined (Fig. 1). Serum biomarker data, based on the ratio of insulin-like growth factor-binding protein 4 and sex-hormone binding globulin serum levels, were derived from analyses done on blood drawn in weeks 19 and 20 of gestation from patients in the U.S.-based Proteomic Assessment of Preterm Risk clinical trial with singleton pregnancies who were not on progesterone after the first trimester and had no signs or symptoms of labor at the time of blood draw. Simulations expanded a selected case–control study of 146 patients (41 preterm case participants and 105 control participants matched for distributions of BMI and gestational age at blood draw) fivefold. The simulation process maintained the characteristics of the original data set with respect to gestational age and biomarker correlation and variability while increasing statistical power to 80% for detection of an AUC difference of 0.1 with P<.05 by DeLong's test,[21] a nonparametric approach to compare AUCs. The effects of an artificial gap on calculated performance metrics for a simulated test were modeled and are illustrated by the separation of the case participants' and control participants' test scores and by the corresponding AUCs. One thousand repetitions were performed, and a representative example was selected showing AUCs within their interquartile ranges for the proper intended-use population and for the same population where an artificial gap is created, with the difference between these AUCs at the median. Prevalence-adjusted risk curves were generated from the intended-use population and from an artificially gapped population. A standard calibration plot was used to compare predicted and observed risk of preterm birth.[22] Analyses were performed in R 3.5.1 using the pROC package for AUC and the givitiR package for calibration plots.
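The article does not spell out how the prevalence-adjusted risk curves were computed. One common way to carry a risk estimated in a case-enriched sample over to a population with a different outcome prevalence is to rescale the odds by the ratio of population odds to study odds. The sketch below shows that generic correction only; it is an assumption, not the authors' method. The study prevalence of 41/146 comes from the text. The population prevalence of about 9.5% is a back-of-envelope figure derived from the article's statement that late preterm births make up approximately 8% of the population and nearly 84% of all preterm births (0.08 / 0.84). The function name is hypothetical.

```python
def prevalence_adjusted_risk(study_risk, study_prevalence, population_prevalence):
    """Rescale a risk from a case-enriched sample to a target prevalence.

    Assumes sampling depended only on outcome status, so the likelihood
    ratio carried by the test score is the same in both settings.
    """
    study_odds = study_risk / (1.0 - study_risk)
    correction = (population_prevalence / (1.0 - population_prevalence)) / (
        study_prevalence / (1.0 - study_prevalence)
    )
    adjusted_odds = study_odds * correction
    return adjusted_odds / (1.0 + adjusted_odds)


study_prevalence = 41 / 146           # case fraction in the case-control set
population_prevalence = 0.08 / 0.84   # assumed, derived as described above

for risk in (0.1, 0.3, 0.5, 0.7):
    adjusted = prevalence_adjusted_risk(risk, study_prevalence, population_prevalence)
    print(f"study risk {risk:.2f} -> adjusted risk {adjusted:.3f}")
```

A useful sanity check on any such correction is that a study risk equal to the study prevalence should map to the population prevalence, and that the adjustment is the identity when the two prevalences match.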

Modeling Results

In the example data, the test demonstrates moderate performance using the current practice definition of less than 37 0/7 weeks of gestation vs 37 0/7 weeks of gestation or greater (Fig. 1B and C) and improved performance when the boundary between case and noncase participants is lowered (less than 32 0/7 weeks of gestation vs 32 0/7 weeks of gestation or greater) and the analysis is performed correctly without exclusion of patients (Fig. 1E and F). To illustrate the effect of gapping for a test intended to predict preterm delivery before 32 weeks of gestation, we then examined apparent performance with omission of births between 32 and 37 weeks of gestation (Fig. 1H and I). The omitted patients (Fig. 1G; hashed line) comprise approximately 8% of the total population and, more importantly, nearly 84% of all preterm births.[20] As illustrated in Figure 1H, selective exclusion of patients widens the separation of case participants' and control participants' test scores. This results in an artifactual improvement in AUC to 0.91 (95% CI 0.85–0.97) for the gapped population compared with a correct AUC of 0.68 (95% CI 0.63–0.72) for preterm delivery at less than 37 weeks of gestation and a correct AUC of 0.76 (95% CI 0.68–0.84) for preterm delivery at less than 32 weeks of gestation in the proper intended-use population (Fig. 1C, F, and I). The difference in AUC between Fig. 1C and 1F is not statistically significant (DeLong's test, P=.065); the differences in AUC between Fig. 1I and 1C or 1F are significant (DeLong's test, P<.001 and P=.002, respectively). The artifactual increase of 0.23 in AUC on gapping shown in Figure 1 is consistent with changes seen across 1,000 simulations (median 0.22, interquartile range 0.20–0.25). The effect on test performance can also be visualized using a calibration curve constructed from predicted and actual preterm delivery risk.[22] In such an analysis, the test is considered accurate when the predicted risk falls on the diagonal. On the other hand, when predicted risk falls above or below the diagonal, the test underpredicts or overpredicts risk, respectively. Actual risk of preterm birth at less than 32 or less than 37 weeks of gestation is quite similar to the predicted risk when prediction is based on test scores of case and noncase patients representing the full intended-use population (Fig. 2A and B). However, predictions based on patients with an artificial gap in gestational age between case and noncase participants greatly underestimate the risk of preterm birth at less than 37 weeks of gestation while overestimating by several fold the risk of preterm birth at less than 32 weeks of gestation at high test scores (Fig. 2C and D).
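The calibration idea above can be illustrated with a simple binned comparison of predicted and observed risk. This is a deliberately simpler technique than the calibration belt produced by the givitiR package used in the article, and the data are simulated under assumptions made up for the example: each patient has a hypothetical true risk, one predictor reports that risk exactly, and a second predictor inflates it threefold to mimic a model that overstates risk. The function name `calibration_table` is also hypothetical.

```python
import random


def calibration_table(predicted_risk, outcome, n_bins=5):
    """Sort patients by predicted risk, split into equal-count bins, and
    return (mean predicted risk, observed event rate, bin size) per bin."""
    pairs = sorted(zip(predicted_risk, outcome))
    rows = []
    for b in range(n_bins):
        lo = b * len(pairs) // n_bins
        hi = (b + 1) * len(pairs) // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


rng = random.Random(0)
# Hypothetical true risks and outcomes drawn from them (illustrative only).
true_risk = [rng.uniform(0.02, 0.30) for _ in range(5000)]
outcome = [1 if rng.random() < p else 0 for p in true_risk]
# A predictor that overstates risk threefold, capped at 1.
inflated = [min(1.0, 3.0 * p) for p in true_risk]

for label, pred in (("calibrated", true_risk), ("inflated", inflated)):
    print(label)
    for mean_pred, obs_rate, size in calibration_table(pred, outcome):
        print(f"  predicted {mean_pred:.3f}  observed {obs_rate:.3f}  n={size}")
```

In the calibrated case the predicted and observed columns track each other within sampling error; in the inflated case predicted risk exceeds observed risk in every bin, which is the overprediction pattern the text describes.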

Fig. 2.

Magnitude of agreement between predicted and observed risk as a result of exclusion of patients. Shown are predicted vs observed risks of preterm delivery when risks are calculated from an ungapped analysis (A, B) or gapped analysis (C, D), applied to a full intended-use population. A. Risk of preterm delivery at less than 37 weeks of gestation when all patients are included in test development. B. Risk of preterm delivery at less than 32 weeks of gestation when all patients are included in test development. C. Risk of preterm delivery at less than 37 weeks of gestation when patients with gestational age at birth 32 weeks of gestation or greater through less than 37 weeks of gestation are excluded in test development. D. Risk of preterm delivery at less than 32 weeks of gestation when patients with gestational age at birth 32 weeks of gestation or greater through less than 37 weeks of gestation are excluded in test development. Red diagonal lines represent perfect calibration of risk. The 80% and 95% CIs of the relationship between predicted and observed risk are represented by the width of the light gray and dark grey shaded areas, respectively.

Boniface. Selective Exclusion in Preterm Birth Test Performance. Obstet Gynecol 2019.

CLINICAL IMPLICATIONS

Recent progress in omics has provided exciting and novel opportunities for the development of innovative clinical tests. Amid this justified excitement, vigilance must be maintained by the scientific and clinical community in reporting and reviewing the conclusions of studies promoting biomarker performance. In this article, we address a concern that is critical to the validity of reports on biomarker performance for preterm delivery: the insertion of a gestational age gap in the study population does not allow for accurate estimates of predictive performance. The simulations based on actual data illustrate how “gapping” of the study population results in artifactual test performance for preterm birth prediction. Building estimates of AUC, sensitivity, specificity, and predictive values in the context of selective exclusion of certain patients is inappropriate, because a prediction cannot be built for an intended-use population when it does not account for all such patients who will, in fact, exist in the population of patients to be tested. The performance of a test for the most severe preterm births should be determined by lowering the case and noncase boundary without omission of patients, as exemplified in Figures 1D, E, and F and 2B. The errors associated with “gapping” are not trivial and can have significant implications in both clinical practice and research. As exemplified here, gapped analyses may lead to overestimation of the test's predictive abilities, which can lead to introduction of an ineffective test, overdiagnosis, and unnecessary treatments, ultimately increasing cost and harm. Gapped analyses may be appropriate as proof of concept or for preliminary evidence to support further research, but such reports cannot imply clinical test performance nor be described as “clinical validation.”

When evaluating preterm delivery prediction, it is important to clarify what is meant by a control participant. In such an analysis, control participant does not refer to a patient who has a normal pregnancy. A control participant is a patient who is not a case participant, that is, does not have the outcome being predicted. For the same reasons outlined above regarding gestational age gapping, it would be inappropriate to exclude patients who had a pregnancy complication (eg, preeclampsia) from the control group (or noncase group) when developing tests to predict preterm birth. To prevent any confusion, we suggest using case participant vs noncase participant to refer to those who have the outcome to be predicted and those who do not, rather than case participant and control participant. When clinicians evaluate reported characteristics of any test to predict preterm delivery, we suggest following the checklist provided in Table 1. Although we focused on gestational age gapping in preterm delivery prediction, these principles apply equally to studies of other adverse outcomes in pregnancy that are influenced by gestational age, such as preeclampsia (early onset vs late onset), intrauterine growth disorders, and other complex maternal conditions that would mandate preterm delivery.

Table 1.

Study Design and Analysis Considerations for the Test Characteristics to Be Clinically Applicable

20 in total

1. Proteomic identification of serum peptides predicting subsequent spontaneous preterm birth.

Authors: M Sean Esplin; Karen Merrell; Robert Goldenberg; Yinglei Lai; Jay D Iams; Brian Mercer; Catherine Y Spong; Menachem Miodovnik; Hygriv N Simhan; Peter van Dorsten; Mitchell Dombrowski
Journal: Am J Obstet Gynecol Date: 2010-11-11 Impact factor: 8.661

2. Recurrence risk for preterm delivery.

Authors: Julie McManemy; Erinn Cooke; Erol Amon; Terry Leet
Journal: Am J Obstet Gynecol Date: 2007-06 Impact factor: 8.661

3. Circulating serum-derived microparticles provide novel proteomic biomarkers of spontaneous preterm birth.

Authors: Alan M Ezrin; Brian Brohman; Jackie Willmot; Sarah Baxter; Keith Moore; Mike Luther; Michael R Fannon; Baha Sibai
Journal: Am J Perinatol Date: 2015-03-31 Impact factor: 1.862

4. The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors: J A Hanley; B J McNeil
Journal: Radiology Date: 1982-04 Impact factor: 11.105

5. Evaluation of proteomic biomarkers associated with circulating microparticles as an effective means to stratify the risk of spontaneous preterm birth.

Authors: David E Cantonwine; Zhen Zhang; Kevin Rosenblatt; Kevin S Goudy; Robert C Doss; Alan M Ezrin; Gail Page; Brian Brohman; Thomas F McElrath
Journal: Am J Obstet Gynecol Date: 2016-02-11 Impact factor: 8.661

6. Circulating microparticle proteins obtained in the late first trimester predict spontaneous preterm birth at less than 35 weeks' gestation: a panel validation with specific characterization by parity.

Authors: Thomas F McElrath; David E Cantonwine; Arun Jeyabalan; Robert C Doss; Gail Page; James M Roberts; Brian Brohman; Zhen Zhang; Kevin P Rosenblatt
Journal: Am J Obstet Gynecol Date: 2019-01-25 Impact factor: 8.661

7. Estimated effect of 17 alpha-hydroxyprogesterone caproate on preterm birth in the United States.

Authors: Joann R Petrini; William M Callaghan; Mark Klebanoff; Nancy S Green; Eve M Lackritz; Jennifer L Howse; Richard H Schwarz; Karla Damus
Journal: Obstet Gynecol Date: 2005-02 Impact factor: 7.661

8. Vaginal progesterone reduces the rate of preterm birth in women with a sonographic short cervix: a multicenter, randomized, double-blind, placebo-controlled trial.

Authors: S S Hassan; R Romero; D Vidyadhari; S Fusey; J K Baxter; M Khandelwal; J Vijayaraghavan; Y Trivedi; P Soma-Pillay; P Sambarey; A Dayal; V Potapov; J O'Brien; V Astakhov; O Yuzko; W Kinzler; B Dattel; H Sehdev; L Mazheika; D Manchulenko; M T Gervasi; L Sullivan; A Conde-Agudelo; J A Phillips; G W Creasy
Journal: Ultrasound Obstet Gynecol Date: 2011-06-15 Impact factor: 7.299

9. Prediction of preterm birth with and without preeclampsia using mid-pregnancy immune and growth-related molecular factors and maternal characteristics.

Authors: Laura L Jelliffe-Pawlowski; Larry Rand; Bruce Bedell; Rebecca J Baer; Scott P Oltman; Mary E Norton; Gary M Shaw; David K Stevenson; Jeffrey C Murray; Kelli K Ryckman
Journal: J Perinatol Date: 2018-05-24 Impact factor: 3.225

10. Noninvasive blood tests for fetal development predict gestational age and preterm delivery.

Authors: Thuy T M Ngo; Mira N Moufarrej; Marie-Louise H Rasmussen; Joan Camunas-Soler; Wenying Pan; Jennifer Okamoto; Norma F Neff; Keli Liu; Ronald J Wong; Katheryne Downes; Robert Tibshirani; Gary M Shaw; Line Skotte; David K Stevenson; Joseph R Biggio; Michal A Elovitz; Mads Melbye; Stephen R Quake
Journal: Science Date: 2018-06-08 Impact factor: 47.728
