
Ultrasound as a primary screening tool for detecting low birthweight newborns: A meta-analysis.

Abstract

BACKGROUND: As low birthweight (i.e., birthweight < 2500 g) is a major determinant of neonatal mortality and morbidity, the pre-delivery detection of low birthweight is clinically advantageous. This study was performed to determine whether ultrasound is suitable for use in primary screening to detect low birthweight newborns.
METHODS: The primary outcomes included sensitivity, specificity, and positive and negative likelihood ratios of ultrasound detection of low birthweight newborns. Ten databases, including PubMed, were searched. All English language studies that provided true- and false-positive and true- and false-negative results regarding the pre-delivery ultrasound detection of low birthweight newborns were eligible for inclusion in the analysis. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies. Bivariate diagnostic meta-analysis was performed and hierarchical summary receiver operating characteristic curves were constructed.
RESULTS: Studies of relatively good quality were included in the analysis to evaluate crown-rump length (n = 12); femur length (n = 5); formulas of Campbell, Hadlock, and Shepard (n = 9); and uterine artery blood flow (n = 7). All showed low sensitivity (=0.24-0.58) regardless of specificity (=0.60-0.96). The formulas of Campbell, Hadlock, and Shepard were usable for a confirmation strategy only (positive and negative likelihood ratios = 14.8 and 0.44, respectively), but crown-rump or femur length, and uterine artery blood flow were not usable for an exclusion or confirmation strategy (positive and negative likelihood ratios = 1.4-2.8 and 0.71-0.85, respectively).
CONCLUSIONS: Primary screening does not have to confirm low birthweight, but should almost always categorize low birthweight as a positive result and exclude normal birthweight. Therefore, ultrasound is not suitable as a primary screening tool to detect low birthweight newborns.

Year: 2016 PMID: 27583924 PMCID: PMC5008608 DOI: 10.1097/MD.0000000000004750

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.889

Introduction

Low birthweight (i.e., birthweight <2500 g) is one of the major determinants of neonatal mortality and morbidity. Therefore, early detection of low birthweight newborns may be necessary to ensure the provision of immediate and appropriate care. Maternal anthropometric measurements and symphysis–fundal height were shown not to be useful for the detection of low birthweight newborns. In contrast, neonatal chest and arm circumferences were demonstrated to be useful parameters for this purpose, especially in developing countries. However, low birthweight newborns should ideally be detected before rather than after delivery. Recently, ultrasound has become more widely available at hospitals in developing and developed countries. Ultrasound is used not only to confirm pregnancy but also to check for multiple pregnancy, determine the baby's sex, screen for intrauterine deaths and congenital abnormalities, monitor fetal growth and position, prevent maternal complications, and estimate gestational age and delivery date. Therefore, ultrasound may be useful for detecting low birthweight newborns prior to delivery. Here, bivariate diagnostic meta-analysis was performed and hierarchical summary receiver operating characteristic (HSROC) curves were constructed to determine whether ultrasound is suitable as a primary screening tool for detecting low birthweight newborns prior to delivery.

Methods

Primary outcomes and inclusion criteria

The primary outcomes were sensitivity and specificity, positive and negative likelihood ratios (LRs), diagnostic odds ratio (DOR), and area under the curve (AUC) of ultrasound detection of low birthweight newborns. All English language studies that provided true- and false-positive and true- and false-negative results regarding predelivery ultrasound detection of low birthweight newborns were eligible for inclusion. Studies in which any missing results could be calculated from other data (e.g., number of subjects, prevalence of low birthweight newborns, and sensitivity and specificity) were also included in the analysis. The objectives of the studies included in the analysis were not limited to evaluation of the diagnostic performance of ultrasound for detecting low birthweight newborns.
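As an illustration of how these outcomes relate to the 2 × 2 counts, the following minimal Python sketch computes the accuracy measures and rebuilds a 2 × 2 table from the number of subjects, prevalence, sensitivity, and specificity. It is not the authors' code; the names `TwoByTwo`, `accuracy_measures`, and `reconstruct` and the example figures are hypothetical, and rounding to whole counts is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: float  # low birthweight, ultrasound positive
    fp: float  # normal birthweight, ultrasound positive
    fn: float  # low birthweight, ultrasound negative
    tn: float  # normal birthweight, ultrasound negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Standard diagnostic accuracy measures from a 2 x 2 table.

    Assumes 0 < specificity < 1 and sensitivity < 1 so that no
    denominator is zero.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,  # diagnostic odds ratio
    }


def reconstruct(n: int, prevalence: float, sens: float, spec: float) -> TwoByTwo:
    """Rebuild 2 x 2 counts from summary data (rounding is an assumption)."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(sens * diseased)
    tn = round(spec * healthy)
    return TwoByTwo(tp=tp, fp=healthy - tn, fn=diseased - tp, tn=tn)


# Hypothetical study: 1000 women, 10% prevalence, sensitivity 0.40, specificity 0.90.
table = reconstruct(1000, 0.10, 0.40, 0.90)
measures = accuracy_measures(table)
```

In this hypothetical example the reconstructed counts are 40 true positives, 90 false positives, 60 false negatives, and 810 true negatives.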

Search strategies, study selection, and data extraction

PubMed/MEDLINE (i.e., Medical Literature Analysis and Retrieval System Online) was searched (April 13, 2016) using search terms generated by adding the key words to the Falck-Ytter filter, as described in the Online Supplementary Methods. There was no limitation on publication date. Articles judged from their titles and abstracts to be unrelated to the purpose of the study were excluded. The remaining articles were subjected to full-text retrieval, and those judged from the full text to be unrelated were also excluded. Those that remained were potentially eligible articles. Articles that were reviews or that did not provide all of the true- and false-positive and true- and false-negative results on full-text retrieval were also excluded. The remaining articles were finally eligible for inclusion in the analysis. Attempts were made to identify additional eligible articles by investigating the PubMed-related citations shown by clicking “See all…” on the right sides of the PubMed screens displaying potentially eligible articles, and the bibliographic references of the potentially eligible articles. Nine other databases were also searched: CINAHL (i.e., Cumulative Index of Nursing and Allied Health Literature), PsycInfo (i.e., Psychology Information), Wiley Online Library, ProQuest Central (e.g., ProQuest Health and Medical Complete and ProQuest Nursing & Allied Health Source), ProQuest Dissertations & Theses Global, the entire Cochrane Library (e.g., Cochrane Central Register of Controlled Trials), Web of Knowledge, Google Scholar, and Scopus. The selection process was repeated periodically.
The data extracted were as follows: the first author's name; publication year; country; ultrasound methods used to detect low birthweight newborns; cutoff points; true- and false-positive and true- and false-negative results; Quality Assessment of Diagnostic Accuracy Studies (QUADAS) score (see “Study quality assessment” below); and the presence or absence of 1 of the 3 major sources of bias in a study included in diagnostic meta-analysis, that is, failure to use the same reference test regardless of the result of the index test, case–control rather than cohort design, and retrospective rather than prospective data collection.

Study quality assessment

The QUADAS, a tool consisting of 14 question items devised to assess the quality of studies included in diagnostic meta-analysis, was used. Study quality was assessed 5 times, and the most frequent response to each question item was considered to be the most appropriate. The QUADAS score was defined as the number of “yes” responses to the question items. For statistical analysis, a value of “1” was assigned to a “yes” response to each question item, and a value of “0” was assigned to a “no” or “unclear” response.
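The scoring rule described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the handling of ties between equally frequent responses (here, the response seen first wins) is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter


def modal_response(responses):
    """Most frequent response to one question item across repeated assessments.

    Tie handling is an assumption: Counter.most_common returns the
    first-encountered response among equally frequent ones.
    """
    return Counter(responses).most_common(1)[0][0]


def quadas_score(assessments):
    """Score one study from repeated assessments.

    assessments: list of rounds, each a list of 14 responses
    ("yes", "no", or "unclear"), one per QUADAS question item.
    Returns (score, final_responses, binary_values).
    """
    final = [modal_response(item) for item in zip(*assessments)]
    binary = [1 if r == "yes" else 0 for r in final]  # "no"/"unclear" -> 0
    return sum(binary), final, binary


# Hypothetical example: 5 assessment rounds of one study.
rounds = [["yes"] * 10 + ["no"] * 2 + ["unclear"] * 2 for _ in range(3)]
rounds += [["no"] * 14 for _ in range(2)]
score, final, binary = quadas_score(rounds)
```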

Statistical analysis

Stata/MP (i.e., multiple processor) 13.1 (Stata Corp LP [i.e., limited partnership], College Station, TX) and R version 3.0.1 (The R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses. Attempts were made to detect outliers by model checking, using a spike plot of Cook distance for each study and a scatter plot of the standardized residuals of the healthy (x-axis) and diseased (y-axis) populations for each study. The cutoff point for Cook distance was calculated as 4 times the number of parameters divided by the number of studies. The cutoff point for standardized residual was the standardized 2-level residual. Studies located outside the cutoff points in both of these plots were classified as potential outliers. Potential outliers were omitted as true outliers if their study designs and materials differed from those of all other studies included in the final analysis. I² was used to assess whether the data in the studies were heterogeneous (i.e., I² ≥ 50%) or homogeneous (i.e., I² < 50%). An attempt was made to reach homogeneity from heterogeneous data by limiting the studies based on Africa, Asia, Europe, Latin America, the Middle East, North America, or Oceania, versus other regions; developing versus developed countries; QUADAS score ≥8 versus <8; and the presence versus absence of 1 of the 3 major sources of bias in a study included in diagnostic meta-analysis (investigation of heterogeneity sources).
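The two numerical rules in this paragraph can be sketched in Python as follows. This is not the authors' Stata or R code. The function names are hypothetical, the example of 5 model parameters and 12 studies is an assumption chosen for illustration, and the I² formula shown is the common definition based on Cochran's Q, which may differ from the exact estimator used in the analysis.

```python
def cook_cutoff(n_parameters: int, n_studies: int) -> float:
    """Cutoff for Cook distance: 4 x number of parameters / number of studies."""
    return 4 * n_parameters / n_studies


def i_squared(q: float, df: int) -> float:
    """I-squared (%) from Cochran's Q and its degrees of freedom.

    Uses the common definition max(0, (Q - df) / Q) x 100; this choice
    of estimator is an assumption, not taken from the source.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def is_heterogeneous(i2_percent: float) -> bool:
    """Threshold stated in the text: I-squared >= 50% is heterogeneous."""
    return i2_percent >= 50


# Hypothetical example: 5 model parameters, 12 studies.
cutoff = cook_cutoff(5, 12)
```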
Diagnostic bivariate meta-analysis was performed to summarize sensitivity and specificity, positive and negative LRs, and DOR. Informational usability was categorized as exclusion and confirmation strategies, that is, positive LR > 10 and negative LR < 0.1; confirmation strategy only, that is, positive LR > 10 and negative LR > 0.1; exclusion strategy only, that is, positive LR < 10 and negative LR < 0.1; and no exclusion or confirmation strategy, that is, positive LR < 10 and negative LR > 0.1. HSROC curves were also constructed to provide AUC and summary points of sensitivity and specificity, 95% confidence regions, and prediction regions. The prediction region is the region in which the sensitivity and specificity of future studies are expected to fall with a given probability (e.g., 95%). The data were summarized separately for each of the categories used to limit the studies in the investigation of heterogeneity sources (subgroup analysis). Subgroup analysis excluded “other regions” because some of these “other regions” may have been located far away from others included in the same “other regions.” Meta-regression was also performed to evaluate the statistical significance of differences in sensitivity and specificity between categories subjected to investigation of heterogeneity sources versus their counterparts. Publication bias was assessed using Deeks funnel plot asymmetry test. Cutoff points were proposed using the Youden index, that is, the point located on the HSROC curve that is the most distant from the straight line connecting the origin with the upper right corner. As all of the data were extracted from the published literature, there was no requirement for ethical approval or informed consent.
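The four informational usability categories can be expressed as a small decision rule. The Python sketch below uses the thresholds stated above; the function name is hypothetical, and because the text does not define the boundary cases (positive LR exactly 10 or negative LR exactly 0.1), treating them as not meeting the threshold is an assumption. The example values are the likelihood ratios reported in the Abstract.

```python
def informational_usability(lr_pos: float, lr_neg: float) -> str:
    """Categorize a test by its positive and negative likelihood ratios."""
    confirms = lr_pos > 10   # useful for confirming low birthweight
    excludes = lr_neg < 0.1  # useful for excluding low birthweight
    if confirms and excludes:
        return "exclusion and confirmation strategies"
    if confirms:
        return "confirmation strategy only"
    if excludes:
        return "exclusion strategy only"
    return "no exclusion or confirmation strategy"


# Formulas of Campbell, Hadlock, and Shepard (LRs from the Abstract).
formulas = informational_usability(14.8, 0.44)
# Extremes of the ranges reported for the other three methods.
others = [informational_usability(1.4, 0.85), informational_usability(2.8, 0.71)]
```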

Results

Systematic review

Twenty-seven articles were finally deemed to be eligible for the analysis (Fig. 1). However, 1 article evaluating cervical length, 1 article evaluating placental grade, and 1 article evaluating umbilical artery were excluded because findings based on a single data source were not generalizable. Four articles evaluating biparietal diameter were also excluded because, in 3 of these 4 articles, fetuses with a biparietal diameter below the lower limit or above the upper limit of a predetermined range were categorized as one group (positive result), and fetuses with a biparietal diameter within this range were categorized as another group (negative result). As a result, 7 articles evaluating crown–rump length, 4 articles evaluating femur length, 4 articles evaluating the formulas of Campbell, Hadlock, and Shepard, and 5 articles evaluating uterine artery blood flow were finally included in this meta-analysis (Table 1 and Online Supplementary Results). The formulas of Campbell, Hadlock, and Shepard incorporated at least 1 measurement of biparietal diameter, head and abdominal circumferences, and femur length, as described in the Online Supplementary Results.

Figure 1

Meta-analysis flow diagram.

Table 1

Characteristics of studies included in the meta-analysis.

Two or more studies were extracted from some of the included articles that used 2 or more cutoff points or formulas (Table 1). Therefore, 12 studies with 26,493 women evaluating crown–rump length; 5 studies with 8033 women evaluating femur length; 9 studies with 2675 women evaluating the formulas of Campbell, Hadlock, and Shepard; and 7 studies with 1400 women evaluating uterine artery blood flow were finally included in this meta-analysis (Table 2). Studies were conducted in 5 developing and 11 developed countries in Asia, Europe, Latin America, the Middle East, North America, and Oceania. The prevalence of low birthweight newborns (=2.0%–24.9%) varied depending on the study setting. Longer black and gray bars relative to shorter white bars in Fig. 2 indicated overall good quality of the studies included in the analysis. The 3 major sources of bias in a study included in diagnostic meta-analysis were relatively well controlled. That is, the same reference test regardless of the result of the index test, the cohort design, and prospective data collection were used in all of the studies, almost all of the studies (=91%), and almost half of the studies (=45%), respectively.
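As a quick arithmetic check, the per-method counts above can be summed and compared with the totals quoted in the Discussion (33 studies and 38,601 participants). The short Python sketch below does only that; the dictionary name and its keys are illustrative labels.

```python
# (number of studies, number of women) per ultrasound method, as reported above.
included = {
    "crown-rump length": (12, 26493),
    "femur length": (5, 8033),
    "formulas of Campbell, Hadlock, and Shepard": (9, 2675),
    "uterine artery blood flow": (7, 1400),
}

total_studies = sum(studies for studies, _ in included.values())
total_women = sum(women for _, women in included.values())
print(total_studies, total_women)  # prints: 33 38601
```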

Table 2

Results of meta-analysis.

Figure 2

Results of study quality assessment according to the QUADAS. QUADAS = Quality Assessment of Diagnostic Accuracy Studies.

Outlier detection and investigation of heterogeneity sources

There were no true outliers in this meta-analysis. The data regarding crown–rump and femur lengths, and uterine artery blood flow were markedly heterogeneous (I² = 97%–100%, Table 2). This heterogeneity may have been due to the cutoff points, which included both values corresponding to a birthweight of 2500 g and values not corresponding to a birthweight of 2500 g (Table 1). The data regarding femur length became homogeneous (I² = 46%) when limited to retrospective studies (Table S1), which exclusively used a cutoff point of the 5th percentile. The data regarding the formulas of Campbell, Hadlock, and Shepard were homogeneous (I² = 0%, Table 2). This homogeneity may have been due to the cutoff points, which included only values corresponding to a birthweight of 2500 g. The formulas of Campbell, Hadlock, and Shepard also had narrower 95% confidence and prediction regions around the summary point than crown–rump length, femur length, or uterine artery blood flow (Fig. 3). This was consistent with the relatively high proportion of heterogeneity, likely due to the threshold effect (=0.61–1.00).

Figure 3

HSROC curves. Black circles represent observational studies, red circles represent summary points, black lines represent HSROC curves, red dash-dotted lines represent the 95% confidence region, and blue dotted lines represent the 95% prediction region. HSROC = hierarchical summary receiver operating characteristic.

Meta-analysis and subgroup analysis

On primary screening, almost all low birthweight newborns should be categorized as positive results, but not all normal birthweight newborns have to be categorized as negative results. Therefore, sensitivity is more important than specificity. However, all of the methods examined in this meta-analysis showed low sensitivity, regardless of specificity (Table 2 and Fig. 3). Primary screening should also exclude normal birthweight newborns but does not have to confirm low birthweight newborns. Therefore, an exclusion strategy is more important than a confirmation strategy. However, the informational usability of crown–rump length, femur length, or uterine artery blood flow was categorized as no exclusion or confirmation strategy (Table 2). The informational usability of the formulas of Campbell, Hadlock, and Shepard was categorized as confirmation strategy only. This was the case for the results in all groups subjected to subgroup analysis (Table S1).

Meta-regression analysis

Meta-regression analysis showed a number of possible confounders as follows: Asia versus other regions had an effect on specificity of the formulas of Campbell, Hadlock, and Shepard (P = 0.00) and sensitivity of uterine artery blood flow (P = 0.00); Europe versus other regions had an effect on specificity of the formulas of Campbell, Hadlock, and Shepard (P = 0.00) and uterine artery blood flow (P = 0.04); North America versus other regions had an effect on sensitivity of uterine artery blood flow (P = 0.01); developing versus developed countries had an effect on specificity of the formulas of Campbell, Hadlock, and Shepard (P = 0.00); QUADAS score ≥8 versus <8 had an effect on the specificity of formulas of Campbell, Hadlock, and Shepard (P = 0.00); and cohort versus case–control study had an effect on specificity of uterine artery blood flow (P = 0.04). Within the limits of availability of P values, no other variables were shown to be confounders that affected sensitivity or specificity (P = 0.06–0.94 or 0.05–0.91, respectively).

Publication bias and proposed cutoff points

Deeks funnel plot asymmetry test showed publication bias among the data of crown–rump length (P = 0.04) but not among the data of femur length, the formulas of Campbell, Hadlock, and Shepard, or uterine artery blood flow (P = 0.18, 0.15, or 0.53, respectively) (Figure S1). The Youden index could not propose a cutoff point of crown–rump or femur length, or uterine artery blood flow (Fig. 3), because there were no studies using similar values of the cutoff point around the Youden index. The cutoff points of the formulas of Campbell, Hadlock, and Shepard were determined in advance to correspond to a birthweight of 2500 g.
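For illustration, the Youden index criterion described in the Methods can be sketched in Python. The point on a curve farthest from the chance diagonal is the one that maximizes J = sensitivity + specificity − 1, because the perpendicular distance to the diagonal equals J/√2. The curve points below are hypothetical and are not taken from the fitted HSROC curves, whose parameters are not given in the text.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def youden_point(curve):
    """Return the (sensitivity, specificity) pair with the largest J.

    curve: iterable of (sensitivity, specificity) pairs sampled along
    a summary ROC curve.
    """
    return max(curve, key=lambda point: youden_j(*point))


# Hypothetical points along a curve (not from the meta-analysis).
curve = [(0.20, 0.98), (0.50, 0.90), (0.80, 0.50)]
best = youden_point(curve)
```

In the meta-analysis itself, a cutoff could be proposed only if included studies had used cutoff values close to such a point, which was not the case for crown–rump length, femur length, or uterine artery blood flow.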

Discussion

Main findings

Based on the literature search described in the Methods section, the present study is the first meta-analysis to evaluate the diagnostic performance of ultrasound used to detect low birthweight newborns. There is no evidence that ultrasound is suitable as a primary screening tool for detecting low birthweight newborns. This meta-analysis involved 38,601 participants in 16 countries in Asia, Europe, Latin America, the Middle East, North America, and Oceania by including 7, 4, 4, and 5 articles evaluating crown–rump length; femur length; the formulas of Campbell, Hadlock, and Shepard; and uterine artery blood flow, respectively (Table 1 and Online Supplementary Results). Therefore, the findings in the total population (Table 2 and Fig. 3) are relatively generalizable (i.e., external validity). This meta-analysis also included good quality studies, as suggested by more “yes” and “unclear” responses versus fewer “no” responses to the QUADAS question items (Fig. 2). Prospective versus retrospective data collection, with prospective collection used by only about half of the studies, did not alter the interpretation of results (subgroup analysis and meta-regression analysis), and the 2 other major sources of bias in a study included in diagnostic meta-analysis were almost always controlled. Therefore, the findings in the total population (Table 2 and Fig. 3) are not seriously affected by bias due to poor quality of included studies (i.e., internal validity).

Interpretation

Heterogeneity, possible confounders, or publication bias did not alter the interpretation of the results. Because a high proportion of the heterogeneity was likely due to the threshold effect, as mentioned in the Results section, homogeneity (I² = 0–38%) was achieved for all of the heterogeneous data in the total population by excluding the portion of heterogeneity likely due to threshold effects, which supports summarizing the data. Even after adjusting for almost all of the possible confounders, sensitivity (=0.23–0.62) was not sufficiently high and negative LRs (=0.39–1.00) were not sufficiently low for an exclusion strategy, that is, negative LR < 0.1. The exception was sensitivity of uterine artery blood flow in Asia (=0.90), but this finding was not generalizable, because it was based on only 1 study with a small sample size (=70). As statistically significant differences in sensitivity or specificity depending on possible confounders may have been due to a threshold effect (i.e., ecological fallacy), it is also unclear whether the possible confounders were truly confounders. Based on the slope of funnel plots (Figure S1), an increase in effective sample size of the data regarding crown–rump length, among which there is publication bias, contributes to a decrease of DOR, that is, lower levels of diagnostic performance. Secondary screening should categorize almost all normal birthweight newborns as negative results and confirm low birthweight newborns. The formulas of Campbell, Hadlock, and Shepard may be used in secondary screening. In addition to a high degree of accuracy (i.e., 0.9 ≤ AUC < 1.0), the formulas of Campbell, Hadlock, and Shepard showed high specificity (=0.96), the informational usability was categorized as confirmation strategy only (Table 2 and Fig. 3), and 95% confidence and prediction regions were very narrow (Fig. 3). Despite the differences in formulas, the homogeneity among the data regarding the formulas of Campbell, Hadlock, and Shepard may provide the rationale for jointly summarizing the data.

Strengths and weaknesses of the study

The first strength of the present study was its adherence to procedural guidance for conducting meta-analyses. The second strength was the use of bivariate meta-analysis to incorporate the negative relationship between sensitivity and specificity and of Deeks funnel plot asymmetry test to limit the inflation of type I error. The third strength was the internal and external validity, based on the inclusion of 33 good quality studies with 38,601 participants extracted from 20 data sources (Tables 1 and 2 and Fig. 2). However, the present meta-analysis had some limitations, including extrapolation of the results to groups that were not subjected to subgroup or meta-regression analysis, for example, Africa versus other regions, males versus females, full-term versus preterm births, and intrauterine growth retardation versus all newborns except those with intrauterine growth retardation (Table S1). Non-English language studies were excluded, and only 1 person selected and reviewed the studies. Finally, it was impossible to clarify which of the possible confounders were true confounders.

Conclusions

In summary, the results of the present meta-analysis are clinically important for periconceptional strategies to reduce neonatal mortality and morbidity. There is no evidence that ultrasound is suitable for primary screening to detect low birthweight newborns.
