
Reporting and Analysis of Trial-Based Cost-Effectiveness Evaluations in Obstetrics and Gynaecology.

Mohamed El Alili, Johanna M van Dongen, Judith A F Huirne, Maurits W van Tulder, Judith E Bosmans.

Abstract

BACKGROUND AND OBJECTIVES: The aim was to systematically review whether the reporting and analysis of trial-based cost-effectiveness evaluations in the field of obstetrics and gynaecology comply with guidelines and recommendations, and whether this has improved over time. DATA SOURCES AND SELECTION CRITERIA: A literature search was performed in MEDLINE, the NHS Economic Evaluation Database (NHS EED) and the Health Technology Assessment (HTA) database to identify trial-based cost-effectiveness evaluations in obstetrics and gynaecology published between January 1, 2000 and May 16, 2017. Studies performed in middle- and low-income countries and studies related to prevention, midwifery, and reproduction were excluded. DATA COLLECTION AND ANALYSIS: Reporting quality was assessed using the Consolidated Health Economic Evaluation Reporting Standard (CHEERS) statement (a modified version with 21 items, as we focused on trial-based cost-effectiveness evaluations) and the statistical quality was assessed using a literature-based list of criteria (8 items). Exploratory regression analyses were performed to assess the association between reporting and statistical quality scores and publication year.
RESULTS: The electronic search resulted in 5482 potentially eligible studies. Forty-five studies fulfilled the inclusion criteria, 22 in obstetrics and 23 in gynaecology. Twenty-seven (60%) studies did not adhere to 50% (n = 10) or more of the reporting quality items and 32 studies (71%) did not meet 50% (n = 4) or more of the statistical quality items. As for the statistical quality, no study used the appropriate method to assess cost differences, no advanced methods were used to deal with missing data, and clustering of data was ignored in all studies. No significant improvements over time were found in reporting or statistical quality in gynaecology, whereas in obstetrics a significant improvement in reporting and statistical quality was found over time. LIMITATIONS: The focus of this review was on trial-based cost-effectiveness evaluations in obstetrics and gynaecology, so further research is needed to explore whether results from this review are generalizable to other medical disciplines.
CONCLUSIONS AND IMPLICATIONS OF KEY FINDINGS: The reporting and analysis of trial-based cost-effectiveness evaluations in gynaecology and obstetrics is generally poor. Since this can result in biased results, incorrect conclusions, and inappropriate healthcare decisions, there is an urgent need for improvement in the methods of cost-effectiveness evaluations in this field.

Year:  2017        PMID: 28674846      PMCID: PMC5606992          DOI: 10.1007/s40273-017-0531-3

Source DB:  PubMed          Journal:  Pharmacoeconomics        ISSN: 1170-7690            Impact factor:   4.981


Background

To inform decisions about the allocation of scarce healthcare resources, decision makers need information on the relative efficiency of alternative healthcare interventions, which can be provided by cost-effectiveness evaluations [1]. These cost-effectiveness evaluations are increasingly being conducted alongside controlled clinical trials (i.e. so-called trial-based cost-effectiveness evaluations) [2]. Failure to adequately conduct, analyse and/or report such cost-effectiveness evaluations can lead to biased conclusions, resulting in inappropriate healthcare decision making, and thus a possible waste of scarce resources. A growing number of cost-effectiveness evaluations in obstetrics and gynaecology are being conducted. To illustrate, a basic MEDLINE search combining search terms related to ‘obstetrics’ and ‘gynaecology’ and the MeSH term ‘cost-benefit analysis’ showed an increase in the number of published cost-effectiveness evaluations per year, from 32 in 2000 to 112 in 2015. A large share of these cost-effectiveness evaluations were conducted alongside a clinical trial. Interventions compared in these trials often concern induction of labour, hysterectomy (i.e. surgical removal of the uterus) and care arrangement (e.g. specialist nurse providing treatment vs physician providing treatment). Outcomes of these cost-effectiveness evaluations are usually expressed in clinical outcomes; for example, the number of caesarean sections or admission to intensive care. Costs associated with these interventions usually consist of materials used and occupation of caregiver or labour/operating room. Properly conducted cost-effectiveness evaluations in obstetrics and gynaecology can help to prevent wastage of scarce resources. This is important since obstetrics/gynaecology is a major contributor to total healthcare costs. 
For example, in a Dutch economic analysis comparing methods of induction, the costs of this specific obstetric procedure were estimated to be €1.4 million [3]. Reviews on the reporting and statistical methodology of trial-based cost-effectiveness evaluations show that major deficiencies are generally present in the way in which such evaluations are reported [4-7] and analysed [8-10]. This led Doshi et al. [8] to conclude that the results of trial-based cost-effectiveness evaluations need to be interpreted with caution due to the poor quality of the statistical approach. The majority of these reviews, however, only evaluated reporting quality [4-7] of trial-based cost-effectiveness evaluations and the only reviews that evaluated the statistical quality [8-10] were conducted over a decade ago. In the meantime, however, guidelines and recommendations [11-14] for trial-based cost-effectiveness evaluations have been updated and more researchers have been trained in the conduct of cost-effectiveness evaluations. In the field of obstetrics and gynaecology, methodological reviews showed similar characteristics (i.e. only evaluated reporting quality) [15, 16].

Objectives

This study aimed to explore whether the quality of reporting and the statistical methods of trial-based cost-effectiveness evaluations in obstetrics and gynaecology are in accordance with the most recent guidelines and recommendations, and whether both have improved over the past 16 years.

Methods

This systematic review, conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [17], included trial-based cost-effectiveness evaluations in the field of obstetrics and gynaecology that were published from January 1, 2000 up to May 16, 2017. A search was conducted in MEDLINE, the National Health Service Economic Evaluation Database (NHS EED), and the Health Technology Assessment (HTA) database. Because the earliest guidelines were developed in 1996 [18], the year 2000 was used as the start date to allow time for their implementation.

Search Strategy

Databases were searched with terms related to the research field (e.g. ‘gynaecology’, ‘obstetrics’ or ‘pregnancy’) and study design (e.g. ‘cost-utility analysis’, ‘economic evaluation’, ‘cost effectiveness’ or ‘economic analysis’) in the title, abstract, and MeSH headings or keywords. The full PubMed search is available in Appendix S1 (see electronic supplementary material [ESM]). The electronic search was supplemented by searching reference lists of relevant review articles and of the retrieved full texts. During the search, a search log was kept consisting of keywords used, searched databases and search results. Titles and abstracts of the retrieved studies were stored in an electronic database using EndNote X7.4® (Thomson Reuters, New York, NY, US).

Study Selection

Two reviewers (ME and JMvD) independently screened titles and abstracts of identified studies for eligibility. Studies were included if they reported an economic evaluation alongside a controlled trial in obstetrics or gynaecology and concerned a cost-effectiveness analysis (CEA) and/or a cost-utility analysis (CUA). Cost-benefit analyses and cost-minimization analyses were excluded since healthcare decision makers are typically interested in CEAs and CUAs, and because statistical methods may differ across these kinds of economic evaluations [1]. Both randomized and non-randomized studies were included in the review. Papers had to be published as full papers and written in English. Furthermore, this systematic review focused on therapeutic procedures (e.g. surgical treatments, induction of labour, etc.) in obstetrics and gynaecology. Therefore, studies describing interventions related to prevention and screening as well as training of healthcare staff were excluded. Moreover, studies related to reproductive medicine (i.e. fertility) were also excluded. Finally, we specifically focused on high-income countries (e.g. countries in Europe and North America) as we expected cost-effectiveness evaluations from low-/middle-income countries to systematically be of lower quality and therefore result in significantly lower scores, whereas cost-effectiveness evaluations are mostly conducted in high-income countries (i.e. 83% of the total published cost-effectiveness evaluations) [19]. Methodological issues are typically present in cost-effectiveness evaluations from low-/middle-income countries, such as scarcity and quality of the data used, trials that do not prioritize economics and absence of cost accounting systems [20], which makes it difficult to compare evaluations between high-income and low-income countries. Full texts were retrieved when studies fulfilled the inclusion criteria or if uncertainty remained about the inclusion of a specific study. 
All full texts were read and checked for eligibility by two independent reviewers (ME and JMvD). To resolve disagreement between the two reviewers, a consensus procedure was used. A third reviewer (JEB) was consulted when disagreements persisted.

Data Extraction

Two reviewers (ME and JMvD) independently extracted data from the included studies using a standardized extraction form. Agreement between the reviewers was checked during a face-to-face meeting, and a consensus procedure was used involving a third reviewer (JEB) if necessary. The first part of the extraction form focused on general study characteristics (e.g. year of publication, country), healthcare delivery (i.e. primary or secondary care), medical discipline (i.e. obstetrics or gynaecology), and the design of the trial (i.e. non-randomized study [NRS] or randomized controlled trial [RCT]). The second part focused on cost-effectiveness evaluation design aspects: type of evaluation (i.e. CEA or CUA), study perspective (e.g. healthcare perspective, societal perspective), study population, follow-up period, comparator and outcome measures. The third part focused on the statistical approach of the trial-based cost-effectiveness evaluation and is described in Sect. 2.5.

Reporting Quality of Trial-Based Cost-Effectiveness Evaluations

Reporting quality was assessed using the Consolidated Health Economic Evaluation Reporting Standard (CHEERS) statement [11] that provides concrete recommendations to optimize the reporting of cost-effectiveness evaluations. Recommendations are subdivided into six main categories: (1) title and abstract, (2) introduction, (3) methods, (4) results, (5) discussion and (6) other. For a detailed description of the CHEERS statement, the reader is referred to Husereau et al. [11]. The full CHEERS statement is provided in Appendix S2 (see ESM). As the focus of this study was to evaluate trial-based cost-effectiveness evaluations, modelling-related criteria in the statement were omitted (i.e. items 15, 16 and 18). This resulted in a modified CHEERS statement with 21 items that were answered by ‘yes/no’. Studies fulfilling the criteria mentioned in the items were scored ‘yes’ and assigned a score of 1 per correct item (‘no’ was scored as 0). Answers were compared between the two reviewers and disagreements were discussed until consensus was reached. An overall reporting quality score ranging from 0 to 21 was calculated by adding up the number of items that were scored ‘yes’.
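The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' extraction tool: the function name and the example answers are hypothetical, and the sketch assumes the 24-item numbering of the original CHEERS checklist, from which items 15, 16 and 18 are dropped.

```python
# Illustrative sketch of the modified CHEERS scoring (not the authors' tool).
# Assumption: items are numbered 1-24 as in the original CHEERS checklist.
OMITTED_ITEMS = {15, 16, 18}  # modelling-related items omitted in this review
CHEERS_ITEMS = [i for i in range(1, 25) if i not in OMITTED_ITEMS]  # 21 items


def reporting_quality_score(answers):
    """Add 1 for every retained item answered 'yes' (True); 'no' adds 0."""
    unscored = [i for i in CHEERS_ITEMS if i not in answers]
    if unscored:
        raise ValueError(f"unscored items: {unscored}")
    return sum(1 for i in CHEERS_ITEMS if answers[i])


# Made-up study that satisfies items 1-12 and none of the later items.
example_answers = {i: i <= 12 for i in CHEERS_ITEMS}
example_score = reporting_quality_score(example_answers)
```

The resulting score lies between 0 and 21, matching the range of the overall reporting quality score.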

Quality of the Statistical Approach of Trial-Based Cost-Effectiveness Evaluations

To evaluate the quality of the statistical approach, four quality domains were identified based on existing guidelines [12-14]. These domains, including their sub-domains, are described below.

Analysis of incremental costs: This domain consisted of three sub-domains. First, we assessed whether the cost difference was presented (‘yes/no’). Studies presenting cost differences were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0). Second, we assessed the method for estimating the statistical uncertainty surrounding the cost difference, while accounting for the skewed distribution of cost data. Studies using non-parametric bootstrapping or a gamma distribution in combination with multivariable regression methods were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0) [14, 21–23]. Third, we assessed how the uncertainty surrounding the cost difference was presented. Because trial-based cost-effectiveness evaluations are typically underpowered for economic outcomes [24], researchers are recommended to use estimation (i.e. confidence intervals) rather than hypothesis testing (i.e. p values) [25]. Therefore, studies presenting confidence intervals were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0). An overall domain score was calculated by adding up the studies’ scores per sub-domain (1 point per correct sub-domain, maximum score = 3).

Analysis of cost-effectiveness: This domain consisted of three sub-domains. First, we assessed whether the authors presented an incremental cost-effectiveness ratio (ICER) (‘yes/no’). Studies presenting an ICER were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0). Second, the method for dealing with sampling uncertainty surrounding the ICER was assessed. Non-parametric bootstrapping is considered the most appropriate method and is recommended by current guidelines [12-14]. Therefore, studies using non-parametric bootstrapping were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0). Third, we assessed whether the presentation of the uncertainty surrounding the ICER was adequate. Bootstrapped cost and effect data can be plotted in a cost-effectiveness plane (CE plane), which graphically presents the uncertainty surrounding the ICER [26]. Furthermore, the joint uncertainty surrounding costs and effects can be presented in a cost-effectiveness acceptability curve (CEAC) [27]. Presentation of 95% confidence intervals around ICERs is not considered appropriate due to interpretation issues when statistical uncertainty surrounding the ICER is distributed across more than one quadrant in the CE plane [28]. Studies presenting a CE plane and a CEAC without 95% confidence intervals around ICERs were scored as handling this sub-domain appropriately (score = 1); all others as inappropriate (score = 0). An overall domain score was calculated by adding up the studies’ scores per sub-domain (1 point per correct sub-domain, maximum score = 3).

Handling of missing data: Multiple imputation (MI) is currently considered the most appropriate method for dealing with missing cost data [13, 14], while maximum likelihood approaches (e.g. expectation-maximization algorithm) are also considered to result in valid estimates [13, 29]. However, this only applies when the missingness of data is related to observed factors among participants, but not to unobserved factors. This is often referred to as the Missing At Random (MAR) assumption [25, 30, 31]. Studies using one of these approaches were classified as handling this domain appropriately (score = 1); all others as inappropriate (score = 0). Furthermore, studies with only a small amount of missing data (i.e. ≤5% in our review) that used a complete-case analysis were also classified as handling this domain appropriately (score = 1). When >5% but <10% of data is missing, simpler imputation techniques might be preferred over MI, purely for practical reasons [32].

Addressing uncertainty (sensitivity analysis): Three types of uncertainty are inherent to trial-based cost-effectiveness evaluations: parameter uncertainty (i.e. uncertainty due to variables that might influence results, such as unit costs), methodological uncertainty (i.e. uncertainty due to the use of different methods for analysis) and subgroup uncertainty (i.e. uncertainty due to possible differences across subgroups of participants) [33, 34]. To assess the impact of these types of uncertainty on the robustness of the results, sensitivity analyses should be undertaken [25]. Studies performing at least one of the three types of sensitivity analyses were classified as handling this domain appropriately (score = 1); all others as inappropriate (score = 0).

An overall quality score of the statistical approach, ranging from 0 to 8, was calculated per study by adding up the number of sub-domains that were scored as appropriate (three for incremental costs, three for cost-effectiveness, one for missing data and one for sensitivity analysis). See Table 1 for a summary of appropriate methods per domain.
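To make the recommended approach concrete, the following is a minimal Python sketch of non-parametric bootstrapping for incremental costs and effects. It is an illustration under stated assumptions, not the analysis code of any reviewed study: the function names, the simulated data and the willingness-to-pay thresholds are all hypothetical. Participants are resampled with replacement within each arm, a percentile interval is taken around the cost difference, and the CEAC is computed as the share of replicates with positive net monetary benefit.

```python
import random
import statistics


def bootstrap_differences(intervention, control, n_boot=2000, seed=1):
    """Resample participants with replacement within each arm.

    Each participant is a (cost, effect) tuple, so cost and effect stay
    paired and their correlation is preserved in every replicate.
    Returns a list of (cost difference, effect difference) tuples.
    """
    rng = random.Random(seed)
    replicates = []
    for _rep in range(n_boot):
        boot_int = rng.choices(intervention, k=len(intervention))
        boot_ctl = rng.choices(control, k=len(control))
        d_cost = (statistics.fmean([c for c, _e in boot_int])
                  - statistics.fmean([c for c, _e in boot_ctl]))
        d_effect = (statistics.fmean([e for _c, e in boot_int])
                    - statistics.fmean([e for _c, e in boot_ctl]))
        replicates.append((d_cost, d_effect))
    return replicates


def percentile_interval(values, level=0.95):
    """Simple percentile bootstrap interval (no bias correction)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


def ceac(replicates, thresholds):
    """Share of replicates with positive net monetary benefit per threshold."""
    return [
        sum(1 for d_cost, d_effect in replicates if wtp * d_effect - d_cost > 0)
        / len(replicates)
        for wtp in thresholds
    ]


# Simulated data (assumption): right-skewed costs, binary effect per participant.
data_rng = random.Random(0)
intervention = [(data_rng.lognormvariate(7.0, 0.8), 1 if data_rng.random() < 0.85 else 0)
                for _ in range(120)]
control = [(data_rng.lognormvariate(6.8, 0.8), 1 if data_rng.random() < 0.75 else 0)
           for _ in range(120)]

replicates = bootstrap_differences(intervention, control)
cost_ci = percentile_interval([d_cost for d_cost, _d_effect in replicates])
thresholds = [0, 1000, 5000, 20000]  # hypothetical willingness-to-pay values
ceac_values = ceac(replicates, thresholds)
```

The `replicates` pairs are the points that would be plotted in a CE plane, and `ceac_values` plotted against `thresholds` gives the CEAC. A real analysis would typically also adjust for baseline differences and handle missing data before bootstrapping.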
Table 1

Summary of appropriate methods per domain

| Domain | Sub-domain | Appropriate method (a) |
| --- | --- | --- |
| Analysis of incremental costs | Presenting cost differences | Presented cost differences |
| Analysis of incremental costs | Estimating statistical uncertainty around cost differences | Non-parametric bootstrapping or gamma distribution combined with multivariable regression methods |
| Analysis of incremental costs | Presentation of uncertainty around cost differences | Presented confidence intervals |
| Analysis of cost effectiveness | Presenting ICER | Presented ICER |
| Analysis of cost effectiveness | Dealing with sampling uncertainty | Non-parametric bootstrapping |
| Analysis of cost effectiveness | Presentation of uncertainty around ICER | Presented CE plane and CEAC without confidence intervals around ICER |
| Handling of missing data | | Multiple imputation and EM algorithm |
| Addressing uncertainty | Parameter uncertainty; methodological uncertainty; subgroup analysis | At least one of these sensitivity analyses performed |

CE plane cost-effectiveness plane, CEAC cost-effectiveness acceptability curve, EM expectation-maximization, ICER incremental cost-effectiveness ratio

(a) If the appropriate method was used, a score of 1 was awarded. All other methods resulted in a score of 0
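The criteria summarised in Table 1 can be read as a simple 8-point rubric. The sketch below is one hypothetical encoding of it in Python. The field names and method labels are assumptions made for illustration, and the complete-case exception for ≤5% missing data comes from the Methods text rather than from the table itself.

```python
from dataclasses import dataclass


@dataclass
class StudyMethods:
    """Hypothetical record of what one study reported (labels are illustrative)."""
    presents_cost_difference: bool
    cost_uncertainty_method: str   # e.g. "bootstrap", "gamma_regression", "t_test"
    presents_cost_ci: bool
    presents_icer: bool
    icer_uncertainty_method: str   # e.g. "bootstrap", "none"
    presents_ce_plane: bool
    presents_ceac: bool
    presents_icer_ci: bool
    missing_data_method: str       # e.g. "multiple_imputation", "em", "complete_case"
    missing_fraction: float        # share of missing data, 0.0 to 1.0
    n_sensitivity_analyses: int


def statistical_quality_score(s):
    """Return the 0-8 score: one point per appropriately handled sub-domain."""
    points = [
        # Analysis of incremental costs (3 sub-domains)
        s.presents_cost_difference,
        s.cost_uncertainty_method in {"bootstrap", "gamma_regression"},
        s.presents_cost_ci,
        # Analysis of cost-effectiveness (3 sub-domains)
        s.presents_icer,
        s.icer_uncertainty_method == "bootstrap",
        s.presents_ce_plane and s.presents_ceac and not s.presents_icer_ci,
        # Handling of missing data (complete-case accepted when <=5% is missing)
        s.missing_data_method in {"multiple_imputation", "em"}
        or (s.missing_data_method == "complete_case" and s.missing_fraction <= 0.05),
        # Addressing uncertainty (at least one sensitivity analysis)
        s.n_sensitivity_analyses >= 1,
    ]
    return sum(points)


# Made-up study: t-test on costs, bootstrapped ICER, CE plane but no CEAC.
example_study = StudyMethods(
    presents_cost_difference=True,
    cost_uncertainty_method="t_test",
    presents_cost_ci=True,
    presents_icer=True,
    icer_uncertainty_method="bootstrap",
    presents_ce_plane=True,
    presents_ceac=False,
    presents_icer_ci=False,
    missing_data_method="complete_case",
    missing_fraction=0.03,
    n_sensitivity_analyses=2,
)
example_statistical_score = statistical_quality_score(example_study)
```

Keeping each sub-domain as a separate boolean makes it easy to report which criteria a study failed, not just its total.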


Statistical Analysis

To describe the included studies’ reporting and statistical quality, descriptive statistics were used. To explore whether quality improved over time, linear regression analyses were performed; one with the overall reporting quality score as dependent variable and one with the overall quality score of the statistical approach as dependent variable stratified for medical discipline (i.e. obstetrics and gynaecology). The year of publication was used as an independent variable resulting in the regression model described below. Analyses were conducted using STATA 14®.
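The regression equation itself is not reproduced in this text. A plausible form, assuming a simple linear regression fitted separately for each medical discipline and each quality score (the symbols are illustrative and not the authors' notation), is:

```latex
Q_{i} = \beta_{0} + \beta_{1}\,\mathrm{Year}_{i} + \varepsilon_{i}
```

Here $Q_{i}$ is the overall reporting quality score (0 to 21) or the overall statistical quality score (0 to 8) of study $i$, $\mathrm{Year}_{i}$ is its publication year, and $\beta_{1}$ is the average change in score per publication year, so a positive and statistically significant $\beta_{1}$ would indicate improvement over time.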

Results

Literature Search and Study Selection

The electronic search identified 5482 potentially eligible studies. After removing 246 duplicates, 5236 studies were screened on title and abstract. The reviewers disagreed on the inclusion of 112 (2%) studies, resulting in an inter-rater agreement of 98%. Seventy-one studies were retrieved for full-text screening. In four cases, consensus was reached by asking a third reviewer. After the full-text screening, 44 studies [35-78] were included. One study [79] was identified through reference checking and was also included in the review (Fig. 1). This resulted in 45 studies included for review.
Fig. 1

Flow chart for inclusion of studies. CEA cost-effectiveness analysis, CUA cost-utility analysis, HTA Health Technology Assessment database, NHSEED NHS Economic Evaluation Database


Study Characteristics

Study characteristics are reported in Table 2. Just over half of the studies were conducted in gynaecology (51%; n = 23). Most studies conducted a CEA (87%; n = 39), and five (11%) studies [46, 49, 52, 69, 77] conducted a CUA. One (2%) study [79] conducted both a CEA and a CUA. The hospital perspective was used in 28 (62%) studies [35–38, 40–45, 47–50, 53, 55, 59, 60, 62, 65–68, 71, 73–75, 78], followed by the healthcare perspective (22%; n = 10) [39, 46, 51, 52, 54, 58, 63, 69, 70, 77] and the societal perspective (11%; n = 5) [56, 57, 64, 76, 79]. In two (4%) studies [61, 72], the perspective was unclear. Twenty-eight (62%) studies [35, 38–41, 48, 50, 52, 54, 55, 57, 58, 61–66, 69–77, 79] were conducted alongside an RCT and 17 (38%) [36, 37, 42–47, 49, 51, 53, 56, 59, 60, 67, 68, 78] alongside an NRS. Sample sizes ranged from 35 [55] to 9996 participants [71] and the duration of follow-up ranged from 24 hours [66] to 36 months [47]. The majority of studies were conducted in Europe (67%; n = 30) [35, 37, 39–41, 43, 47, 50–52, 57–59, 61–65, 67–70, 72–79] and North America (27%; n = 12) [36, 38, 42, 44–46, 48, 49, 53, 56, 60, 66]. Two (4%) studies [54, 71] were conducted across multiple countries and one (2%) study [55] did not report the country in which it was conducted, although the authors’ affiliations were in the Republic of Ireland and the UK.
Table 2

Study characteristics

| References | Publication year | Data collection | Geographical area | Healthcare delivery | Medical discipline | Type of EE | Perspective | Study design | Sample size (n) | Population | Follow-up | Comparison between | Outcome measures |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bernitz et al. [35] | 2012 | 2006–2010 | Norway | Secondary care | Obstetrics | CEA | Hospital | RCT | 1110 | Women assessed to be at low risk at spontaneous onset of labour | From the women’s admission to the hospital at onset of spontaneous labour until discharge | Midwife-led birth unit vs standard obstetric unit | Proportions of caesarean sections, instrumental vaginal deliveries, complications requiring treatment in the operating room, epidural analgesia and augmentation with oxytocin |
| Bienstock et al. [36] | 2001 | 1994–1996 | USA | Secondary care | Obstetrics | CEA | Hospital inferred (not reported) | NRS | 260 | Patients with a history of preterm labour | Not reported | Inner-city hospital house staff vs inner city managed care organization | Primary outcomes: rate of recurrent preterm delivery; Secondary outcomes: rate of NICU admission, NICU length of stay and perinatal mortality |
| Brooten et al. [38] | 2001 | 1992–1996 | USA | Secondary care | Obstetrics | CEA | Hospital | RCT | 173 | Women with high-risk pregnancies | 12 months | Specialist nurse care at home vs standard prenatal care | Primary outcome: maternal effects and infant effects; Secondary outcome: patient satisfaction |
| Eddama et al. [41] | 2009 | 2005–2006 | UK | Secondary care | Obstetrics | CEA | Hospital | RCT | 350 | Nulliparous women with a singleton pregnancy, cephalic presentation >37 weeks’ gestation, requiring cervical ripening prior to induction of labour | From randomization until hospital discharge | Isosorbide mononitrate vs placebo | Elapsed time interval from hospital admission to delivery |
| Eddama et al. [40] | 2010 | 2004–2008 | UK | Secondary care | Obstetrics | CEA | Hospital | RCT | 500 | Women before 20 weeks’ gestation with a twin pregnancy | From randomization until hospital discharge | Vaginal progesterone gel vs placebo | Number of preterm births prevented |
| Guo et al. [48] | 2011 | 2001–2004 | Canada | Secondary care | Obstetrics | CEA | Hospital | RCT | 153 | Women with clinical preterm labour | Not reported | Transdermal nitro-glycerine vs placebo | Primary outcome: NICU admission; Secondary outcomes: gestational age at delivery, length of NICU stay |
| Jakovljevic et al. [51] | 2008 | 2004–2006 | Serbia and Montenegro | Secondary care | Obstetrics | CEA | Healthcare (Republic Institute for Health Insurance in Serbia) | NRS | 235 | Pregnant women with threatened preterm labour | From emergence of uterine contractions to delivery | Fenoterol vs ritodrine for treatment of preterm labour | Primary outcomes: length of pregnancy, prolongation of the pregnancy, and score on modified Flanagan’s quality-of-life scale for chronic diseases; Secondary outcomes: quality-adjusted pregnancy weeks gained, adverse drug reactions and pregnancy outcome (neonatal health) |
| Lain et al. [54] | 2017 | 2004–2013 | 11 countries | Secondary care | Obstetrics | CEA | Healthcare | RCT | 1892 | Women with a singleton pregnancy with ruptured membranes between 34 and 36 weeks’ gestation | Not reported | Planned immediate birth vs delayed birth | Primary outcome: neonatal sepsis; Secondary outcome: respiratory distress syndrome |
| Liem et al. [57] | 2014 | 2009–2012 | Netherlands | Secondary care | Obstetrics | CEA | Societal | RCT | 813 | Women with a multiple pregnancy | 6 weeks | Cervical pessary vs standard care (no pessary) | Poor perinatal and health outcomes |
| Morrison et al. [60] | 2003 | 2001 | USA | Secondary care | Obstetrics | CEA | Hospital inferred (not reported) | NRS | 60 | Women with recurrent preterm labour at <32 weeks’ gestation | Not reported | Continuous subcutaneous terbutaline vs standard care | Amount of terbutaline infused and associated side effects, the gestational age at delivery, and reason for birth as well as pregnancy prolongation after discharge from the sentinel recurrent preterm labour event. Maternal hospital days, route of delivery and neonatal parameters |
| Niinimaki et al. [61] | 2009 | 2003–2004 | Finland | Secondary care | Obstetrics | CEA | Unclear | RCT | 98 | Women with a diagnosed miscarriage | 2 months | Medical treatment for miscarriage vs surgical treatment for miscarriage | Success rate/uncomplicated treatment |
| Petrou et al. [63] | 2011 | 2005–2006 | UK | Secondary care | Obstetrics | CEA | Healthcare (NHS) | RCT | 165 | Pregnant women presenting as cephalic between 36 and 41 weeks’ gestation, for whom induction of labour was deemed necessary | From randomization until hospital discharge | Prostaglandin gel vs prostaglandin tablets | Time prevented between induction and delivery |
| Petrou et al. [64] | 2006 | 1997–2001 | UK | Secondary care | Obstetrics | CEA | Societal | RCT | 1200 | Women with a confirmed pregnancy of <13 weeks’ gestation with a diagnosis of incomplete miscarriage or missed miscarriage | 8 weeks | Expectant management vs medical or surgical management | Gynaecological infection avoided |
| Prick et al. [65] | 2014 | 2004–2011 | Netherlands | Secondary care | Obstetrics | CEA | Hospital | RCT | 519 | Women with acute anaemia after postpartum haemorrhage | 6 weeks | Red blood cell transfusion vs non-intervention | Primary outcome: physical fatigue; Secondary outcomes: remaining health-related quality of life scores, transfusion reactions and physical complications until 6 weeks postpartum |
| Ramsey et al. [66] | 2003 | 1996–1997 | USA | Secondary care | Obstetrics | CEA | Hospital | RCT | 111 | Women with an unfavourable cervix who underwent labour induction | 24 hours | Misoprostol vs dinoprostone gel or dinoprostone insert | Complete dilatation within the first 24 hours of treatment |
| Simon et al. [71] | 2006 | 1998–2001 | 33 low-, middle- and high-income countries | Secondary care | Obstetrics | CEA | Hospital | RCT | 9996 | Women with pre-eclampsia | From randomization until 6 weeks, discharge from hospital after delivery or death | Magnesium sulphate vs placebo | The number of cases of eclampsia prevented or death |
| Sjostrom et al. [72] | 2016 | 2011–2012 | Sweden | Secondary care | Obstetrics | CEA | Unclear | RCT | 1068 | Healthy women seeking treatment for abortion | 3 weeks | Medical abortion by physician vs medical abortion by nurse-midwife | Complete abortion without need for surgical intervention |
| Ten Eikelder et al. [73] | 2017 | 2012–2013 | Netherlands | Secondary care | Obstetrics | CEA | Hospital | RCT | 1845 | Women with a viable term singleton pregnancy in cephalic presentation, intact membranes, and unfavourable cervix without previous caesarean section | Not reported | Labour induction with oral misoprostol vs labour induction with Foley catheter | Composite safety outcome and caesarean section |
| Van Baaren et al. [75] | 2013 | 2009–2010 | Netherlands | Secondary care | Obstetrics | CEA | Hospital | RCT | 819 | Pregnant women at term with an unfavourable cervix | 6 weeks | Induction of labour with Foley catheter vs induction of labour with prostaglandin E2 gel | Caesarean section rate (yes/no) |
| Van Baaren et al. [74] | 2016 | 2009–2013 | Netherlands | Secondary care | Obstetrics | CEA | Hospital | RCT | 703 | Women with hypertensive disorder between 34 and 37 weeks’ gestation | From randomization to hospital discharge | Immediate delivery vs expectant monitoring | Composite score of adverse maternal outcomes |
| Vijgen et al. [76] | 2010 | 2005–2008 | Netherlands | Secondary care | Obstetrics | CEA | Societal | RCT | 756 | Women diagnosed with gestational hypertension or pre-eclampsia between 36 and 41 weeks’ gestation | 12 months | Induction of labour vs expectant monitoring | Difference in proportion of maternal complications |
| Walker et al. [77] | 2017 | 2013– | UK | Secondary care | Obstetrics | CUA | Healthcare (NHS) | RCT | 241 | Nulliparous women aged ≥35 years on their expected due date, with a singleton live fetus in a cephalic presentation | 1 month | Induction of labour vs expectant monitoring | QALY |
| Bijen et al. [79] | 2011 | Unclear | Netherlands | Secondary care | Gynaecology | CEA/CUA | Societal | RCT | 279 | Patients with early-stage endometrial cancer | 3 months | Total laparoscopic hysterectomy vs TAH | Primary outcome CEA: major complication-free rate; Primary outcome CUA: QALY |
| Bogliolo et al. [37] | 2016 | 2011–2014 | Italy | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 104 | Women who underwent robotically assisted hysterectomy and bilateral salpingo-oophorectomy | 12 months for effects and 6 months for costs | Robotic single-site hysterectomy vs multiport robotic hysterectomy | Postoperative pain, intraoperative complications, and postoperative complications |
| Dawes et al. [39] | 2007 | 2003–2004 | UK | Secondary care | Gynaecology | CEA | Healthcare (NHS) | RCT | 111 | Women scheduled for major abdominal or pelvic surgery for benign gynaecological disease | 6 weeks | Specialist nurse care vs standard care | Primary outcome: SF-36 health survey questionnaire; Secondary outcomes: complications, length of hospital stay, readmission, information on discharge, support and satisfaction of women |
| El Hachem et al. [42] | 2016 | 2013–2014 | USA | Secondary care | Gynaecology | CEA | Hospital | NRS | 92 | Women undergoing RSS or CL | Not reported | RSS vs CL | Operative time and various perioperative outcomes |
| El-Sayed et al. [43] | 2011 | 2009–2010 | UK | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 140 | Women with acute gynaecology conditions | Not reported | Ultrasound-based model of care vs traditional model of care | Hospital length of stay |
| Eltabbakh et al. [44] | 2000 | 1998–1999 | USA | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 80 | Obese women with early-stage endometrial carcinoma | 24 months | Laparoscopic-assisted VH vs total abdominal hysterectomy | Surgical outcome, hospital stay, recall of postoperative pain control, time to return to full activity and to work, and overall satisfaction among patients |
| Eltabbakh et al. [45] | 2001 | 1998–1999 | USA | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 147 | Women with early-stage endometrial carcinoma | 24 months | Laparoscopic-assisted VH vs total abdominal hysterectomy | Surgical outcome, hospital stay, recall of postoperative pain control, time to return to full activity and to work, and overall satisfaction among patients |
| Evans [46] | 2000 | Unclear | USA | Secondary care | Gynaecology | CUA | Healthcare (Medicare) | NRS | 100 | Patients with dysfunctional uterine bleeding | 12 months | Sonohysterography vs hysteroscopic evaluation | Utility value |
| Fernandez et al. [47] | 2003 | 1995–1997 | France | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 147 | Patients who had undergone one of the three surgical interventions for menorrhagia | 24–36 months | Thermo-coagulation vs VH or endometrial ablation | Primary outcome: failure rate of the method for menorrhagia; Secondary outcomes: satisfaction with the procedure and ongoing pain |
| Horowitz et al. [49] | 2002 | 1997–1998 | USA | Secondary care | Gynaecology | CUA | Hospital inferred (not reported) | NRS | Not reported | Women undergoing gynaecological and surgical procedures | Not reported | Pre-operative autologous blood donation vs no blood donation | QALY |
| Jack et al. [50] | 2005 | 2001–2002 | UK | Secondary care | Gynaecology | CEA | Hospital | RCT | 197 | Women complaining of excessive menstrual loss | 12 months | Outpatient microwave endometrial ablation vs standard microwave endometrial ablation | Primary outcomes: satisfaction with treatment and acceptability of treatment; Secondary outcomes: menstrual outcomes and quality of life |
| Kilonzo et al. [52] | 2010 | 2003–2005 | UK | Secondary care | Gynaecology | CUA | Healthcare (NHS) | RCT | 314 | Women complaining of heavy menstrual bleeding | 12 months | Microwave endometrial ablation vs thermal balloon endometrial ablation | QALY |
| Kovac [53] | 2000 | 1988–1993 | USA | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | NRS | 4595 | Women undergoing hysterectomy | Not reported | Decision-directed hysterectomy vs nondecision-directed hysterectomy | Primary outcome: length of stay; Secondary outcome: complications |
| Lalchandani et al. [55] | 2005 | 1999–2001 | Not reported (Ireland and UK in authors’ affiliation) | Secondary care | Gynaecology | CEA | Hospital | RCT | 35 | Women with minimal to moderate endometriosis | 12 months | Helium thermal coagulator therapy vs medical therapy using gonadotropin-releasing hormone analogues | Mean operating time |
| Lenihan et al. [56] | 2004 | 2001–2003 | USA | Secondary care | Gynaecology | CEA | Societal inferred (not reported) | NRS | 268 | Patients that have undergone a hysterectomy | Not reported | Laparoscopic-assisted VH vs TAH or total VH | Incidence of complications, time to normal activity and return to work |
| Lumsden et al. [58] | 2000 | Unclear | UK | Secondary care | Gynaecology | CEA | Healthcare (NHS) | RCT | 200 | Women scheduled for an abdominal hysterectomy for benign gynaecological disease | 12 months | Laparoscopic-assisted hysterectomy vs abdominal hysterectomy | Conversion rate laparoscopic-assisted VH to TAH, complication rate and quality of life |
| Marino et al. [59] | 2015 | 2007–2010 | France | Secondary care | Gynaecology | CEA | Hospital | NRS | 306 | Women referred for gynaecologic oncologic indications | 24 months | Robotic-assisted laparoscopy vs standard laparoscopy | Surgical outcomes |
| Palomba et al. [62] | 2006 | 2001–2003 | Italy | Secondary care | Gynaecology | CEA | Hospital inferred (not reported) | RCT | 80 | Postmenstrual women with severe midline pelvic pain persisting for >6 months and unresponsive to common medical treatment | 12 months | Laparoscopic uterine nerve ablation vs vaginal uterosacral ligament resection | Cure rate, severity of CPP and deep dyspareunia |
Relph et al. [67]20142010–2012UKSecondary careGynaecologyCEAHospitalNRS90Women undergoing VHNot reportedERAS vs standard care (before ERAS)Length of inpatient stay
Sarlos et al. [68]20102007–2009SwitzerlandSecondary careGynaecologyCEAHospitalNRS80Women needing a hysterectomyNot reportedRobotic hysterectomyLaparoscopic hysterectomy
Sculpher et al. [69]20041999–2000UKSecondary careGynaecologyCUAHealthcare (NHS)RCT487/571a Women requiring a hysterectomy for reasons other than malignancy52 weeksLaparoscopic hysterectomy vs VH or abdominal hysterectomyQALY
Sculpher et al. [70]20001992–1994UKSecondary careGynaecologyCEAHealthcareRCT160Pre-menopausal women with dysfunctional uterine bleedingFrom randomization to 2 years after interventionGoserelin vs danazolDifferential rate of amenorrhoea
Yoong et al. [78]20162009–2014UKSecondary careGynaecologyCEAHospitalNRS50Women undergoing primary vaginal or laparoscopic ovarian cystectomy for benign ovarian cystsNot reportedPrimary vaginal ovarian cystectomy vs laparoscopic approachPatient-related outcomes

CEA cost-effectiveness analysis, CL conventional laparoscopic surgery, CPP chronic pelvic pain, CUA cost-utility analysis, ERAS enhanced recovery after surgery programme, NICU neonatal intensive care unit, NRS non-randomized study, QALY quality-adjusted life-years, RCT randomized controlled trial, RSS robotic-single-site surgery, TAH total abdominal hysterectomy, VH vaginal hysterectomy

a Two parallel RCTs

Reporting Quality of the Trial-Based Cost-Effectiveness Evaluations

Results of the reporting quality assessment are presented in Table 3. The overall reporting quality score (with a maximum of 21) ranged from 1 to 17 (mean 8.8; SD 4.8; median 8). Twenty-seven (60%) studies [35–39, 42–47, 49–51, 53, 55, 56, 58–62, 66–68, 72, 78] did not adhere to ≥50% of the items (i.e. having a score ≤10) of the CHEERS statement; one (2%) study [76] had a score of 17 (81% of the items were scored positively). Criteria that were often adequately described in the studies were the title (n = 40; 89%), the target population (n = 30; 67%) and the comparators (n = 33; 73%). Criteria that were least appropriately described were the abstract (n = 4; 9%), setting and location (n = 4; 9%) and choice of health outcomes (n = 6; 13%).
Table 3

Reporting quality score using the CHEERS checklist

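| References | Title | Abstract | Background and objectives | Target population and subgroups | Setting and location | Study perspective | Comparators | Time horizon | Discount rate | Choice of health outcomes | Measurement of effectiveness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bernitz et al. [35] | Yes | No | Yes | No | No | Yes | Yes | No | No | No | Yes |
| Bienstock et al. [36] | No | No | Yes | Yes | No | No | No | No | No | No | No |
| Brooten et al. [38] | Yes | No | Yes | Yes | No | No | Yes | Yes | No | No | Yes |
| Eddama et al. [41] | Yes | No | No | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Eddama et al. [40] | Yes | No | No | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Guo et al. [48] | Yes | Yes | No | Yes | No | Yes | Yes | No | No | No | Yes |
| Jakovljevic et al. [51] | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | No |
| Lain et al. [54] | Yes | Yes | Yes | No | No | Yes | Yes | No | Yes | No | Yes |
| Liem et al. [57] | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Morrison et al. [60] | No | No | Yes | Yes | No | No | Yes | No | No | No | No |
| Niinimaki et al. [61] | Yes | No | No | Yes | No | No | Yes | No | Yes | No | Yes |
| Petrou et al. [63] | Yes | Yes | No | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Petrou et al. [64] | Yes | No | No | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Prick et al. [65] | Yes | No | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Ramsey et al. [66] | Yes | No | No | Yes | No | No | Yes | No | No | No | Yes |
| Simon et al. [71] | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Sjostrom et al. [72] | Yes | No | No | No | No | No | Yes | No | Yes | No | Yes |
| Ten Eikelder et al. [73] | Yes | No | No | No | No | Yes | Yes | No | Yes | No | Yes |
| Van Baaren et al. [75] | Yes | No | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes |
| Van Baaren et al. [74] | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Vijgen et al. [76] | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Yes |
| Walker et al. [77] | Yes | No | No | No | No | Yes | Yes | No | Yes | Yes | Yes |
| Bijen et al. [79] | Yes | No | No | Yes | No | Yes | No | No | No | Yes | Yes |
| Bogliolo et al. [37] | Yes | No | No | Yes | No | No | Yes | No | No | No | No |
| Dawes et al. [39] | Yes | No | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes |
| El Hachem et al. [42] | Yes | No | No | Yes | No | No | Yes | No | No | No | No |
| El-Sayed et al. [43] | Yes | No | No | No | No | No | No | No | No | No | No |
| Eltabbakh et al. [44] | No | No | Yes | Yes | No | No | Yes | No | No | No | No |
| Eltabbakh et al. [45] | No | No | Yes | No | No | No | Yes | No | No | No | No |
| Evans [46] | Yes | No | Yes | No | No | Yes | No | Yes | Yes | Yes | No |
| Fernandez et al. [47] | Yes | No | Yes | Yes | No | No | No | Yes | No | No | No |
| Horowitz et al. [49] | Yes | No | No | No | No | No | No | No | No | Yes | No |
| Jack et al. [50] | Yes | No | No | Yes | No | No | Yes | Yes | No | No | Yes |
| Kilonzo et al. [52] | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Kovac [53] | Yes | No | No | No | No | No | No | No | No | No | No |
| Lalchandani et al. [55] | Yes | No | No | No | No | No | Yes | No | No | No | Yes |
| Lenihan et al. [56] | Yes | No | No | No | No | No | No | No | No | No | No |
| Lumsden et al. [58] | Yes | No | No | Yes | No | Yes | No | Yes | No | No | Yes |
| Marino et al. [59] | Yes | No | No | No | No | Yes | No | No | No | No | No |
| Palomba et al. [62] | No | No | No | Yes | No | No | Yes | Yes | No | No | Yes |
| Relph et al. [67] | Yes | No | No | No | No | No | No | No | No | No | No |
| Sarlos et al. [68] | Yes | No | No | No | No | No | Yes | No | No | No | No |
| Sculpher et al. [69] | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Sculpher et al. [70] | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes |
| Yoong et al. [78] | Yes | No | No | Yes | No | No | No | No | No | No | No |
| Studies complying with reporting criteria (%) | 89 | 9 | 36 | 67 | 9 | 51 | 73 | 20 | 40 | 13 | 62 |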

Compliance with reporting criteria: italic values: ≥75% of reporting criteria correct; bold values: 51–74% of reporting criteria correct; underlined values: 26–50% of reporting criteria correct, bold italic values ≤25% of reporting criteria correct

CHEERS Consolidated Health Economic Evaluation Reporting Standard, NA not available
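
The scoring behind Table 3 can be illustrated with a minimal sketch: each study's score is the number of items rated "Yes", each item's compliance is the share of studies rated "Yes", and a study falls below the 50% mark when its score is less than half the number of items (a score ≤10 on 21 items). The study names, items and ratings below are hypothetical and are not taken from Table 3; the function names are ours.

```python
# Hypothetical Yes/No ratings for three studies on four items.
# These are illustrative values, NOT data from Table 3.
items = ["Title", "Abstract", "Comparators", "Time horizon"]
ratings = {
    "Study A": ["Yes", "No", "Yes", "Yes"],
    "Study B": ["Yes", "No", "No", "No"],
    "Study C": ["No", "No", "Yes", "Yes"],
}


def study_scores(ratings):
    """Return the number of items rated 'Yes' for each study."""
    return {study: sum(r == "Yes" for r in row) for study, row in ratings.items()}


def item_compliance(ratings, items):
    """Return the rounded percentage of studies rated 'Yes' on each item."""
    n_studies = len(ratings)
    result = {}
    for i, item in enumerate(items):
        n_yes = sum(row[i] == "Yes" for row in ratings.values())
        result[item] = round(100 * n_yes / n_studies)
    return result


def below_half(scores, n_items):
    """List studies that comply with fewer than half of the items."""
    return [study for study, score in scores.items() if score < n_items / 2]


scores = study_scores(ratings)
compliance = item_compliance(ratings, items)
low_quality = below_half(scores, len(items))
```

Applied to the review's own figures, the same proportion gives 40 of 45 studies, or 89%, for the title item.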

Results of the quality assessment of the statistical approach are presented in Table 4. The overall quality score of the statistical approach per study ranged from 0 to 6 (see Table 4 and Appendix S3 in ESM for scores per sub-domain). Six (13%) studies [36, 37, 46, 56, 60, 78] did not use any of the recommended methods (i.e. overall quality score = 0). Furthermore, 32 (71%) studies [35–40, 42–51, 53, 55, 56, 58–62, 65–68, 70, 72, 76, 78] did not adhere to ≥50% of the statistical quality items (i.e. having a score ≤4). None of the studies (see Appendix S3 in ESM) used the recommended statistical method to assess the cost differences between interventions. Furthermore, no study used more advanced methods for handling missing data (i.e. multiple imputation or maximum likelihood approaches). When there was <10% missing data, simpler techniques were used in 16 (36%) studies [39, 45, 48, 49, 54, 55, 57–59, 62, 63, 66, 68, 73, 75]. Of note, no study accounted for the clustered nature of the data by using methods that correct for clustering.
Table 4

Statistical approach of included studies

| References | Incremental costs: cost difference presented | Incremental costs: statistical assessment of cost differences | Incremental costs: presentation | Cost effectiveness: ICER | Cost effectiveness: method sampling uncertainty | Cost effectiveness: presentation sampling uncertainty | Handling missing data | Uncertainty: parameter uncertainty | Uncertainty: methodological uncertainty | Uncertainty: subgroup analysis | Overall quality score of statistical approach |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bernitz et al. [35] | No | T test | p value | Yes | Not reported, non-parametric bootstrap (1000 replications) in the sensitivity analysis | CE plane | Not reported | No | Yes, non-parametric bootstrap (1000 replications) in the sensitivity analysis | No | 2 |
| Bienstock et al. [36] | No | T test | p value | No | Not reported | No presentation | Not reported | No | No | No | 0 |
| Brooten et al. [38] | Yes | T test | p value | No | Not reported | No presentation | Not reported | No | No | Yes | 2 |
| Eddama et al. [41] | Yes | T test with bootstrap (1000 replications) | 95% CI and p value | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Not reported | Yes | No | No | 6 |
| Eddama et al. [40] | Yes | T test with bootstrap (1000 replications) | 95% CI and p value | No | Non-parametric bootstrap (1000 replications) | CE plane | Not reported | Yes | No | No | 4 |
| Guo et al. [48] | Yes | Not reported | No presentation | No | Not reported | CE plane | Complete-case analysis <5% missing data | Yes | No | No | 1 |
| Jakovljevic et al. [51] | No | T test | p value | Yes | T test | p value | Complete-case analysis >5% missing data | Yes | No | No | 2 |
| Lain et al. [54] | Yes | T test with bootstrap (5000 replications) | 95% CI | No | Non-parametric bootstrap (5000 replications) | CE plane | Complete-case analysis <5% missing data | Yes | Yes | Yes | |
| Liem et al. [57] | Yes | Mann–Whitney test | 95% CI | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis <5% missing data | Yes | No | Yes | 5 |
| Morrison et al. [60] | No | T test | p value | No | Not reported | No presentation | Not reported | No | No | No | 0 |
| Niinimaki et al. [61] | Yes | Not reported | No presentation | Yes | Not reported | No presentation | Not reported | No | No | No | 2 |
| Petrou et al. [63] | Yes | T test with bootstrap (1000 replications) | 95% CI and p value | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis <5% missing data | Yes | No | No | 7 |
| Petrou et al. [64] | Yes | T test with bootstrap (1000 replications) | 95% CI and p value | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Lin et al. [88] method | Yes | No | No | 6 |
| Prick et al. [65] | No | Not reported | No presentation | Yes | Not reported | No presentation | Mean imputation | Yes | No | Yes | 2 |
| Ramsey et al. [66] | No | Wilcoxon rank sum test | p value | Yes | Not reported | No presentation | No missing data | No | No | No | 2 |
| Simon et al. [71] | Yes | T test with bootstrap (? replications) | 95% CI | Yes | Non-parametric bootstrap (? replications) | CEAC and 95% CI for ICER | Mean imputation | Yes | Yes | Yes | 5 |
| Sjostrom et al. [72] | Yes | Unclear | No presentation | Yes | Not reported | No presentation | Complete-case analysis >5% missing data | No | No | No | 2 |
| Ten Eikelder et al. [73] | Yes | T test with bootstrap (? replications) | 95% CI | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis <5% missing data | Yes | Yes | Yes | 7 |
| Van Baaren et al. [75] | Yes | T test with bootstrap (1000 replications) | 95% CI | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis <5% missing data | Yes | No | Yes | 7 |
| Van Baaren et al. [74] | Yes | T test with bootstrap (1000 replications) | 95% CI | No | Non-parametric bootstrap (1000 replications) | CE plane (CEAC in appendix) | Change of the perspective of the analysis | Yes | No | Yes | 5 |
| Vijgen et al. [76] | Yes | T test with bootstrap (1000 replications) | 95% CI | No | Non-parametric bootstrap (1000 replications) | CE plane | Extrapolation | Yes | Yes | Yes | 4 |
| Walker et al. [77] | Yes | T test with bootstrap (1000 replications) | 95% CI | Yes | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis >5% missing data | Yes | No | No | 6 |
| Bijen et al. [79] | Yes | Mann–Whitney test | p value | Yes | Non-parametric bootstrap (5000 replications) | CE plane and CEAC | Complete-case analysis <5% missing data | Yes | No | Yes | 6 |
| Bogliolo et al. [37] | No | Mann–Whitney test | p value | No | Not reported | No presentation | Not reported | No | No | No | 0 |
| Dawes et al. [39] | Yes | Mann–Whitney test | p value | No | Not reported | No presentation | Complete-case analysis <5% missing data | Yes | No | No | 3 |
| El Hachem et al. [42] | Yes | T test or Mann–Whitney test | p value | No | Not reported | No presentation | Complete-case analysis >5% missing data | No | No | No | 1 |
| El-Sayed et al. [43] | Yes | Not reported | No presentation | No | Not reported | No presentation | Not reported | No | No | No | 1 |
| Eltabbakh et al. [44] | Yes | T test | p value | No | Not reported | No presentation | Not reported | No | No | No | 1 |
| Eltabbakh et al. [45] | Yes | T test | p value | No | Not reported | No presentation | Complete-case analysis <5% missing data | No | No | No | 2 |
| Evans [46] | No | Not reported | No presentation | No | Not reported | No presentation | Not reported | No | No | No | 0 |
| Fernandez et al. [47] | Yes | Not reported | No presentation | Yes | Not reported | No presentation | Not reported | No | No | No | 2 |
| Horowitz et al. [49] | No | Not reported | No presentation | Yes | Not reported | No presentation | No missing data | No | No | Yes | 3 |
| Jack et al. [50] | Yes | T test with bootstrap (? replications) | No presentation | No | Non-parametric bootstrap (? replications) | No presentation | Complete-case analysis >5% missing data | No | No | No | 2 |
| Kilonzo et al. [52] | Yes | T test with bootstrap (1000 replications) | 95% CI | No | Non-parametric bootstrap (1000 replications) | CE plane and CEAC | Complete-case analysis >5% missing data | Yes | Yes | No | 5 |
| Kovac [53] | Yes | Not reported | No presentation | No | Not reported | No presentation | Not reported | No | No | No | 1 |
| Lalchandani et al. [55] | No | Mann–Whitney test | p value | No | Not reported | No presentation | No missing data | No | No | No | 1 |
| Lenihan et al. [56] | No | ANOVA (Kruskal-Wallis) | p value | No | Not reported | No presentation | Complete-case analysis with >5% missing data | No | No | No | 0 |
| Lumsden et al. [58] | Yes | Not reported | 95% CI | No | Not reported | No presentation | Complete-case analysis <5% missing data | No | No | No | 3 |
| Marino et al. [59] | Yes | Wilcoxon rank sum test | p value | No | Not reported | No presentation | Complete-case analysis <5% missing data | Yes | No | No | 2 |
| Palomba et al. [62] | No | Mann–Whitney test | p value | No | Not reported | No presentation | Complete-case analysis <5% missing data | No | No | Yes | 2 |
| Relph et al. [67] | Yes | Mann–Whitney test | No presentation | No | Not reported | No presentation | Not reported | No | No | No | 1 |
| Sarlos et al. [68] | No | Mann–Whitney test | p value | No | Not reported | No presentation | No missing data | No | No | No | 1 |
| Sculpher et al. [69] | Yes | T test with bootstrap (1000 replications) | 95% CI | Yes | Non-parametric bootstrap (1000 replications) | CEAC | Lin et al. [88] method | Yes | No | No | 5 |
| Sculpher et al. [70] | Yes | Wilcoxon rank sum test | p value | Yes | Not reported | No presentation | Complete-case analysis >5% missing data and LVCF | Yes | No | No | 3 |
| Yoong et al. [78] | Yes | Wilcoxon rank sum test | p value | Yes | Not reported | No presentation | Complete-case analysis >5% missing data and LVCF | Yes | No | No | 3 |

Compliance with statistical quality criteria: italic values: ≥75% of statistical quality items correct; bold values: 51–74% of statistical quality items correct; underlined values: 26–50% of statistical quality items correct; bold italic values: ≤25% of statistical quality items correct

CE plane cost-effectiveness plane, CEA cost-effectiveness analysis, CEAC cost-effectiveness acceptability curve, CUA cost-utility analysis, ICER incremental cost-effectiveness ratio, LVCF last value carried forward, NRS non-randomized study, RCT randomized controlled trial

Improvement in Quality Over Time

Exploratory analyses showed that the reporting and statistical quality scores of studies in gynaecology did not significantly improve over time. However, the statistical quality and reporting quality scores in obstetric studies did significantly improve over time. Goodness-of-fit estimates showed that publication year explained only a limited amount of the variance in quality scores (Table 5).
Table 5

Results from regression analysis for statistical quality

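| | Reporting quality: Gynaecology | Reporting quality: Obstetrics | Statistical quality: Gynaecology | Statistical quality: Obstetrics |
| --- | --- | --- | --- | --- |
| β | −0.063 | 0.49 | −0.024 | 0.24 |
| 95% confidence interval | −0.40; 0.28 | 0.20; 0.78 | −0.15; 0.11 | 0.07; 0.42 |
| p value | 0.70 | 0.002 | 0.71 | 0.01 |
| GOF statistic (R²) | 0.007 | 0.39 | 0.007 | 0.29 |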

β refers to a decrease or increase in the quality score per publication year. Quality score could range from 0 to 21 for reporting quality and from 0 to 8 for statistical quality. Publication year could range from 2000 to 2017

GOF goodness of fit
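
The quantities in Table 5 can be illustrated with a minimal sketch. It assumes a simple linear regression of quality score on publication year fitted by ordinary least squares; the article describes the analyses only as exploratory regression analyses, so the exact model is our assumption. The slope corresponds to β (change in score per publication year) and R² to the goodness-of-fit statistic. The years and scores below are hypothetical, and confidence intervals and p values are omitted because they require the t distribution.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns the slope, the intercept and R squared (share of variance
    in y explained by x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical publication years and quality scores (not the review's data).
years = [2000, 2004, 2008, 2012, 2016]
quality_scores = [2, 3, 3, 5, 6]

slope, intercept, r_squared = fit_line(years, quality_scores)
```

A positive slope with a low R² would mean that scores tend to rise with publication year while year explains little of the variation between studies.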

Discussion

Main Findings

The majority of cost-effectiveness evaluations in obstetrics and gynaecology do not comply with current reporting guidelines and recommendations for statistical methods in trial-based cost-effectiveness evaluations. Furthermore, exploratory analyses indicated that there have not been significant improvements over time in reporting and statistical quality of trial-based cost-effectiveness evaluations in gynaecology. In obstetrics, the quality of reporting and analysis slightly improved over time.

Interpretation of the Findings

None of the included studies fully complied with the CHEERS statement’s reporting criteria [11], and the median reporting quality score was relatively low (median 8 on a scale of 0–21). This indicates that essential reporting components were missing, which can lead researchers and healthcare decision makers to faulty conclusions. In particular, the failure to describe the setting in which the studies were performed (i.e. the place and setting in which the resource allocation decision needs to be made, such as country, primary or secondary care, and healthcare system) makes it difficult to assess the relevance or transferability of cost-effectiveness evaluation results [80]. Likewise, none of the included studies fully complied with the statistical recommendations extracted from existing guidelines [12–14]. Several statistical pitfalls are noteworthy. First, some studies based their analysis on median costs instead of mean costs, yet the median is not easily interpretable or usable for healthcare decision makers [25, 81, 82]. Second, fewer than half of the studies reported ICERs. Moreover, because ICERs have well-known interpretation problems, reporting 95% confidence intervals around ICERs is not recommended [26, 28]; presenting uncertainty using CE planes and/or CEACs is preferred. Nonetheless, only a small number of studies adequately presented the statistical uncertainty around the ICERs. Last, one-third of the included studies relied on naïve and outdated statistical techniques for dealing with missing data (e.g. mean imputation, last observation carried forward) rather than more advanced and valid methods such as multiple imputation and maximum likelihood approaches [83, 84]. These shortcomings in the quality of the included studies may result in either under- or overestimated cost-effectiveness outcomes.
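
To make the terms above concrete, the following minimal sketch shows how a non-parametric bootstrap yields the cost and effect difference pairs that are plotted on a CE plane, and how a CEAC is derived from them as the share of replicates with a positive net monetary benefit at a given willingness-to-pay threshold. It is a plain bootstrap under stated assumptions, not necessarily the method recommended in the guidelines cited here: it ignores missing data, clustering and baseline adjustment, and all patient-level values, function names and thresholds are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean; costs are summarized by means, not medians."""
    return sum(values) / len(values)


def bootstrap_differences(costs_a, effects_a, costs_b, effects_b,
                          n_boot=1000, seed=0):
    """Resample patients with replacement within each arm.

    Cost and effect are resampled together per patient so that their
    correlation is preserved. Returns (cost difference, effect
    difference) pairs for arm A minus arm B, one per replicate.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_boot):
        idx_a = [rng.randrange(len(costs_a)) for _ in costs_a]
        idx_b = [rng.randrange(len(costs_b)) for _ in costs_b]
        d_cost = mean([costs_a[i] for i in idx_a]) - mean([costs_b[i] for i in idx_b])
        d_effect = mean([effects_a[i] for i in idx_a]) - mean([effects_b[i] for i in idx_b])
        pairs.append((d_cost, d_effect))
    return pairs


def ceac(pairs, thresholds):
    """Probability that A is cost-effective at each willingness-to-pay
    threshold: share of replicates with positive net monetary benefit."""
    curve = {}
    for wtp in thresholds:
        n_positive = sum(1 for d_cost, d_effect in pairs
                         if wtp * d_effect - d_cost > 0)
        curve[wtp] = n_positive / len(pairs)
    return curve


# Hypothetical per-patient costs and QALYs (not data from any included study).
costs_a = [1200, 950, 3100, 800, 1500, 700, 2200, 1000]
qalys_a = [0.82, 0.79, 0.70, 0.85, 0.80, 0.88, 0.75, 0.83]
costs_b = [900, 700, 2500, 650, 1100, 600, 1800, 750]
qalys_b = [0.78, 0.76, 0.69, 0.80, 0.77, 0.84, 0.72, 0.79]

# Point estimate of the ICER: difference in mean costs over difference in mean effects.
icer = (mean(costs_a) - mean(costs_b)) / (mean(qalys_a) - mean(qalys_b))

pairs = bootstrap_differences(costs_a, qalys_a, costs_b, qalys_b)
curve = ceac(pairs, thresholds=[0, 10000, 20000, 50000])
```

The list `pairs` holds the points of the CE plane, and `curve` maps each threshold to the probability of cost-effectiveness, which avoids the interpretation problems of a confidence interval around a ratio.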

Strengths and Limitations

A strength of this review is the systematic way in which studies were included and assessed, increasing the validity of the review. Also, to the best of our knowledge, this is the first review that combined the assessment of reporting quality with a comprehensive and in-depth evaluation of the statistical methods based on up-to-date national and international recommendations. However, several limitations need to be mentioned as well. First, in order to keep this review manageable, we focused on trial-based cost-effectiveness evaluations in obstetrics and gynaecology. Further research is needed to assess whether these results are representative of trial-based cost-effectiveness evaluations in other clinical areas. Second, reviewers may have been subjective in their judgements of quality, because they were not blinded to authors, authors’ affiliations and journals. However, the quality assessments were done using objective criteria [11–14] by two independent reviewers. Third, considering the large developments in the methods of trial-based cost-effectiveness evaluations, early studies may be at a disadvantage. However, reporting guidelines have been available since 1996 [18, 85] and have not changed substantially since. Nonetheless, lower statistical quality scores may be the result of a lack of concrete, up-to-date statistical recommendations [86, 87]. Last, some of the included studies lacked transparency in how they designed and conducted their trial-based cost-effectiveness evaluations (i.e. poor reporting quality). This made it difficult to extract some of the data necessary to appropriately evaluate the quality of included studies, which affected the overall quality score negatively.

Comparison with the Literature

Our study adds to existing reviews in several ways. First, the majority of the previous reviews only assessed reporting quality, and only a small number of reviews [8–10], which were conducted over a decade ago, evaluated the statistical quality of the included studies. Since then, however, statistical methods have improved considerably. Moreover, compared with previously conducted reviews in obstetrics and gynaecology, we performed an in-depth evaluation of the statistical methods. Regardless, results of this systematic review are in line with those of previously conducted reviews, which concluded that the reporting and quality of the statistical approach of trial-based cost-effectiveness evaluations are typically poor [4–9, 15, 16]. However, these earlier methodological reviews in the field of obstetrics and gynaecology concluded that quality had improved over the last decades. This is in contrast with our exploratory analyses, which only showed a significant quality improvement over time in obstetrics and not in gynaecology. This discrepancy may be explained by our strict assessment of quality based on the most up-to-date evidence. All in all, our review suggests that, even though various efforts have been made during the last decade to improve the reporting and statistical quality of trial-based cost-effectiveness evaluations, there is still substantial room for improvement in the area of obstetrics and gynaecology. Further research should indicate whether this applies to other medical disciplines as well.

Implications for Further Research and Practice

Future trial-based cost-effectiveness evaluations should increase their adherence to available guidelines and recommendations to improve their credibility. Up to now, however, no criteria list of statistical quality has been available. For this review, we developed a criteria list based on current evidence, but items were not weighted in terms of their opportunity cost; that is, the risk of taking the wrong decision. For example, failure to adequately handle missing data will affect decisions more than evaluating cost differences using a Mann–Whitney U test. Therefore, we urgently recommend the development of a criteria list to assess the statistical quality of trial-based cost-effectiveness evaluations, including a weighting system, that can be used by researchers, policy makers, reviewers and journal editors. Also, none of the most frequently used statistical software packages (e.g. SPSS, Stata, SAS, R) includes easy-to-use scripts for performing state-of-the-art trial-based cost-effectiveness evaluations. As such, authors are encouraged to (publicly) share their ‘advanced’ trial-based cost-effectiveness evaluation scripts.

Conclusion

This study indicated that the reporting and statistical quality of trial-based cost-effectiveness evaluations in obstetrics and gynaecology is generally poor. Since this can result in biased results, incorrect conclusions, and inappropriate healthcare decisions, there is an urgent need for improvement in the methods of cost-effectiveness evaluations in this field.

Data Availability Statement

The authors provide the readers of this article with a data extraction sheet in which information about all included studies is summarized. This file is added as electronic supplementary material.
The quality of the statistical analysis and reporting of trial-based cost-effectiveness evaluations in obstetrics and gynaecology is poor with only a minority of studies presenting measures of statistical uncertainty around cost-effectiveness estimates.
Exploratory analyses indicated that there have been no significant improvements over time in reporting or statistical quality in gynaecology, whereas in obstetrics a significant improvement in reporting and statistical quality was found over time.
Improvement in reporting and statistical quality of trial-based cost-effectiveness evaluations is needed to ensure reliable results and conclusions as well as efficient allocation of scarce resources in healthcare.