
Bias associated with delayed verification in test accuracy studies: accuracy of tests for endometrial hyperplasia may be much higher than we think!

T Justin Clark, Gerben ter Riet, Aravinthan Coomarasamy, Khalid S Khan.

Abstract

BACKGROUND: To empirically evaluate bias in estimation of accuracy associated with delay in verification of diagnosis among studies evaluating tests for predicting endometrial hyperplasia.
METHODS: Systematic reviews of all published research on accuracy of miniature endometrial biopsy and endometrial ultrasonography for diagnosing endometrial hyperplasia identified 27 test accuracy studies (2,982 subjects). Of these, 16 had immediate histological verification of diagnosis while 11 had verification delayed > 24 hrs after testing. The effect of delay in verification of diagnosis on estimates of accuracy was evaluated using meta-regression with diagnostic odds ratio (dOR) as the accuracy measure. This analysis was adjusted for study quality and type of test (miniature endometrial biopsy or endometrial ultrasound).
RESULTS: Compared to studies with immediate verification of diagnosis (dOR 67.2, 95% CI 21.7-208.8), those with delayed verification (dOR 16.2, 95% CI 8.6-30.5) underestimated the diagnostic accuracy by 74% (95% CI 7%-99%; P value = 0.048).
CONCLUSION: Among studies of miniature endometrial biopsy and endometrial ultrasound, diagnostic accuracy is considerably underestimated if there is a delay in histological verification of diagnosis.

Year:  2004        PMID: 15137911      PMCID: PMC419332          DOI: 10.1186/1741-7015-2-18

Source DB:  PubMed          Journal:  BMC Med        ISSN: 1741-7015            Impact factor:   8.775


Background

The natural history of endometrial hyperplasia is not fully understood [1]. It is known that a proportion of simple and complex hyperplastic processes will regress without treatment [2], although the time scale over which such regression occurs is unclear. Similarly, the time scale over which benign endometrium progresses to hyperplasia is unknown. Among studies evaluating the accuracy of tests for diagnosing hyperplasia (miniature biopsy or ultrasonography), it has previously been hypothesised that if histological verification of diagnosis is delayed after the test is performed, the estimate of test accuracy may be influenced by disease regression or progression [3]. For instance, false positive diagnoses of endometrial hyperplasia may arise from natural disease regression during the interval between testing and verification of diagnosis. Similarly, false negative diagnoses may result from progression of benign functional or atrophic endometrium. To obtain accurate estimates of test accuracy in studies of hyperplasia, an immediate comparison of the test under scrutiny with a reference standard that verifies the diagnosis is essential [4-6]. When accuracy studies suffer from a delay in performing the reference standard, the resulting false positives and false negatives are expected to lead to an underestimation of test accuracy. In systematic reviews, where studies of various designs are collated, knowing the extent of underestimation that arises from delay is important for obtaining an unbiased pooled accuracy estimate. To our knowledge, the extent of underestimation of accuracy due to a delay in verification of diagnosis has not been evaluated empirically in studies of endometrial hyperplasia.
We undertook this analysis to examine formally how inaccurate the estimation of accuracy can be in studies evaluating miniature endometrial biopsy devices and endometrial thickness measurement by pelvic ultrasonography for predicting endometrial hyperplasia when there are delays in histological verification of diagnosis.

Methods

To test our hypothesis, a data set of all the published studies reporting the accuracy of miniature endometrial biopsy devices and endometrial ultrasonography for predicting endometrial hyperplasia was obtained from systematic reviews [7,8]. The reviews focused on test accuracy studies in which the results of the test were compared with the results of a reference standard. The targeted population was women with abnormal pre- or postmenopausal uterine bleeding. The diagnostic tests of interest were miniature endometrial biopsy devices (for example, pipelle® endometrial suction curette, Unimar, Wilton, CT, USA) and endometrial thickness measurement by pelvic ultrasonography. The reference standard was endometrial histology obtained by an independent endometrial sampling technique, for example, inpatient curettage (with hysteroscopy) or hysterectomy.

Identification of studies

Two independent electronic searches of MEDLINE and EMBASE were conducted to identify relevant citations on endometrial biopsy (1980–1999) and ultrasonography (1966–2000). Search term combination for endometrial biopsy [8] was diagnosis (MeSH) AND endometrial biopsy (textword), while that for studies on ultrasonography [7] was ultrasound AND endometrial thickness AND sonography (textwords). The searches were limited to human studies, but there were no language restrictions. Relevant studies were identified by examining all the retrieved citations, reference lists of all known reviews and primary studies, and direct contact with manufacturers. Details of the search and selection processes can be found in the published reports of the reviews [7,8].

Study quality assessment

All selected studies were assessed for their methodological quality, defined as the confidence that study design, conduct and analysis minimize bias in the estimation of diagnostic accuracy [9-11]. We considered the following features in quality assessment: method of recruitment of sample, appropriateness of patient spectrum, and blinding of comparison between test and reference standard. Recruitment was considered to be adequate if patient selection was consecutive or a random sample was obtained. Patient spectrum was considered to be appropriate if both pre- and postmenopausal women were included. Blinding was considered to be present if it was clearly reported that the pathologists providing histological reports were kept unaware of the results of miniature endometrial biopsy or endometrial ultrasonography. If the results of the diagnostic tests were divulged to the pathologists, or if blinding was not reported, blinding was categorised as absent. For the purpose of our analysis, studies were classified into two quality categories: category I studies had at least one of the following features: adequate recruitment, appropriate spectrum, or blinding; category II studies had none of these features.
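The two-category rule described above can be written as a small function. This is only an illustrative sketch: the function name and argument names are hypothetical, and it reads "any one of the following features" as "at least one".

```python
def quality_category(adequate_recruitment: bool,
                     appropriate_spectrum: bool,
                     blinded: bool) -> str:
    """Return 'I' if the study has at least one quality feature, else 'II'.

    Assumption: unreported features are passed in as False, mirroring the
    rule that unreported blinding is categorised as absent.
    """
    if adequate_recruitment or appropriate_spectrum or blinded:
        return "I"
    return "II"


# A study with unreported recruitment and blinding but a mixed pre- and
# postmenopausal spectrum falls in category I under this reading.
example = quality_category(False, True, False)
```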

Data extraction

In addition to assessment of methodological quality, data were extracted to allow classification of studies into one of two groups: i) immediate verification – reference standard performed within 24 hours of testing, and ii) delayed verification – reference standard performed more than 24 hours after testing. Any studies that could not be categorised in this way due to lack of reporting were excluded. Data were then abstracted as 2 × 2 tables and estimates of diagnostic accuracy were derived for each individual study. A correction factor of 0.5 was used when cells of the 2 × 2 tables included zero values [12]. True positive rates (sensitivity), false positive rates (1-specificity) and diagnostic odds ratios (dORs) were calculated for each primary evaluation. The dOR represents a ratio of the positive and negative likelihood ratios and it can be mathematically summarised as: dOR = [sensitivity/(1-specificity)] / [(1-sensitivity)/specificity]
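As an illustration of the calculation described above, the sketch below derives a dOR from a 2 × 2 table. Two points are assumptions rather than statements from the text: the 0.5 correction is added to all four cells whenever any cell is zero, and the 95% confidence interval uses the standard error of the log odds ratio (square root of the sum of reciprocal cell counts). The function name is hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fn, fp, tn, correction=0.5):
    """Return (dOR, lower 95% limit, upper 95% limit) for one 2 x 2 table.

    tp, fn: test-positive and test-negative counts among diseased subjects
    fp, tn: test-positive and test-negative counts among non-diseased subjects
    """
    cells = [tp, fn, fp, tn]
    # Assumed handling of zero cells: add the correction to every cell.
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    tp, fn, fp, tn = cells

    # Algebraically equal to [sens / (1 - spec)] / [(1 - sens) / spec].
    dor = (tp * tn) / (fn * fp)

    # Assumed interval: normal approximation on the log scale.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(dor) - 1.96 * se_log)
    upper = math.exp(math.log(dor) + 1.96 * se_log)
    return dor, lower, upper


# 8 of 16 diseased and 4 of 149 non-diseased subjects test positive.
dor, lower, upper = diagnostic_odds_ratio(tp=8, fn=8, fp=4, tn=145)
```

With these counts the sketch gives a dOR of 36.25 with limits of roughly 9.0 and 146.3, in line with the corresponding Table 2 entry; a table with zero cells (5/5 and 0/25) gives 561.0 once the correction is applied.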

Statistical analysis

Pooled dORs were generated as the principal measures of diagnostic accuracy. Meta-analyses to produce summary estimates of accuracy were performed separately for subgroups of studies reporting immediate and delayed verification. To delineate the impact of delay in verification of diagnosis, we used meta-regression analysis [13,14] with the log of dOR as the accuracy measure. This technique fitted a multivariable linear regression model (random effects model) to examine the influence of delay, quality and test type on the estimates of accuracy observed among the included studies. In this way the analysis was adjusted for the confounding effects of study quality (two quality categories) and type of test (miniature endometrial biopsy or endometrial ultrasound).
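For readers unfamiliar with random effects pooling, the following sketch pools log dORs across studies. The text does not say which estimator was used; the DerSimonian-Laird method shown here is an assumption chosen for illustration, and the function name is hypothetical.

```python
import math


def pool_random_effects(log_dors, variances):
    """Pool log dORs with a DerSimonian-Laird random effects model.

    Returns (pooled log dOR, its standard error, between-study variance).
    This estimator is an assumption, not the method confirmed by the paper.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed effect estimate, needed for the heterogeneity statistic Q.
    fixed = sum(w * y for w, y in zip(weights, log_dors)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_dors))
    df = len(log_dors) - 1

    # Method-of-moments estimate of between-study variance, floored at 0.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each study by its total (within plus between) variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_rw = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_dors)) / sum_rw
    se = math.sqrt(1.0 / sum_rw)
    return pooled, se, tau2


# Toy input (not study data): two log dORs with equal variances.
pooled, se, tau2 = pool_random_effects([0.0, 2.0], [1.0, 1.0])
pooled_dor = math.exp(pooled)  # back-transform to the dOR scale
```

Pooling on the log scale and exponentiating afterwards keeps the summary dOR and its confidence limits positive.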

Results

Selection of studies

The study selection process is shown in Figure 1. In total there were 2,982 subjects in 27 diagnostic evaluations reported in the 24 eligible primary studies. Eleven evaluations delayed verification of the diagnosis by more than 24 hours; the delay was up to six months in one study, up to four weeks in four studies, up to three weeks in one study and up to one week in the remaining three studies. Three of these studies were rated as category I for methodological quality, and eight as category II. Sixteen evaluations verified the diagnosis within 24 hours of the test. Among these, seven studies were rated as category I for quality, and nine as category II (Table 1).
Figure 1

Flow diagram showing study selection process.

Table 1

Study characteristics and methodological quality.

Columns Post, HRT, Pre and Other give bleeding type / menopausal status as n (%).

Study (year published) | Details | Delay (hours) | Patient selection | Post | HRT | Pre | †Other | Reference standard | Blinding of results | Study quality level
IMMEDIATE VERIFICATION (≤ 24 HOURS) (16 studies)
Endometrial biopsy studies (8)
Sun-Kuie et al. [15] (1992) | Gynoscann® | Day before | Unreported | *5 (11) | - | 41 (89) | - | D&C | Unreported | I
Goldberg et al. [16] (1981) | Accurette® | Immediate | Unreported | 30 (100) | - | - | - | D&C | Unreported | II
Sonnendecker et al. [17] (1981) | Accurette® | Immediate | Unreported | *6 (24) | - | 17 (76) | - | D&C | Unreported | I
Kufahl et al. [18] (1997) | Explora® | Immediate | Consecutive | *33 (21) | - | 125 (79) | - | Hysterectomy | Unreported | I
Kufahl et al. [18] (1997) | Gynoscann® | Immediate | Consecutive | *36 (21) | - | 133 (79) | - | Hysterectomy | Unreported | I
Goldschmit et al. [19] (1993) | Pipelle® | Immediate | Consecutive | *34 (23) | - | 115 (77) | - | D&C | Yes | I
Kavak et al. [20] (1996) | Pipelle® | Immediate | Unreported | *34 (56) | - | 27 (44) | - | D&C | Yes | I
Goldberg et al. [16] (1981) | Vabra Aspirator® | Immediate | Unreported | 31 (100) | - | - | - | D&C | Unreported | II
Ultrasound scan studies (8)
Botsis et al. [21] (1992) | ≤ 4 mm DL | Day before | Unreported | 120 (100) | - | - | - | D&C | Unreported | II
Garuti et al. [22] (1999) | ≤ 4 mm DL | Immediate | Unreported | 368 (88) | 51 (12) | - | - | D&C | Unreported | I
Haller et al. [23] (1996) | ≤ 4 mm DL | Day before | Unreported | 81 (100) | - | - | - | D&C | Unreported | II
Grigoriou et al. [24] (1996) | ≤ 5 mm DL | Day before | Unreported | 250 (100) | - | - | - | D&C | Unreported | II
Karlsson et al. [25] (1993) | ≤ 5 mm DL | Day before | Unreported | 103 (100) | - | - | - | D&C | Unreported | II
Malinova et al. [26] (1996) | ≤ 5 mm DL | Day before | Unreported | 154 (100) | - | - | - | D&C | Unreported | II
Wolman et al. [27] (1996) | ≤ 5 mm DL | Immediate | Unreported | 54 (100) | - | - | - | D&C | Unreported | II
Malinova et al. [28] (1995) | ≤ 5 mm SL | Day before | Unreported | 118 (100) | - | - | - | D&C | Unreported | II
DELAYED VERIFICATION (> 24 HOURS) (11 studies)
Endometrial biopsy studies (4)
Stovall et al. [29] (1989) | Novak Curette® | Unreported | Unreported | - | - | - | 165 (100) NS | Hysterectomy | Unreported | II
Krampl et al. [3] (1997) | Pipelle® | < 6 months | Unreported | 37 (12) | - | 247 (77) | 35 (11) | TCRE/Hysterectomy | Unreported | I
Gupta et al. [30] (1996) | Pipelle® | < 1 month | Unreported | 54 (100) | - | - | - | D&C | Unreported | II
Stovall et al. [29] (1989) | Vabra Aspirator® | Unreported | Unreported | - | - | - | 62 (100) NS | Hysterectomy | Unreported | II
Ultrasound scan studies (7)
Guner et al. [31] (1996) | ≤ 4 mm DL | ≤ 7 days | Unreported | 192 (100) | - | - | - | D&C | Unreported | II
Abu-Ghazzeh et al. [32] (1999) | ≤ 5 mm DL | 1 week | Unreported | 98 (100) | - | - | - | D&C | Unreported | II
DeSilva et al. [33] (1997) | ≤ 5 mm DL | < 4 weeks | Consecutive | 44 (88) | 6 (12) | - | - | D&C | Unreported | I
Gupta et al. [30] (1996) | ≤ 5 mm DL | < 1 month | Unreported | 75 (100) | - | - | - | D&C | Yes | I
Taviani et al. [34] (1995) | ≤ 5 mm DL | 1 week | Unreported | 41 (100) | - | - | - | D&C | Unreported | II
Moreles et al. [35] (1998) | ≤ 6 mm DL | < 3 weeks | Unreported | 200 (100) | - | - | - | D&C | Unreported | II
Mortakis et al. [36] (1997) | ≤ 3 mm SL | < 4 weeks | Unreported | 78 (100) | - | - | - | D&C | Unreported | II

*Numbers calculated from initial proportion of patients within these groups before missing outcome data was excluded. †Other refers to proportion of women included in the study who did not have abnormal uterine bleeding as an indication for investigation. HRT, hormone replacement therapy; NS, not specified (refers to proportion of women included in the study where the type of abnormal uterine bleeding was not specified); DB, directed biopsy; D&C, dilatation and curettage; TCRE, transcervical resection of the endometrium; DL, double layer endometrial thickness; SL, single layer endometrial thickness.

Table 2 shows the diagnostic accuracy results for individual studies according to test type and verification status in terms of delay. The summary statistics for the various subgroups showed that the dOR for studies with immediate verification was 67.2 (21.7–208.8) while that for studies with delayed verification was 16.2 (8.6–30.5) as shown in Figure 2. Meta-regression analysis for bias due to delay in verification of diagnosis, adjusted for study quality and test type, showed that the underestimation of test accuracy among studies with delayed verification was 74% (95% CI 7%-99%; P = 0.048) on average compared to studies with immediate verification (Table 3).
Table 2

Accuracy stratified by time delay between test performance and confirmation by chosen reference test histology.

Device (no. evaluations) & study (year published) | Test positives / diseased (sensitivity) | Test positives / non-diseased (1-specificity) | Odds ratio (95% CI)
IMMEDIATE VERIFICATION (≤ 24 HOURS) (16 studies)
Endometrial biopsy studies (8)
Sun-Kuie et al. [15] (1992) | 2/4 (0.5) | 0/42 (0.0) | 85.0 (3.2–2289.6)
Goldberg et al. [16] (1981) | 5/5 (1.0) | 0/25 (0.0) | 561.0 (10.0–31463.1)
Sonnendecker et al. [17] (1981) | 2/2 (1.0) | 0/21 (0.0) | 215.0 (3.5–13408.5)
Kufahl et al. [18] (1997) | 14/15 (0.9) | 4/143 (0.03) | 486.5 (50.8–4658.5)
Kufahl et al. [18] (1997) | 3/15 (0.2) | 15/154 (0.1) | 2.3 (0.6–9.1)
Goldschmit et al. [19] (1993) | 11/14 (0.8) | 1/135 (0.01) | 491.3 (47.1–5127.3)
Kavak et al. [20] (1996) | 4/4 (1.0) | 0/57 (0.0) | 1035.0 (18.3–58563.1)
Goldberg et al. [16] (1981) | 6/6 (1.0) | 0/25 (0.0) | 663.0 (12.0–36690.0)
Ultrasound scan studies (8)
Botsis et al. [21] (1992) | 10/10 (1.0) | 4/92 (0.04) | 413.0 (20.8–8221.1)
Garuti et al. [22] (1999) | 44/46 (0.96) | 196/313 (0.63) | 13.1 (3.1–55.2)
Haller et al. [23] (1996) | 14/16 (0.88) | 34/49 (0.69) | 3.1 (0.6–15.3)
Grigoriou et al. [24] (1996) | 45/45 (0.98) | 30/181 (0.17) | 452.0 (27.1–7538.5)
Karlsson et al. [25] (1993) | 10/10 (1.0) | 21/78 (0.27) | 56.2 (3.2–1000.5)
Malinova et al. [26] (1996) | 11/11 (1.0) | 32/74 (0.43) | 30.1 (1.7–529.5)
Wolman et al. [27] (1996) | 10/11 (0.91) | 12/39 (0.31) | 22.5 (2.6–196.1)
Malinova et al. [28] (1995) | 7/7 (1.0) | 19/54 (0.35) | 27.31 (1.5–504.1)
DELAYED VERIFICATION (> 24 HOURS) (11 studies)
Endometrial biopsy studies (4)
Stovall et al. [29] (1989) | 8/16 (0.5) | 4/149 (0.03) | 36.3 (9.0–146.3)
Krampl et al. [3] (1997) | 14/35 (0.4) | 24/284 (0.0) | 7.2 (3.3–16.0)
Gupta et al. [30] (1996) | 6/10 (0.6) | 1/44 (0.02) | 64.5 (6.1–677.6)
Stovall et al. [29] (1989) | 7/7 (1.0) | 7/55 (0.13) | 97.0 (5.0–1879.9)
Ultrasound scan studies (7)
Guner et al. [31] (1996) | 31/31 (1.0) | 61/142 (0.43) | 83.5 (5.0–1391.3)
Abu-Ghazzeh et al. [32] (1999) | 2/2 (1.0) | 58/95 (0.61) | 3.2 (0.2–68.6)
DeSilva et al. [33] (1997) | 2/3 (0.67) | 10/44 (0.23) | 6.8 (0.6–83.0)
Gupta et al. [30] (1996) | 4/4 (1.0) | 22/68 (0.32) | 18.6 (1.0–360.7)
Taviani et al. [34] (1995) | 2/2 (1.0) | 16/37 (0.43) | 6.5 (0.3–145.1)
Moreles et al. [35] (1998) | 33/37 (0.9) | 37/143 (0.26) | 23.6 (7.8–71.2)
Mortakis et al. [36] (1997) | 4/5 (0.8) | 26/66 (0.39) | 6.2 (0.7–58.2)
Figure 2

Effect of delayed verification on the diagnostic accuracy of miniature endometrial biopsy and transvaginal ultrasound in detecting endometrial hyperplasia. Pooled diagnostic odds ratios (dOR) for studies with immediate and delayed verification.

Table 3

Results of meta-regression analysis.

Outcome / explanatory variables | Univariable analysis: ratio dOR (95% CI) | Univariable analysis: P value | Multivariable analysis: ratio dOR (95% CI) | Multivariable analysis: P value
Endometrial hyperplasia
▪ Delay in verification of diagnosis (> 24 hours vs. < 24 hours) | 0.31 (0.08–0.84) | 0.089 | 0.26 (0.07–0.99) | 0.048
▪ Study quality (category II vs. category I) | 1.36 (0.34–5.53) | 0.664 | 3.46 (0.79–15.0) | 0.098
▪ Test (Endometrial biopsy vs. ultrasound endometrial thickness) | 3.22 (0.83–12.55) | 0.091 | 5.0 (1.2–20.7) | 0.027

†The dependent variable is the natural logarithm of the diagnostic odds ratio (dOR). Results are presented as the ratio of diagnostic odds ratios (RdOR); an RdOR < 1 means that the diagnostic accuracy is reduced and a RdOR > 1 means that it is increased in relation to the reference category, < 24 h, category I, and ultrasound endometrial thickness, respectively.
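To make the RdOR concrete, the sketch below regresses log dOR on a single 0/1 covariate (for example, delayed versus immediate verification) using inverse-variance weights. It is a simplified, univariable illustration and not the authors' model: the between-study variance `tau2` is passed in rather than estimated, no adjustment for quality or test type is made, and all names are hypothetical.

```python
import math


def univariable_meta_regression(log_dors, variances, covariate, tau2=0.0):
    """Weighted least squares of log dOR on one covariate.

    Returns (intercept, slope, standard error of slope). exp(slope) is the
    ratio of diagnostic odds ratios (RdOR) for a one-unit covariate change.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, log_dors)) / sum_w

    # Weighted sums of squares and cross-products around the weighted means.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, log_dors))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Toy input (not study data): two immediate (0) and two delayed (1) studies.
intercept, slope, se_slope = univariable_meta_regression(
    log_dors=[4.0, 4.0, 3.0, 3.0],
    variances=[1.0, 1.0, 1.0, 1.0],
    covariate=[0, 0, 1, 1],
)
rdor = math.exp(slope)            # ratio of dORs, delayed vs. immediate
relative_reduction = 1.0 - rdor   # proportional underestimation of the dOR
```

Read this way, the multivariable RdOR of 0.26 for delay in Table 3 corresponds to a dOR that is 1 − 0.26 = 74% lower in studies with delayed verification.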

Discussion

Our study shows empirically the magnitude of bias associated with delay in verification of diagnosis in test accuracy studies. Delay in verification of more than 24 hours was associated with a considerable underestimation of the accuracy of miniature biopsy and endometrial ultrasonography in diagnosing endometrial hyperplasia. This supports the premise that the reported limited accuracy of miniature endometrial biopsy devices and endometrial ultrasonography in diagnosing hyperplasia is due, in part, to the natural history of disease rather than resulting entirely from intrinsic problems with performance of the diagnostic tools [3]. We posed our hypothesis a priori and tested it in as rigorous a manner as possible. Our literature search was without language restriction, facilitating retrieval of many relevant test accuracy studies. However, because of poor reporting, many critical pieces of information were missing from the available literature, restricting the number of studies that could be included in our analysis (for example, 31 studies were ineligible for inclusion because explicit information about time before verification was omitted). Our examination of delays in verification was also restricted; just two time categories were discernible (verification within 24 hours or after more than 24 hours). Immediate verification (reference standard performed straight after the index test) was not achievable in some studies because the reference test (inpatient endometrial sampling) necessitated use of general anaesthesia. A practical cut-off of 24 hours was taken to allow time for reference testing to be undertaken when the preceding index tests (miniature endometrial biopsy and ultrasound) were performed in the conscious outpatient. Although the natural history of endometrial hyperplasia is unclear, it is unlikely that biological alteration would have occurred within 24 hours.
To study the rate of disease progression or regression would require repeated testing over time, but such a study is unlikely to be ethically justifiable, given that most clinicians will institute treatment following initial diagnosis. Such a study would then become one of prognosis under treatment rather than a natural history study. We also evaluated other features of methodological quality and, in general, found the quality of studies to be poor. For example, only three studies reported blinding the interpretation of the reference test to the results of the index test. A lack of blinding can introduce bias and overestimation of diagnostic accuracy [4]. Pathological interpretation of endometrial hyperplasia is open to a varying degree of subjectivity, especially at the extreme ends of the spectrum, where overlap with benign functional endometrium (simple hyperplasia) and cancer (complex hyperplasia with cytological atypia) is more likely. Absence (or lack of explicit reporting) of blinding is thus associated with poorer methodological quality, and this feature was incorporated in our quality assessment. Our analysis adjusted for the confounding effects of quality, but our inferences should be interpreted with caution because of the relative scarcity of good quality studies.

Conclusions

Our findings have implications for research into new diagnostic interventions. Our results demonstrate that test evaluation with robust study design (immediate verification) showed good test performance but evaluation in poor designs (delayed verification) showed poor performance. Poor designs may reflect the situation prevalent in routine clinical practice where test results may not be immediately confirmed due to resource and other implications. Thus diagnostic evaluations carried out in routine practice may mask the accuracy of tests.

Competing interests

None declared.

Authors' contributions

TJC and KSK conceived and designed the study with input from GtR. TJC conducted the systematic review and acquired all data. AC conducted the statistical analyses with input from KSK and GtR. TJC wrote all versions of the manuscript. KSK and GtR critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript.

Pre-publication history

The pre-publication history for this paper can be accessed here:
  28 in total

1.  Methods for exploring heterogeneity in meta-analysis.

Authors:  F Song; T A Sheldon; A J Sutton; K R Abrams; D R Jones
Journal:  Eval Health Prof       Date:  2001-06       Impact factor: 2.651

2.  Transvaginal ultrasonography and hysteroscopy in the diagnosis of endometrial abnormalities.

Authors:  A E Mortakis; K Mavrelos
Journal:  J Am Assoc Gynecol Laparosc       Date:  1997-08

3.  Transvaginal ultrasound, endometrial cytology sampled by Gynoscann and histology obtained by Uterine Explora Curette compared to the histology of the uterine specimen. A prospective study in pre- and postmenopausal women undergoing elective hysterectomy.

Authors:  J Kufahl; I Pedersen; P Sindberg Eriksen; P E Helkjaer; L G Larsen; K L Jensen; P de Nully; T Philipsen; A Wåhlin
Journal:  Acta Obstet Gynecol Scand       Date:  1997-09       Impact factor: 3.636

4.  Use of methodological standards in diagnostic test research. Getting better but still not good.

Authors:  M C Reid; M S Lachs; A R Feinstein
Journal:  JAMA       Date:  1995 Aug 23-30       Impact factor: 56.272

5.  The assessment of diagnostic tests. A survey of current medical research.

Authors:  S B Sheps; M T Schechter
Journal:  JAMA       Date:  1984-11-02       Impact factor: 56.272

6.  How should we investigate women with postmenopausal bleeding?

Authors:  J K Gupta; S Wilson; P Desai; C Hau
Journal:  Acta Obstet Gynecol Scand       Date:  1996-05       Impact factor: 3.636

Review 7.  Accuracy of outpatient endometrial biopsy in the diagnosis of endometrial hyperplasia.

Authors:  T J Clark; C H Mann; N Shah; K S Khan; F Song; J K Gupta
Journal:  Acta Obstet Gynecol Scand       Date:  2001-09       Impact factor: 3.636

8.  The behavior of endometrial hyperplasia. A long-term study of "untreated" hyperplasia in 170 patients.

Authors:  R J Kurman; P F Kaminski; H J Norris
Journal:  Cancer       Date:  1985-07-15       Impact factor: 6.860

9.  Endovaginal scanning of the endometrium compared to cytology and histology in women with postmenopausal bleeding.

Authors:  B Karlsson; S Granberg; M Wikland; W Ryd; A Norström
Journal:  Gynecol Oncol       Date:  1993-08       Impact factor: 5.482

10.  A comparison of endometrial sampling with the Accurette and Vabra aspirator and uterine curettage.

Authors:  G L Goldberg; G Tsalacopoulos; D A Davey
Journal:  S Afr Med J       Date:  1982-01-23