Nynke Smidt, Anne W S Rutjes, Daniëlle A W M van der Windt, Raymond W J G Ostelo, Patrick M Bossuyt, Johannes B Reitsma, Lex M Bouter, Henrica C W de Vet.
Abstract
BACKGROUND: In January 2003, STAndards for the Reporting of Diagnostic accuracy studies (STARD) were published in a number of journals, to improve the quality of reporting in diagnostic accuracy studies. We designed a study to investigate the inter-assessment reproducibility, and intra- and inter-observer reproducibility of the items in the STARD statement.
Year: 2006 PMID: 16539705 PMCID: PMC1522016 DOI: 10.1186/1471-2288-6-12
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Figure 1. Overview of the design of the reproducibility study. * Papers were included in the pre-STARD evaluation, described elsewhere [18]. † Four reviewers (AWSR, DAWMW, RWJGO, and HCWV) acted as second reviewer and each evaluated 8 articles; at the second assessment, the same reviewers evaluated the same studies. ‡ The first assessment was carried out together with the pre-STARD evaluation (March – May 2003). ¶ The second assessment was carried out together with the post-STARD evaluation (January – March 2005).
Number of articles reporting each item of the STARD statement at the first and second assessment, with the percentage agreement between the two assessments and Cohen's kappa for each item.*
| Item | Description | First assessment, n (%) | Second assessment, n (%) | Inter-assessment agreement (%) | Cohen's kappa |
| 1 | Identify the article as a study of diagnostic accuracy (recommend MeSH heading 'sensitivity and specificity'). | 3 (9) | 1 (3) | 94 | 0.48 |
| 2 | State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups. | 27 (84) | 31 (97) | 88 | 0.30 |
| 3 | The study population: The inclusion and exclusion criteria, setting and locations where data were collected. | 17 (53) | 10 (31) | 78 | 0.57 |
| 4 | Participant recruitment: Was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard? | 28 (88) | 32 (100) | 88 | NA |
| 5 | Participant sampling: Was the study population a consecutive series of participants defined by the selection criteria in item 3 and 4? If not, specify how participants were further selected. | 20 (63) | 25 (78) | 84 | 0.64 |
| 6 | Data collection: Was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)? | 25 (78) | 26 (81) | 84 | 0.52 |
| 7 | The reference standard and its rationale. | 14 (44) | 14 (44) | 69 | 0.37 |
| 8 | Technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for | ||||
| 9 | Definition of and rationale for the units, cut-offs and/or categories of the results of the | ||||
| 10 | The number, training and expertise of the persons executing and reading the | ||||
| 11 | Whether or not the readers of the | ||||
| 12 | Methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals). | 4 (13) | 4 (13) | 94 | 0.71 |
| 13 | Methods for calculating test reproducibility, if done | ||||
| 14 | When study was performed, including beginning and end dates of recruitment. | 17 (53) | 17 (53) | 100 | 1.00 |
| 15 | Clinical and demographic characteristics of the study population (at least information on age, gender, spectrum of presenting symptoms). | 14 (44) | 16 (50) | 81 | 0.63 |
| 16 | The number of participants satisfying the criteria for inclusion who did or did not undergo the index tests and/or the reference standard, describe why participants failed to undergo either test (a flow diagram is strongly recommended). | 20 (63) | 19 (59) | 66 | 0.28 |
| 17 | Time-interval between the index tests and the reference standard, and any treatment administered in between. | 7 (22) | 9 (28) | 81 | 0.50 |
| 18 | Distribution of severity of disease (define criteria) in those with the target condition, other diagnoses in participants without the target condition. | 9 (28) | 15 (47) | 63 | 0.23 |
| 19 | A cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard, for continuous results, the distribution of the test results by the results of the reference standard. | 24 (75) | 24 (75) | 75 | 0.33 |
| 20 | Any adverse events from performing the index tests or the reference standard. | 5 (16) | 5 (16) | 100 | 1.00 |
| 21 | Estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals). | 13 (41) | 14 (44) | 91 | 0.81 |
| 22 | How indeterminate results, missing data and outliers of the index tests were handled. | 20 (63) | 21 (66) | 66 | 0.25 |
| 23 | Estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done. | 14 (44) | 17 (53) | 91 | 0.81 |
| 24 | Estimates of test reproducibility, if done. | ||||
| 25 | Discuss the clinical applicability of the study findings. | 31 (97) | 31 (97) | 94 | -0.032 |
* The data extraction form for assessing the 25 items of the STARD statement and the references of the 32 included articles are available on request from the first author; NA = not able to calculate.
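The agreement and kappa columns above can be checked with a short calculation. The following Python sketch computes percentage agreement and Cohen's kappa from a 2×2 table of the two assessments. The cell counts are not reported in the source: they are an assumption, inferred here from the marginal counts and the rounded percentage agreement for items 12 and 25 (30 of 32 articles agreeing, with equal marginals at both assessments). The function name and argument names are illustrative only.

```python
def agreement_and_kappa(both_yes, yes_no, no_yes, both_no):
    """Percentage agreement and Cohen's kappa for two binary assessments.

    both_yes: item scored as reported at both assessments
    yes_no:   reported at the first assessment only
    no_yes:   reported at the second assessment only
    both_no:  not reported at either assessment
    """
    n = both_yes + yes_no + no_yes + both_no
    p_observed = (both_yes + both_no) / n
    p_yes_first = (both_yes + yes_no) / n
    p_yes_second = (both_yes + no_yes) / n
    # Agreement expected by chance, given the marginal proportions
    p_expected = (p_yes_first * p_yes_second
                  + (1 - p_yes_first) * (1 - p_yes_second))
    if p_expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect
        return 100 * p_observed, None
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return 100 * p_observed, kappa


# Item 12: 4 articles at each assessment, 94% agreement.
# Inferred cells (assumption): 3 both yes, 1 + 1 discordant, 27 both no.
agree_12, kappa_12 = agreement_and_kappa(3, 1, 1, 27)

# Item 25: 31 articles at each assessment, 94% agreement.
# Inferred cells (assumption): 30 both yes, 1 + 1 discordant, 0 both no.
agree_25, kappa_25 = agreement_and_kappa(30, 1, 1, 0)

print(f"Item 12: agreement {agree_12:.0f}%, kappa {kappa_12:.2f}")
print(f"Item 25: agreement {agree_25:.0f}%, kappa {kappa_25:.3f}")
```

Under these inferred counts, both items have the same observed agreement, yet kappa differs sharply because almost all articles fall in a single cell for item 25, which pushes the chance-expected agreement close to the observed agreement.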
Inter-assessment agreement: mean of first assessment and second assessment of the quality of reporting of diagnostic accuracy studies (n = 32), followed by mean differences between the two assessments, 95% limits of agreement, and smallest detectable difference (SDD).
| Outcome measure | First assessment (A): mean (SD) | First assessment (A): range | Second assessment (B): mean (SD) | Second assessment (B): range | Difference B – A: mean (SD) | 95% limits of agreement* | SDD† | Systematic differences‡: mean (95% CI) |
| Number of reported STARD items (0–25) | 12.08 (3.9) | 3.5 – 19 | 12.47 (3.4) | 7 – 19 | 0.39 (2.4) | -4.27, 5.05 | 4.66 | 0.39 (-0.4, 1.2) |
SD = standard deviation; * 95% limits of agreement = MeandiffAB ± 1.96 × SDdiffAB; † SDD = smallest detectable difference (SDD = 1.96 × SDdiffAB); ‡ systematic difference (bias) = MeandiffAB ± 1.96 × SDdiffAB/√n; 95% CI = 95% confidence interval.
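As a worked illustration of the three formulas in this footnote, the Python sketch below computes the limits of agreement, the SDD, and the confidence interval for the systematic difference. The inputs are the rounded values from the table (mean difference 0.39, SD of the differences 2.4, n = 32). Because the SD is rounded to one decimal, the results can differ from the tabulated values in the second decimal. The function name and dictionary keys are illustrative assumptions, not taken from the source.

```python
import math


def agreement_summary(mean_diff, sd_diff, n, z=1.96):
    """Limits of agreement, SDD, and CI of the bias from paired differences."""
    sdd = z * sd_diff                                 # smallest detectable difference
    limits = (mean_diff - sdd, mean_diff + sdd)       # 95% limits of agreement
    half_width = z * sd_diff / math.sqrt(n)           # half-width of the CI of the mean
    bias_ci = (mean_diff - half_width, mean_diff + half_width)
    return {"sdd": sdd, "limits_of_agreement": limits, "bias_ci": bias_ci}


# Rounded inputs taken from the table: difference B - A = 0.39 (SD 2.4), n = 32
summary = agreement_summary(mean_diff=0.39, sd_diff=2.4, n=32)

print(f"SDD: {summary['sdd']:.2f}")
print("95% limits of agreement: "
      f"{summary['limits_of_agreement'][0]:.2f}, {summary['limits_of_agreement'][1]:.2f}")
print("95% CI of systematic difference: "
      f"{summary['bias_ci'][0]:.1f}, {summary['bias_ci'][1]:.1f}")
```

The limits of agreement describe the spread of differences for individual articles, whereas the confidence interval of the systematic difference shrinks with √n and describes the uncertainty in the mean difference only.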
Figure 2. Differences between first and second assessment for each article (n = 32), plotted against the mean value of both assessments for the total number of reported STARD items. Solid line: mean difference (0.39) between the two assessments; short striped lines: 95% confidence interval (-0.4, 1.2) of the systematic difference; long striped lines: 95% limits of agreement (-4.3, 5.0).