
Reliability studies of diagnostic tests are not using enough observers for robust estimation of interobserver agreement: a simulation study.

Mohsen Sadatsafavi, Mehdi Najafzadeh, Larry Lynd, Carlo Marra.

Abstract

OBJECTIVE: Any attempt to generalize the performance of a subjective diagnostic method should take into account the sampling variation in both cases and readers. Most current measures of the performance of a test, especially the indices of reliability, only tackle the variation of cases, and hence are not suitable for generalizing results across the population of readers. We attempted to study the effect of readers' variation on two measures of multireader reliability: pair-wise agreement and Fleiss' kappa.
STUDY DESIGN AND SETTING: We used a normal hierarchical model with a latent trait (signal) variable to simulate a binary decision-making task performed by different numbers of readers on an infinite sample of cases.
RESULTS: Both measures, especially Fleiss' kappa, have a large sampling variance when estimated from a small number of readers, casting doubt on their accuracy given the number of readers typically used in current reliability studies.
CONCLUSION: Most current agreement studies are likely limited by the number of readers and are unlikely to produce a reliable estimate of reader agreement.
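
The kind of simulation described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the abstract does not give the model's parameters, so the reader-threshold spread (`reader_sd`), the perception noise (`noise_sd`), the finite number of cases standing in for the "infinite sample", and all function names are assumptions made here. The pair-wise agreement and Fleiss' kappa functions follow the standard definitions for binary ratings.

```python
import random
from statistics import mean, pstdev


def simulate_ratings(n_readers, n_cases, reader_sd=0.5, noise_sd=0.5, rng=None):
    """Return ratings[case][reader] in {0, 1} under an ASSUMED latent-trait model.

    Each case has a latent signal ~ N(0, 1). Each reader has a personal
    decision threshold ~ N(0, reader_sd) and sees the signal plus
    N(0, noise_sd) noise. These parameter choices are hypothetical.
    """
    rng = rng or random.Random()
    thresholds = [rng.gauss(0.0, reader_sd) for _ in range(n_readers)]
    ratings = []
    for _ in range(n_cases):
        signal = rng.gauss(0.0, 1.0)
        ratings.append(
            [int(signal + rng.gauss(0.0, noise_sd) > t) for t in thresholds]
        )
    return ratings


def pairwise_agreement(ratings):
    """Proportion of agreeing reader pairs, averaged over cases."""
    n = len(ratings[0])  # readers per case (must be at least 2)
    all_pairs = n * (n - 1) / 2
    total = 0.0
    for row in ratings:
        k = sum(row)  # number of positive ratings for this case
        agreeing = k * (k - 1) / 2 + (n - k) * (n - k - 1) / 2
        total += agreeing / all_pairs
    return total / len(ratings)


def fleiss_kappa(ratings):
    """Fleiss' kappa for binary ratings with the same readers on every case."""
    n = len(ratings[0])
    p_obs = pairwise_agreement(ratings)  # Fleiss' mean observed agreement
    p_pos = sum(sum(row) for row in ratings) / (len(ratings) * n)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2  # chance agreement from marginals
    if p_exp == 1.0:
        return float("nan")  # undefined when every rating is identical
    return (p_obs - p_exp) / (1 - p_exp)


def spread_across_panels(n_readers, n_panels=200, n_cases=500, seed=0):
    """Mean and SD of kappa over independently drawn reader panels."""
    rng = random.Random(seed)
    kappas = [
        fleiss_kappa(simulate_ratings(n_readers, n_cases, rng=rng))
        for _ in range(n_panels)
    ]
    return mean(kappas), pstdev(kappas)


if __name__ == "__main__":
    for n_readers in (2, 3, 5, 10):
        avg, sd = spread_across_panels(n_readers)
        print(f"readers={n_readers:2d}  mean kappa={avg:.3f}  SD={sd:.3f}")
```

Running the script prints the mean and standard deviation of kappa across simulated reader panels of different sizes. Comparing the SD column across panel sizes is one way to explore the reader-sampling variability the abstract refers to; the specific numbers depend entirely on the assumed parameters above.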

Year:  2008        PMID: 18486446     DOI: 10.1016/j.jclinepi.2007.10.023

Source DB:  PubMed          Journal:  J Clin Epidemiol        ISSN: 0895-4356            Impact factor:   6.437


  7 in total

1.  Common radiographic imaging modalities fail to accurately predict capitate morphology.

Authors:  Timothy Niacaris; Victor W Wong; Ketan M Patel; Michael Januszyk; Trevor Starnes; Michael S Murphy; James P Higgins
Journal:  Hand (N Y)       Date:  2015-09

2.  Tomosynthesis of the thoracic spine: added value in diagnosing vertebral fractures in the elderly.

Authors:  Mats Geijer; Eirikur Gunnlaugsson; Simon Götestrand; Lars Weber; Håkan Geijer
Journal:  Eur Radiol       Date:  2016-05-31       Impact factor: 5.315

3.  A multicenter pilot evaluation of the National Institutes of Health chronic graft-versus-host disease (cGVHD) therapeutic response measures: feasibility, interrater reliability, and minimum detectable change.

Authors:  Sandra A Mitchell; David Jacobsohn; Kimberly E Thormann Powers; Paul A Carpenter; Mary E D Flowers; Edward W Cowen; Mark Schubert; Maria L Turner; Stephanie J Lee; Paul Martin; Michael R Bishop; Kristin Baird; Javier Bolaños-Meade; Kevin Boyd; Jane M Fall-Dickson; Lynn H Gerber; Jean-Pierre Guadagnini; Matin Imanguli; Michael C Krumlauf; Leslie Lawley; Li Li; Bryce B Reeve; Janine Austin Clayton; Georgia B Vogelsang; Steven Z Pavletic
Journal:  Biol Blood Marrow Transplant       Date:  2011-04-12       Impact factor: 5.742

4.  A Tool to Assess the Signs and Symptoms of Catheter-Associated Urinary Tract Infection: Development and Reliability.

Authors:  Tom J Blodgett; Sue E Gardner; Nicole P Blodgett; Lisa V Peterson; Melissa Pietraszak
Journal:  Clin Nurs Res       Date:  2014-09-22       Impact factor: 2.075

5.  LAMP for human African trypanosomiasis: a comparative study of detection formats.

Authors:  Sally L Wastling; Kim Picozzi; Abbas S L Kakembo; Susan C Welburn
Journal:  PLoS Negl Trop Dis       Date:  2010-11-02

6.  Ultra-widefield fundus autofluorescence in age-related macular degeneration.

Authors:  Abhilash Guduru; David Fleischman; Sunyoung Shin; Donglin Zeng; James B Baldwin; Odette M Houghton; Emil A Say
Journal:  PLoS One       Date:  2017-06-01       Impact factor: 3.240

7.  Development and pilot testing of a tool to assess evidence-based practice skills among French general practitioners.

Authors:  Nicolas Rousselot; Thomas Tombrey; Drissa Zongo; Evelyne Mouillet; Jean-Philippe Joseph; Bernard Gay; Louis Rachid Salmi
Journal:  BMC Med Educ       Date:  2018-11-09       Impact factor: 2.463
