| Literature DB >> 34305250 |
Thomas J Zhou, Sughra Raza, Kerrie P Nelson.
Abstract
Advances in breast imaging and other screening tests have prompted studies that evaluate and compare the consistency between experts' ratings of existing and new screening tests. In clinical settings, medical experts make subjective assessments of screening test results such as mammograms. Consistency between experts' ratings is evaluated by measures of inter-rater agreement or association. However, conventional measures such as Cohen's and Fleiss' kappas cannot be applied, or may perform poorly, when studies involve many experts, unbalanced data, or dependencies between experts' ratings. Here we assess the performance of existing approaches, including recently developed summary measures, for evaluating agreement between experts' binary and ordinal ratings when patients undergo two screening procedures. Methods to assess consistency between repeated measurements by the same experts are also described. We present applications to three large-scale clinical screening studies. Properties of these agreement measures are illustrated via simulation studies. Generally, a model-based approach offers several advantages over alternative methods, including the ability to flexibly accommodate various measurement scales (i.e. binary or ordinal), large numbers of experts and patients, and sparse data, as well as robustness to the prevalence of underlying disease.
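As background to the measures discussed above, a minimal sketch of Cohen's kappa for two raters is shown below. This is an illustrative implementation of the standard formula kappa = (p_o - p_e) / (1 - p_e), not code from the paper; the function name and example ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same items.

    p_o is the observed proportion of agreement; p_e is the agreement
    expected by chance under independent marginal rating distributions.
    """
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary screening ratings (1 = positive, 0 = negative):
kappa = cohen_kappa([1, 1, 0, 1, 0, 0, 1, 1],
                    [1, 1, 0, 1, 0, 1, 1, 1])
print(round(kappa, 4))  # agreement beyond chance on a -1..1 scale
```

As the abstract notes, this pairwise measure does not extend naturally to many raters, sparse data, or correlated ratings, which motivates the model-based summary measures the paper studies.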
Keywords: Fleiss’ kappa; Intra-rater agreement; association; binary ratings; ordinal classifications; rater training
Year: 2020 PMID: 34305250 PMCID: PMC8299998 DOI: 10.1080/02664763.2020.1777394
Source DB: PubMed Journal: J Appl Stat ISSN: 0266-4763 Impact factor: 1.404