Aya A Mitani¹, Phoebe E Freer², Kerrie P Nelson³.
¹ Department of Biostatistics, Boston University School of Public Health, Boston, MA. Electronic address: amitani@bu.edu.
² Department of Radiology and Imaging Sciences, University of Utah Hospital and Huntsman Cancer Institute, Salt Lake City, UT.
³ Department of Biostatistics, Boston University School of Public Health, Boston, MA.
Abstract
PURPOSE: Interpretation of screening tests such as mammograms usually requires a radiologist's subjective visual assessment of images, often resulting in substantial discrepancies between radiologists' classifications of subjects' test results. In clinical screening studies that assess the strength of agreement between experts, multiple raters are often recruited to classify subjects' test results using an ordinal scale. However, applying traditional measures of agreement in some studies is challenging because of the presence of many raters, the use of an ordinal classification scale, and unbalanced data.
METHODS: We assess and compare the performance of existing measures of agreement and association, as well as a newly developed model-based measure of agreement, by applying them to three large-scale clinical screening studies involving many raters' ordinal classifications. We also conduct a simulation study to demonstrate key properties of the summary measures.
RESULTS: The assessed strength of agreement and association varied according to the choice of summary measure. Some measures were influenced by the underlying disease prevalence and the raters' marginal distributions, and/or were limited to balanced data sets in which every rater classifies every subject. Our simulation study indicated that popular measures of agreement and association are sensitive to the underlying disease prevalence.
CONCLUSIONS: Model-based measures provide a flexible approach for estimating agreement and association and are robust to missing and unbalanced data as well as to the underlying disease prevalence.
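The prevalence sensitivity noted in the RESULTS can be illustrated with a minimal simulation. The sketch below is not the authors' model-based method; it assumes two hypothetical raters making binary calls with fixed sensitivity and specificity (0.85 each), and shows that Cohen's kappa falls as disease prevalence drops even though rater accuracy is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' binary classifications."""
    po = np.mean(r1 == r2)              # observed agreement
    p1, p2 = np.mean(r1), np.mean(r2)   # marginal "positive" rates
    pe = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement
    return (po - pe) / (1 - pe)

def rate(truth, sens=0.85, spec=0.85):
    """Simulate a rater with fixed sensitivity and specificity (assumed values)."""
    p_pos = np.where(truth == 1, sens, 1 - spec)
    return (rng.random(truth.size) < p_pos).astype(int)

n = 100_000
for prev in (0.05, 0.20, 0.50):
    truth = (rng.random(n) < prev).astype(int)
    kappa = cohen_kappa(rate(truth), rate(truth))
    print(f"prevalence={prev:.2f}  kappa={kappa:.3f}")
```

Under these assumptions, kappa is roughly 0.49 at 50% prevalence but only about 0.15 at 5% prevalence, even though the raters' accuracy never changes, mirroring the prevalence dependence the simulation study reports.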