Martha Sajatovic, Richa Gaur, Curtis Tatsuoka, Susan De Santi, Nathan Lee, Judith Laredo, Sulabh Tripathi. Dr. Sajatovic, MD, Departments of Psychiatry and Neurology, Case Western Reserve University School of Medicine, Cleveland, OH. Dr. Gaur, PhD, Associate Director, Rater Training Services, PharmaNet/i3, Sydney, Australia. Dr. Tatsuoka, PhD, Department of Neurology, Case Western Reserve University School of Medicine, Cleveland, OH. Dr. De Santi, PhD, Department of Psychiatry, NYU Langone Medical Center, New York, NY, and Global Medical Director, PET, G.E. Healthcare, Princeton, NJ. Mr. Lee, MSc, Director of Strategic Operations, The Cognition Group, London, UK. Dr. Laredo, PhD, Head of Department, Institut de Recherches Servier (IRIS), Suresnes, France. Dr. Tripathi, BDS, PGDHHM, PMP, Project Manager, The Cognition Group, New Delhi, India.
Abstract
AIMS: Given resource constraints in conducting clinical trials, it is critical that rater training focus on the scale items for which standardization is most challenging. This analysis examined mood disorder symptom ratings submitted in an online rater training program conducted in preparation for a multi-site, international mood disorder treatment trial. Ratings were entered online, analyzed for consistency and variability, and compared to established standards (Gold Consensus Ratings, GCRs). METHODS: Raters participated in web-based rater training on the Hamilton Depression Rating Scale (HAM-D), the Montgomery-Asberg Depression Rating Scale (MADRS), and the Young Mania Rating Scale (YMRS). Training integrated didactic materials with videos of two bipolar depressed patients interviewed by two U.S. clinicians. Raters viewed the videos and completed the mood scales. Inter-rater agreement was assessed using Kappa statistics. Agreement between raters and the GCRs on individual scale items was assessed using the McNemar test for paired binomial proportions. RESULTS: 194 raters from 80 sites in 16 countries, speaking 20 different languages, participated. Inter-rater agreement on video ratings ranged from moderate to substantial (HAM-D: Kappa = 0.72 for video A and 0.65 for video B; MADRS: Kappa = 0.65 and 0.47; YMRS: Kappa = 0.75 and 0.64; all p < 0.001). Agreement did not differ significantly by English proficiency, clinical experience, or country. Scale items that differed from the GCR on the HAM-D were depressed mood, delayed insomnia, retardation, and anxiety (psychic). Items that differed on the MADRS were apparent sadness, inner tension, concentration difficulties, lassitude, and inability to feel. Items that differed on the YMRS were irritability and disruptive behavior.
CONCLUSIONS: Identifying the specific rating scale items on which rater variability is greatest may enable training approaches that target these areas, making rater training for international clinical trials more efficient.
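The analyses above rest on two standard statistics: a Kappa coefficient for chance-corrected inter-rater agreement, and the McNemar test, which compares the two discordant cells of a paired 2x2 table (rater scores an item present while the GCR does not, and vice versa). A minimal self-contained sketch of both, with invented item-level data for illustration (this is not the study's actual code):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters' paired categorical ratings."""
    n = len(ratings_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters had scored independently
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum((count_a[c] / n) * (count_b[c] / n)
                   for c in set(ratings_a) | set(ratings_b))
    return (observed - expected) / (1 - expected)

def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar chi-square statistic from the two
    discordant cell counts of a paired 2x2 table."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical item-level ratings (0 = symptom absent, 1 = present)
rater = [1, 1, 0, 1, 0, 0, 1, 1]
gcr   = [1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(rater, gcr), 2))  # 0.47
print(round(mcnemar_chi2(10, 2), 2))       # 4.08
```

In practice one would compute the test's p-value from the chi-square distribution (or use an exact binomial version when discordant counts are small), and a multi-rater design like the one described here would call for a generalized statistic such as Fleiss' kappa rather than the pairwise form sketched above.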