Felix Fischer (1), Brooke Levis (2,3,4), Carl Falk (5), Ying Sun (2), John P A Ioannidis (6), Pim Cuijpers (7), Ian Shrier (2,3,8), Andrea Benedetti (3,9,10), Brett D Thombs (2,3,5,10,11,12,13)
1. Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany.
2. Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada.
3. Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada.
4. Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Staffordshire, UK.
5. Department of Psychology, McGill University, Montréal, Québec, Canada.
6. Department of Medicine, Department of Epidemiology and Population Health, Department of Biomedical Data Science, Department of Statistics, Stanford University, Stanford, California, USA.
7. Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit, Amsterdam, the Netherlands.
8. Department of Family Medicine, McGill University, Montréal, Québec, Canada.
9. Respiratory Epidemiology and Clinical Research Unit, McGill University Health Centre, Montréal, Québec, Canada.
10. Department of Medicine, McGill University, Montréal, Québec, Canada.
11. Department of Psychiatry, McGill University, Montréal, Québec, Canada.
12. Department of Educational and Counselling Psychology, McGill University, Montréal, Québec, Canada.
13. Biomedical Ethics Unit, McGill University, Montréal, Québec, Canada.
Abstract
BACKGROUND: Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models maximize empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy can be improved by using latent factor model-based scoring rather than sum scores.
METHODS: We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as the reference standard and split them into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and identified the respective cut-offs that maximized combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score cut-off (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences.
RESULTS: The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset included 17 studies (4252 participants, 568 cases). In the validation dataset, the optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, and 0.049 points, respectively) but lower specificity (by 0.017, 0.026, and 0.019 points, respectively) than the sum score cut-off of ⩾10.
CONCLUSIONS: In a comprehensive dataset of diagnostic studies, scoring with complex latent variable models did not meaningfully improve the screening accuracy of the PHQ-9 compared with the simple sum score approach.
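The calibration and validation steps described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`sens_spec`, `best_cutoff`, `bootstrap_sens_diff`) and the toy data are hypothetical, and the bootstrap resamples individual participants as a simplifying assumption; the abstract does not say how resampling was done (for example, whether whole studies were resampled). The sketch picks the cut-off that maximizes sensitivity plus specificity in a calibration set, then estimates a percentile 95% confidence interval for the difference in sensitivity between two scoring rules applied to the same validation participants.

```python
import random


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff`.

    labels: 1 = case according to the reference standard, 0 = non-case.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Observed score value that maximizes sensitivity + specificity."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)))


def bootstrap_sens_diff(scores_a, cut_a, scores_b, cut_b, labels,
                        n_boot=1000, seed=0):
    """Percentile 95% CI for sensitivity(A) - sensitivity(B).

    Assumption: participants are resampled with replacement, ignoring
    any clustering of participants within studies.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) == 0 or sum(y) == n:
            continue  # a resample needs both cases and non-cases
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(sens_spec(a, y, cut_a)[0] - sens_spec(b, y, cut_b)[0])
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy calibration data (invented for illustration, not from the study).
cal_scores = [1, 3, 5, 7, 9, 10, 12, 14, 16, 20]
cal_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
print(best_cutoff(cal_scores, cal_labels))  # 10 on this toy data
```

In practice, `scores_a` would hold model-based factor scores and `scores_b` the PHQ-9 sum scores for the same validation participants, each paired with its own calibration-derived cut-off. The same resampling loop extends to differences in specificity by using the second element returned by `sens_spec`.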