Peter M Fayers1. 1. Department of Public Health, Institute of Applied Health Sciences, University of Aberdeen Medical School, Polwarth Building, Foresterhill, Aberdeen, AB25 2ZD, UK. p.fayers@abdn.ac.uk
Abstract
OBJECTIVES: We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs).

BACKGROUND: IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research.

RESULTS: Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transfer of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need for guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions.

CONCLUSIONS: Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.
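As background for readers unfamiliar with the mechanics the abstract refers to, the following is a minimal illustrative sketch (not taken from the paper) of the two-parameter logistic (2PL) IRT model and the basic CAT step of selecting the item with maximum Fisher information at the current latent-trait estimate. The item names and parameter values are hypothetical examples.

```python
import math

def irt_2pl(theta, a, b):
    """2PL IRT model: probability of endorsing an item given latent trait theta,
    discrimination a, and difficulty (location) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = irt_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, items):
    """Basic CAT step: administer the item most informative at the
    current trait estimate. 'items' holds hypothetical item parameters."""
    return max(items, key=lambda it: item_information(theta, it["a"], it["b"]))

# Hypothetical symptom items spanning a range of severity locations.
items = [
    {"name": "mild symptom", "a": 1.2, "b": -1.0},
    {"name": "moderate symptom", "a": 1.5, "b": 0.0},
    {"name": "severe symptom", "a": 1.8, "b": 1.5},
]
print(select_next_item(0.0, items)["name"])  # item located nearest theta = 0
```

This maximum-information selection rule is the standard textbook CAT heuristic; operational CAT systems add content balancing, exposure control, and stopping rules, which are among the practical concerns the review raises for PRO applications.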