PURPOSE: Large samples are generally considered necessary for the Rasch model to yield robust item parameter estimates. Recently, small-sample Rasch analysis has been suggested as a preliminary assessment of items' psychometric properties. This study evaluated Rasch analysis results obtained from small samples. METHODS: Ten PROMIS pain behavior items were used. Random samples of 30, 50, 100, and 250, and a targeted sample of 30, were each drawn 10 times from a total of 800 subjects. Rasch analysis was conducted for each of these samples and for the full sample. RESULTS: In the full sample, there were 104 cases of extreme scores, no null categories, two items with incorrectly ordered item parameters, and four misfitting items. For samples of 250, 100, 50, 30, and targeted 30, the average numbers of extreme scores were 42.2, 17.1, 9.6, 6.1, and 1.2; the average numbers of null categories were 1.0, 3.2, 8.7, 13.4, and 8.3; the average numbers of items with incorrectly ordered item parameters were 0.1, 0.8, 2.9, 4.7, and 3.7; and the average numbers of items with fit residuals exceeding ±2.5 were 0.8, 0.3, 0.1, 0.2, and 0.3, respectively. CONCLUSIONS: Rasch analysis based on small samples (≤ 50) identified more items with incorrectly ordered parameters than analysis based on larger samples (≥ 100), but fewer misfitting items. Results from small samples led to conclusions opposite to those based on larger samples. Rasch analysis based on small samples should be used for exploratory purposes with extreme caution.
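The resampling design in METHODS can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure or software: the response data are synthetic, the number of response categories (`N_CATEGORIES = 5`) is an assumption, and all function names are hypothetical. It covers only the two descriptive counts that need no model fitting (extreme scores and null categories). It does not fit a Rasch model, and it omits the targeted sample, whose selection rule the abstract does not describe.

```python
import random

N_ITEMS = 10        # ten items, as in the study
N_CATEGORIES = 5    # ASSUMPTION: categories scored 0..4 (not stated in the abstract)
N_SUBJECTS = 800    # full-sample size, as in the study


def simulate_population(n_subjects, n_items, n_categories, rng):
    """Build synthetic stand-in responses (NOT real PROMIS data).

    Each subject gets a random tendency so that low categories are more
    common than high ones, which makes null categories likely in small samples.
    """
    population = []
    for _ in range(n_subjects):
        tendency = rng.random()
        row = [
            min(n_categories - 1, int(rng.random() * tendency * n_categories * 1.5))
            for _ in range(n_items)
        ]
        population.append(row)
    return population


def count_extreme_scores(sample, n_items, n_categories):
    """Count respondents whose total score is the minimum or maximum possible."""
    max_total = n_items * (n_categories - 1)
    return sum(1 for row in sample if sum(row) in (0, max_total))


def count_null_categories(sample, n_items, n_categories):
    """Count item-category combinations that nobody in the sample used."""
    nulls = 0
    for item in range(n_items):
        observed = {row[item] for row in sample}
        nulls += n_categories - len(observed)
    return nulls


def summarize_by_sample_size(population, sizes, n_reps, rng):
    """Draw n_reps random samples per size; return mean counts per size."""
    summary = {}
    for size in sizes:
        extremes, nulls = [], []
        for _ in range(n_reps):
            sample = rng.sample(population, size)  # sampling without replacement
            extremes.append(count_extreme_scores(sample, N_ITEMS, N_CATEGORIES))
            nulls.append(count_null_categories(sample, N_ITEMS, N_CATEGORIES))
        summary[size] = (sum(extremes) / n_reps, sum(nulls) / n_reps)
    return summary


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so the sketch is repeatable
    population = simulate_population(N_SUBJECTS, N_ITEMS, N_CATEGORIES, rng)
    results = summarize_by_sample_size(population, [250, 100, 50, 30], 10, rng)
    for size, (mean_extreme, mean_null) in results.items():
        print(f"n={size}: mean extreme scores={mean_extreme:.1f}, "
              f"mean null categories={mean_null:.1f}")
```

Because the data are simulated, the printed values will not match the figures reported in RESULTS. The sketch only shows how averages over 10 draws per sample size can be tabulated.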
Authors: Ann C Klassen; John Creswell; Vicki L Plano Clark; Katherine Clegg Smith; Helen I Meissner Journal: Qual Life Res Date: 2012-02-04 Impact factor: 4.147
Authors: Dennis A Revicki; Wen-Hung Chen; Neesha Harnam; Karon F Cook; Dagmar Amtmann; Leigh F Callahan; Mark P Jensen; Francis J Keefe Journal: Pain Date: 2009-08-15 Impact factor: 6.961
Authors: Sylvie D Lambert; Julie F Pallant; Kerrie Clover; Benjamin Britton; Madeleine T King; Gregory Carter Journal: Qual Life Res Date: 2014-04-01 Impact factor: 4.147
Authors: Giovanni G Arcuri; Lisa Palladini; Gabrielle Dumas; Josée Lemoignan; Bruno Gagnon Journal: Support Care Cancer Date: 2015-02-14 Impact factor: 3.603
Authors: Tiê Parma Yamato; Chris G Maher; Bruno T Saragiotto; Mark J Catley; James H McAuley Journal: Eur Spine J Date: 2016-11-24 Impact factor: 3.134