PURPOSE: Agreement across methods for identifying students as inadequate responders or as learning disabled is often poor. We report (1) an empirical examination of methods for assessing response to intervention, comparing final status methods (post-intervention benchmarks) with dual-discrepancy methods (based on both growth during the intervention and final status); and (2) a statistical simulation of psychometric issues that may explain low agreement. METHODS: After a Tier 2 intervention, final status benchmark criteria were used to identify 104 inadequate and 85 adequate responders to intervention, with comparisons of agreement and coverage for these methods and a dual-discrepancy method. Factors affecting agreement were investigated using computer simulation to manipulate reliability, the intercorrelation between measures, cut points, normative samples, and sample size. RESULTS: Single measures each tended to miss many members of the pool of 104 inadequate responders. Agreement between pairs of measures in identifying inadequate responders was poor to fair. In the simulation, comparisons across two simulated measures generated indices of agreement (kappa) that were generally low because of multiple psychometric issues inherent in any test. CONCLUSIONS: Expecting excellent agreement between two correlated tests with even small amounts of unreliability may not be realistic. Assessing outcomes based on multiple measures, such as level of CBM performance and short norm-referenced assessments of fluency, may improve the reliability of diagnostic decisions.
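The simulation logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation code: the function names (`cohen_kappa`, `simulate_kappa`), the normal-score model, and the default parameter values are all assumptions made for illustration. It draws correlated true scores, adds measurement error to reach a chosen reliability, flags scores below a percentile cut point on each measure, and reports Cohen's kappa for the two sets of decisions.

```python
import numpy as np


def cohen_kappa(a, b):
    """Cohen's kappa for two binary classifications of the same cases."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    p_obs = np.mean(a == b)                      # observed agreement
    p_a, p_b = a.mean(), b.mean()
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)    # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def simulate_kappa(n=100_000, true_corr=1.0, reliability=0.9,
                   cut_percentile=25, seed=0):
    """Kappa between two simulated measures (hypothetical model, not the paper's).

    Assumptions: true scores are standard normal with correlation `true_corr`;
    both measures share the same `reliability`; cases scoring below the
    sample `cut_percentile` on a measure are flagged as inadequate responders.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    # Build two true scores with the requested correlation.
    t1 = z1
    t2 = true_corr * z1 + np.sqrt(1 - true_corr ** 2) * z2
    true = np.column_stack([t1, t2])
    # Observed score = weighted true score + independent error, so that
    # observed variance is 1 and the true-score share equals `reliability`.
    noise = rng.standard_normal((n, 2))
    observed = np.sqrt(reliability) * true + np.sqrt(1 - reliability) * noise
    cuts = np.percentile(observed, cut_percentile, axis=0)
    flagged = observed < cuts
    return cohen_kappa(flagged[:, 0], flagged[:, 1])


if __name__ == "__main__":
    for rel in (1.0, 0.9, 0.8, 0.7):
        print(f"reliability={rel:.1f}  kappa={simulate_kappa(reliability=rel):.3f}")
```

Under these assumptions, even two measures of the same underlying construct (true-score correlation of 1.0) stop agreeing perfectly once any measurement error is introduced, because cases near the cut point fall on different sides of it on each measure. Varying `n`, `true_corr`, and `cut_percentile` explores the other factors the abstract lists; normative-sample differences are not modeled here.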
Authors: David J Francis; Jack M Fletcher; Karla K Stuebing; G Reid Lyon; Bennett A Shaywitz; Sally E Shaywitz Journal: J Learn Disabil Date: 2005 Mar-Apr
Authors: Carolyn A Denton; Paul T Cirino; Amy E Barth; Melissa Romain; Sharon Vaughn; Jade Wexler; David J Francis; Jack M Fletcher Journal: J Res Educ Eff Date: 2011-01-01
Authors: Amy E Barth; Karla K Stuebing; Jason L Anthony; Carolyn A Denton; Patricia G Mathes; Jack M Fletcher; David J Francis Journal: Learn Individ Differ Date: 2008-09
Authors: Jack M Fletcher; Karla K Stuebing; Amy E Barth; Carolyn A Denton; Paul T Cirino; David J Francis; Sharon Vaughn Journal: School Psych Rev Date: 2011