| Literature DB >> 29075050 |
Grace Young-Suk Kim, Christopher Schatschneider, Jeanne Wanzek, Brandy Gatlin, Stephanie Al Otaiba.
Abstract
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) were each administered three tasks in the narrative genre and three in the expository genre, and their written compositions were evaluated using evaluation methods widely used for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied largely by task (30.44% and 28.61% of variance) but not by rater. To reach a reliability of .90, multiple tasks and raters were needed; for a reliability of .80, a single rater and multiple tasks sufficed. These findings have important implications for reliably evaluating children's writing skills, given that writing is typically evaluated with a single task and a single rater in classrooms and even in state accountability systems.
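The decision-study logic behind the abstract can be sketched with the standard generalizability (G) coefficient for relative decisions, where person variance is divided by itself plus the interaction error components averaged over the number of tasks and raters. The variance components below are hypothetical illustration values (not the study's estimates), chosen only to mimic the reported pattern of large person and person-by-task variance with negligible rater variance:

```python
# Illustrative G-theory decision study: how increasing the number of
# tasks and raters raises the generalizability (G) coefficient.
# All variance components are HYPOTHETICAL, not taken from the study.

def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """G = var_p / (var_p + var_pt/nt + var_pr/nr + var_ptr_e/(nt*nr)),
    the reliability-like coefficient for relative decisions."""
    error = (var_pt / n_tasks
             + var_pr / n_raters
             + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + error)

# Hypothetical components: large person (true-score) and person-x-task
# variance, near-zero rater variance, residual person-x-task-x-rater/error.
vp, vpt, vpr, vptre = 0.54, 0.30, 0.01, 0.15

for nt, nr in [(1, 1), (3, 1), (5, 2)]:
    g = g_coefficient(vp, vpt, vpr, vptre, nt, nr)
    print(f"tasks={nt} raters={nr} G={g:.2f}")
```

With these made-up components, one task and one rater yields G = 0.54, and adding tasks raises G far more than adding raters, mirroring the abstract's finding that task variance, not rater variance, dominates the error.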
Keywords: Generalizability theory; assessment; rater effect; task effect; writing
Year: 2017 PMID: 29075050 PMCID: PMC5653319 DOI: 10.1007/s11145-017-9724-6
Source DB: PubMed Journal: Read Writ ISSN: 0922-4777