Abstract
This paper provides a narrative review of empirical research on the assessment of speaking proficiency published in selected journals in the field of language assessment. A total of 104 published articles on speaking assessment were collected and systematically analyzed within an argument-based validation framework (Chapelle et al., 2008). We examined how the published research is represented in the six inferences of this framework, the topics covered by each article, and the research methods employed to collect the backings that support the assumptions underlying each inference. Our analysis revealed that: (a) most of the collected articles could be categorized into the three inferences of evaluation, generalization, and explanation; (b) the topics most frequently explored by speaking assessment researchers included the constructs of speaking ability, rater effects, and factors that affect spoken performance, among others; and (c) quantitative methods were more frequently employed to interrogate the inferences of evaluation and generalization, whereas qualitative methods were more frequently used to investigate the explanation inference. The paper concludes with a discussion of the implications of this study for gaining a more nuanced understanding of task- or domain-specific speaking abilities, understanding speaking assessment in classroom contexts, and strengthening the interfaces between speaking assessment and teaching and learning practices.
Keywords: argument-based validation framework; narrative review; research methods; speaking assessment; speaking proficiency
Year: 2020 PMID: 32174875 PMCID: PMC7057184 DOI: 10.3389/fpsyg.2020.00330
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. The argument-based validation framework (adapted from Chapelle et al., 2008, p. 18).
Figure 2. PRISMA flowchart of article search and collection.
Representation of the published research in the six inferences (n = 104).
| Inference | n | % |
| --- | --- | --- |
| • Domain description | 4 | 3.85 |
| • Evaluation | 42 | 40.38 |
| • Generalization | 42 | 40.38 |
| • Explanation | 50 | 48.08 |
| • Extrapolation | 7 | 6.73 |
| • Utilization | 5 | 4.81 |
Thirty-nine articles (37.50%) were coded into multiple inferences, of which 34 (32.69%) were coded into two inferences and five (4.81%) into three inferences.
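The percentages in the table above appear to be each inference's article count divided by the 104 reviewed articles. The following is a minimal sketch of that arithmetic, assuming this is how the figures were derived; the names `counts`, `percentage`, and `total_codings` are illustrative and do not come from the paper.

```python
# Counts per inference, copied from the table above (n = 104 articles).
N_ARTICLES = 104
counts = {
    "Domain description": 4,
    "Evaluation": 42,
    "Generalization": 42,
    "Explanation": 50,
    "Extrapolation": 7,
    "Utilization": 5,
}


def percentage(count, total=N_ARTICLES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {name: percentage(count) for name, count in counts.items()}

# The codings sum to more than the number of articles because an
# article can be coded into more than one inference.
total_codings = sum(counts.values())

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct:.2f}%)")
print(f"Total codings: {total_codings} across {N_ARTICLES} articles")
```

Because articles can be coded into several inferences, the percentages are shares of articles rather than shares of codings, so they are not expected to sum to 100.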
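Matrix coding results of inferences and speaking assessment topics (n = 104).
| Topic | Domain description, n | % | Evaluation, n | % | Generalization, n | % | Explanation, n | % | Extrapolation, n | % | Utilization, n | % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |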
| (1) Speaking constructs ( | 0 | 0.00 | 8 | 7.69 | 11 | 10.58 | 39 | 37.50 | 5 | 4.81 | 0 | 0.00 |
| (2) Rater effects ( | 0 | 0.00 | 27 | 25.96 | 23 | 22.12 | 14 | 13.46 | 0 | 0.00 | 0 | 0.00 |
| (3) Factors that affect test performance ( | 0 | 0.00 | 9 | 8.65 | 19 | 18.27 | 13 | 12.50 | 0 | 0.00 | 0 | 0.00 |
| (4) Speaking test design ( | 2 | 1.92 | 9 | 8.65 | 4 | 3.85 | 8 | 7.69 | 1 | 0.96 | 0 | 0.00 |
| (5) Test score generalizability ( | 0 | 0.00 | 3 | 2.88 | 7 | 6.73 | 2 | 1.92 | 0 | 0.00 | 0 | 0.00 |
| (6) Rating scale evaluation ( | 2 | 1.92 | 4 | 3.85 | 2 | 1.92 | 2 | 1.92 | 0 | 0.00 | 0 | 0.00 |
| (7) Test use ( | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 1 | 0.96 | 5 | 4.81 |
(1) The total number in the left column exceeds 104 because some articles were coded into multiple topic areas; (2) the total numbers of the rows exceed the numbers reported in the left column because some articles in these topic areas were coded into multiple inferences.
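Matrix coding results of research methods and inferences (n = 104).
| Research method | Domain description, n | % | Evaluation, n | % | Generalization, n | % | Explanation, n | % | Extrapolation, n | % | Utilization, n | % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |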
| QUAN ( | 0 | 0.00 | 21 | 20.19 | 27 | 25.96 | 18 | 17.31 | 3 | 2.88 | 1 | 0.96 |
| • ANOVA or regression ( | 0 | 0.00 | 13 | 12.50 | 14 | 13.46 | 15 | 14.42 | 2 | 1.92 | 3 | 2.88 |
| • Rasch ( | 0 | 0.00 | 19 | 18.27 | 20 | 19.23 | 9 | 8.65 | 0 | 0.00 | 0 | 0.00 |
| • Correlation ( | 1 | 0.96 | 7 | 6.73 | 9 | 8.65 | 10 | 9.62 | 4 | 3.85 | 1 | 0.96 |
| • G-theory ( | 0 | 0.00 | 4 | 3.85 | 7 | 6.73 | 2 | 1.92 | 0 | 0.00 | 0 | 0.00 |
| • EFA ( | 0 | 0.00 | 4 | 3.85 | 3 | 2.88 | 3 | 2.88 | 0 | 0.00 | 1 | 0.96 |
| • SEM ( | 0 | 0.00 | 2 | 1.92 | 3 | 2.88 | 2 | 1.92 | 1 | 0.96 | 0 | 0.00 |
| • Cluster analysis ( | 0 | 0.00 | 1 | 0.96 | 0 | 0.00 | 1 | 0.96 | 0 | 0.00 | 1 | 0.96 |
| QUAL ( | 3 | 2.88 | 4 | 3.85 | 3 | 2.88 | 16 | 15.38 | 2 | 1.92 | 0 | 0.00 |
| • Discourse analysis ( | 1 | 0.96 | 6 | 5.77 | 6 | 5.77 | 20 | 19.23 | 2 | 1.92 | 0 | 0.00 |
| • Interview/Focus group ( | 4 | 3.85 | 6 | 5.77 | 2 | 1.92 | 4 | 3.85 | 1 | 0.96 | 0 | 0.00 |
| • Written comments ( | 0 | 0.00 | 5 | 4.81 | 6 | 5.77 | 5 | 4.81 | 0 | 0.00 | 2 | 1.92 |
| • Verbal protocols ( | 1 | 0.96 | 7 | 6.73 | 2 | 1.92 | 5 | 4.81 | 0 | 0.00 | 0 | 0.00 |
| • Eye-tracking ( | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 1 | 0.96 | 0 | 0.00 | 0 | 0.00 |
| MIXED ( | 1 | 0.96 | 17 | 16.35 | 12 | 11.54 | 16 | 15.38 | 2 | 1.92 | 4 | 3.85 |
(1) QUAN, Quantitative; QUAL, Qualitative; MIXED, Mixed methods; G-theory, Generalizability theory; EFA, Exploratory factor analysis; SEM, Structural equation modeling; (2) the total number in the left column exceeds 104 because some articles used multiple methods; (3) the total numbers of the rows exceed the numbers reported in the left column because some articles using these methods were coded into multiple inferences.