A Tool for Automatic Scoring of Spelling Performance.

Charalambos Themistocleous, Kyriaki Neophytou, Brenda Rapp, Kyrana Tsapkini.

Abstract

Purpose The evaluation of spelling performance in aphasia reveals deficits in written language and can facilitate the design of targeted writing treatments. Nevertheless, manual scoring of spelling performance is time-consuming, laborious, and error prone. We propose a novel method based on the use of distance metrics to automatically score spelling. This study compares six automatic distance metrics to identify the metric that best corresponds to the gold standard (manual scoring), using data from manually obtained spelling scores from individuals with primary progressive aphasia. Method Three thousand five hundred forty word and nonword spelling productions from 42 individuals with primary progressive aphasia were scored manually. The gold standard (the manual scores) was compared to scores from six automated distance metrics: sequence matcher ratio, Damerau–Levenshtein distance, normalized Damerau–Levenshtein distance, Jaccard distance, Masi distance, and Jaro–Winkler similarity distance. We evaluated each distance metric based on its correlation with the manual spelling score. Results All automatic distance scores had high correlations with the manual method for both words and nonwords. The normalized Damerau–Levenshtein distance provided the highest correlation with the manual scoring for both words (rs = .99) and nonwords (rs = .95). Conclusions The high correlation between the automated and manual methods suggests that automatic spelling scoring constitutes a quick and objective approach that can reliably substitute for the existing manual and time-consuming spelling scoring process, an important asset for both researchers and clinicians.

Year:  2020        PMID: 33151810      PMCID: PMC8608207          DOI: 10.1044/2020_JSLHR-20-00177

Source DB:  PubMed          Journal:  J Speech Lang Hear Res        ISSN: 1092-4388            Impact factor:   2.297


The evaluation and remediation of spelling (written language production) play an important role in language therapy. Research on poststroke dysgraphia (Buchwald & Rapp, 2004; Caramazza & Miceli, 1990) and on neurodegenerative conditions, such as primary progressive aphasia (PPA), has shown effects of brain damage on underlying cognitive processes related to spelling (Rapp & Fischer-Baum, 2015). For example, spelling data have been shown to facilitate reliable subtyping of PPA into its variants (Neophytou et al., 2019), identify underlying language/cognitive deficits (Neophytou et al., 2019; Sepelyak et al., 2011), monitor the progression of the neurodegenerative condition over time, inform treatment decisions (Fenner et al., 2019), and reliably quantify the effect of spelling treatments (Rapp & Kane, 2002; Tsapkini et al., 2014; Tsapkini & Hillis, 2013). For spelling treatment and evaluation, spelling-to-dictation tasks are included in language batteries, such as the Johns Hopkins University Dysgraphia Battery (Goodman & Caramazza, 1985) and the Arizona Battery for Reading and Spelling (Beeson et al., 2010). These evaluations can identify the cognitive processes involved in the spelling of both real words and nonwords (pseudowords). Spelling of real words involves access to the speech sounds and to lexicosemantic/orthographic representations stored in long-term memory, whereas nonword spelling requires only the learned knowledge about the relationship between sounds and letters to generate plausible spellings (phonology-to-orthography conversion; Tainturier & Rapp, 2001). However, the task of scoring spelling errors manually is exceptionally time-consuming, laborious, and error prone. In this research note, we propose to apply automated distance metrics commonly employed in string comparison for the scoring of spelling of both regular words (i.e., words with existing orthography) and nonwords (i.e., words without existing orthography). 
We used manually scored spelling data for individuals with PPA to evaluate the distance metrics as a tool for assessing spelling performance. The ultimate goal of this work is to provide a tool to clinicians and researchers for automatic spelling evaluation of individuals with spelling disorders, such as PPA and stroke dysgraphia.

Spelling Performance Evaluation: Current Practices

Manual scoring of spelling responses is currently a time-consuming process. The spelling evaluation proposal by Caramazza and Miceli (1990) involves the comparison of an individual's spelling response with the standard spelling of that word, letter by letter. The comparison is based on a set of rules, which consider the addition of new letters that do not exist in the target word, the substitution of letters with others, the deletion of letters, and the movement of letters to incorrect positions within words. There is also a set of rules that account for double letters, such as deleting, moving, substituting, or simplifying a double letter, or doubling what should be a single letter. According to this scoring approach, each letter in the target word is worth 1 point. If the individual's response includes changes such as those listed above, a specified number of points are subtracted from the overall score of the word. For example, if the target word is “cat,” the maximum number of points is 3. If the patient's response is CAP, the word will be scored with 2 out of 3 points because of the substitution of “T” with “P.” The process applies slightly differently in words and nonwords, given that for nonwords there are multiple possible correct spellings. For instance, for “foit,” both PHOIT and FOIT are plausible spellings and, therefore, should be considered correct. In one approach to scoring nonword spelling, the scorer considers each response separately and selects as the “target” response the option that would maximize the points for that response as long as there is adherence to the phoneme-to-grapheme correspondence rules. 
Following the example above, if a participant is asked to spell “foit” and they write PHOAT, PHOIT would be chosen as the “intended” target and not FOIT, because PHOIT would assign 4 out of 5 points (i.e., substitute “I” with “A”), while assuming FOIT as the target would only assign 2 out of 4 points (i.e., substitute “I” with “A” and “PH” with “F”). This process assumes that even if two participants get the same nonwords, the target orthographic forms might be different across participants (depending on their responses), and therefore, the total possible points for each nonword might be different across participants. Clearly, when there is more than one error in a response, nonword scoring depends on the clinician's assumptions about the assumed target word, making the process extremely complex. Manual evaluation of spelling performance is currently the gold standard, but it is often error prone, time-consuming, and requires high interrater reliability scores from at least two clinicians to ensure consistency. Attempts to automatically evaluate spelling performance have been made before. For instance, McCloskey et al. (1994) developed a computer program to identify the different types of letter errors, namely, deletions, insertions, substitutions, transpositions, and nonidentifiable errors. That study mostly focused on identifying error types rather than producing error scores. More recently, Ross et al. (2019) devised a hierarchical scoring system that simultaneously evaluates both lexical and sublexical processing. More specifically, this system identifies both lexical and sublexical parameters that are believed to have shaped a given response, and these parameters are matched to certain scores. Scores are coded as S0–S9 and L0–L9 for the sublexical and lexical systems, respectively. Each response gets an S and an L score. This system was first constructed manually and was then automatized with a set of scripts. 
Undoubtedly, both of these studies provide valuable tools in qualitatively assessing the cognitive measures that underlie spelling but rely on complex rules and do not offer a single spelling score for every response, which is the measure that most clinicians need in their everyday practice. Other studies have used computational connectionist models to describe the underlying cognitive processes of spelling production (Brown & Loosemore, 1994; Bullinaria, 1997; Houghton & Zorzi, 1998, 2003; Olson & Caramazza, 1994), which are often inspired by related research on reading (Seidenberg et al., 1984; Sejnowski & Rosenberg, 1987). These approaches aim to model the functioning of the human brain through the representation of spelling by interconnected networks of simple units and their connections. Although some of this work involves modeling acquired dysgraphia, the aim of these models is not to score spelling errors but rather to determine the cognitive processes that underlie spelling. To the best of our knowledge, there have been no attempts to provide automated single-response accuracy values.
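To make the letter-point scheme from the “cat”/CAP example above concrete, here is a minimal sketch in Python. It is hypothetical illustrative code, not the battery's implementation: it awards 1 point per target letter based on position-by-position matches, whereas the full Caramazza and Miceli schema adds rules for doublings and letter movements.

```python
def manual_style_score(target: str, response: str) -> float:
    """Rough sketch of letter-by-letter scoring: each target letter is
    worth 1 point, and a substituted or missing letter loses its point.
    The published schema has extra rules (double letters, movements)
    that this sketch deliberately omits."""
    # Count positionally matching letters as a stand-in for the rule set.
    correct = sum(t == r for t, r in zip(target, response))
    # The score is out of the number of letters in the target word.
    return correct / len(target)

print(manual_style_score("cat", "cap"))  # 2 of 3 points, as in the example
print(manual_style_score("cat", "cat"))  # a perfect response scores 1.0
```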

Alternative Approaches Using “Distance Functions”

The comparison of different strings and the evaluation of their corresponding differences are commonly carried out using distance metrics, also known as “string similarity metrics” or “string distance functions” (Jurafsky & Martin, 2009). A distance is a metric that measures how close two elements are, where elements can be letters, characters, numbers, and more complex structures such as tables. Such metrics measure the minimum number of alterations, such as insertions, deletions, substitutions, and so on, required to make the two strings identical. For example, to make a string of letters such as “grapheme” and “graphemes” identical, you need to delete the last “s” from “graphemes,” so their distance is 1; to make “grapheme” and “krapheme” identical, you need to substitute “k” with “g”; again, the distance is 1. In many ways, the automatic approach described above is very similar to the manual approach currently being employed for the scoring of spelling, making automatic distance metrics exceptionally suitable for automating spelling evaluation. Commonly employed measures are as follows: sequence matcher ratio, Damerau–Levenshtein distance, normalized Damerau–Levenshtein distance, Jaccard distance, Masi distance, and Jaro–Winkler similarity distance. These measures have many applications in language research, biology (such as in DNA and RNA analysis), and data mining (Bisani & Ney, 2008; Damper & Eastmond, 1997; Ferragne & Pellegrino, 2010; Gillot et al., 2010; Hathout, 2014; Heeringa et al., 2009; Hixon et al., 2011; Jelinek, 1996; Kaiser et al., 2002; Navarro, 2001; Peng et al., 2011; Riches et al., 2011; Schlippe et al., 2010; Schlüter et al., 2010; Spruit et al., 2009; Tang & van Heuven, 2009; Wieling et al., 2012). In a study by Smith et al. 
(2019), the phonemic edit distance ratio, which is an automatic distance function, was employed to estimate error frequency analysis for evaluating the speech production of individuals with acquired language disorders, such as apraxia of speech and aphasia with phonemic paraphasia, highlighting the efficacy of distance metrics in automating manual measures in the context of language pathology.
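The “grapheme”/“graphemes” examples above can be reproduced with Python's standard library plus a short edit-distance routine. This is a sketch of the classic dynamic-programming algorithm, not the study's code; `difflib.SequenceMatcher` is the standard-library basis of the sequence matcher ratio.

```python
from difflib import SequenceMatcher

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (classic DP, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(levenshtein("grapheme", "graphemes"))  # 1: delete the final "s"
print(levenshtein("grapheme", "krapheme"))   # 1: substitute "k" with "g"
# The sequence matcher ratio expresses similarity in [0, 1] instead:
print(SequenceMatcher(None, "grapheme", "graphemes").ratio())
```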

The Current Study

The aim of this research note is to propose an automatic spelling scoring methodology that employs distance metrics to generate spelling scores for both real-word and nonword spellings. Therefore, we compared scoring based on the manual spelling scoring method to six established distance metrics: (a) sequence matcher ratio, (b) Damerau–Levenshtein distance, (c) normalized Damerau–Levenshtein distance, (d) Jaccard distance, (e) Masi distance, and (f) Jaro–Winkler similarity distance (see Appendix A for more details). We selected these methods because, through conceptually different approaches, they have the potential to automate the manual method for scoring spelling errors. We selected two types of distance metrics in this research note: those that treat words as sets of letters and those that treat words as strings of letters. The sequence matcher ratio (a.k.a. gestalt pattern matching), the Jaccard distance, and the Masi distance compare sets and employ set theory to calculate distance; in a set, a letter can appear only once. On the other hand, the Damerau–Levenshtein distance and the normalized Damerau–Levenshtein distance treat words as strings and estimate the movements of letters, namely, the insertions, deletions, and substitutions required to make two strings equal. The only difference between these two metrics is that the normalized Damerau–Levenshtein distance calculates transpositions as well. Finally, the Jaro–Winkler similarity distance is a method similar to the Levenshtein distance, but it gives more favorable scores to strings that match from the beginning of the word (see Appendix A for details). By comparing the outcomes of these metrics to the manual spelling scoring, this study aims to identify the metric that best matches the manual scoring and can therefore be employed to automatically evaluate the spelling of both real words and nonwords. 
The automated metrics can be employed in the clinic to facilitate spelling evaluation and provide a quantitative approach to spelling scoring that would greatly improve not only speed and efficiency but also consistency relative to current practice.
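The set-based family named above can be sketched as follows, with words treated as sets of letters. These are the standard Jaccard and MASI definitions (the latter following the weighting used in NLTK's `masi_distance`); this is illustrative code under those assumptions, not the study's scripts.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Set-based distance: words become sets of letters, so repeated
    letters collapse to one element."""
    sa, sb = set(a), set(b)
    return 1 - len(sa & sb) / len(sa | sb)

def masi_distance(a: str, b: str) -> float:
    """Jaccard weighted by the MASI monotonicity factor: 1 for equal
    sets, 0.67 for a subset relation, 0.33 for partial overlap,
    0 for disjoint sets (Passonneau's MASI, as in NLTK)."""
    sa, sb = set(a), set(b)
    if sa == sb:
        m = 1.0
    elif sa <= sb or sb <= sa:
        m = 0.67
    elif sa & sb:
        m = 0.33
    else:
        m = 0.0
    return 1 - m * len(sa & sb) / len(sa | sb)

print(jaccard_distance("cat", "cap"))  # shared {c, a} of {c, a, t, p} -> 0.5
```

Note what the set view gives up: "cat" and "tac" contain the same letters, so their Jaccard distance is 0 even though the spellings differ, which is exactly why the string-based metrics are also compared.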

Method

Participants

Forty-two patients with PPA were administered a test of spelling-to-dictation with both words and nonwords. The patients were recruited over a period of 5 years as part of a clinical trial on the effects of transcranial direct current stimulation in PPA (ClinicalTrials.gov Identifier: NCT02606422). The data evaluated here were obtained from the evaluation phase preceding any treatment. All patients were subtyped into the three PPA variants following the consensus criteria by Gorno-Tempini et al. (2011; see Appendix B).

Data Collection and Scoring

Spelling-to-dictation tasks were administered to 42 patients with PPA to assess patients' spelling performance. Twenty-five patients received a 92-item set (73 words and 19 nonwords), 11 patients received a 138-item set (104 words and 34 nonwords), three patients received a 184-item set (146 words and 38 nonwords), two patients received a 168-item set (134 words and 34 nonwords), and one patient received a 62-item set (54 words and eight nonwords). In total, there were 4,768 items (3,729 words and 1,039 nonwords). See also Appendix C for the words included in the five sets of words and nonwords.

Manual Scoring

For the manual scoring, we followed the schema proposed by Caramazza and Miceli (1990; see also Tainturier & Rapp, 2003), as summarized in Appendix D. Clinicians identify letter errors (i.e., additions, doublings, movements, substitutions, and deletions), and on that basis, they calculate a final score for each word. The outcome of this scoring is a percentage of correct letters for each word, ranging between 0 and 1, where 0 indicates a completely incorrect response and 1 indicates a correct response. The mean score for words was 0.84 (SD = 0.26), and for nonwords, it was 0.78 (SD = 0.27). To manually score all the data reported here, the clinician, who was moderately experienced, required approximately 120 hr (about 1–2 min per word), but this time can differ, depending on the experience of clinicians. To evaluate reliability across scorers (a PhD researcher, a research coordinator, and a clinician), 100 words and 100 nonwords were selected from different patients, and the Spearman correlations of the scorers were calculated. From these 200 selected items, 90% had incorrect spellings, and 10% had correct spellings. As shown in Table 1, real words exhibited higher interscorer correlations compared to nonwords, underscoring the need for a more reliable nonword scoring system.
Table 1.

Correlation statistics between the three manual raters (N = 100).

                         Real words        Nonwords
Rater pair               rs      p         rs      p
Rater A vs. Rater B      .93     .0001     .77     .0001
Rater A vs. Rater C      .95     .0001     .78     .0001
Rater B vs. Rater C      .97     .0001     .88     .0001

Automated Scoring

The automated scoring consisted of several steps. Both the targets and the responses were transformed into lowercase, and all leading and following spaces were removed. Once the data were preprocessed, we calculated the distance between the target and the response for every individual item. Different approaches were taken for words and nonwords. The spelling scoring of words was obtained by comparing the spelling of the target word to that of the written response provided by the participant. The spelling scoring of nonwords was estimated by comparing the phonemic transcriptions of the target and the response. To do this, both the target and the response were transcribed into the International Phonetic Alphabet (IPA). To convert words into IPA, we employed eSpeak, an open-source text-to-speech engine for English and other languages that operates on Linux, macOS, and Windows. The reason we decided to compare the phonetic transcriptions of target and response instead of the spelling directly is that, as indicated in the introduction, nonwords (in English) can potentially have multiple correct orthographic transcriptions yet only one correct phonetic transcription (see example in the Spelling Performance Evaluation: Current Practices section). To simplify string comparison, we removed two symbols from the phonetic transcriptions: the stress symbol /ˈ/ and the length symbol /ː/. For example, using the automated transcription system, a target nonword “feen” is converted to a matching phonemic representation /fin/ (instead of /fiːn/). The string comparison process was repeated for each of the six distance metrics described in the introduction, namely, (a) the sequence matcher ratio, (b) the Damerau–Levenshtein distance, (c) the normalized Damerau–Levenshtein distance, (d) the Jaccard distance, (e) the Masi distance, and (f) the Jaro–Winkler similarity distance (see Appendix A). The distance values for each of these metrics are values ranging between 0 and 1. 
For the sequence matcher ratio, a perfect response is equal to 1, while for the normalized Damerau–Levenshtein distance, the Jaccard distance, the Masi distance, and the Jaro–Winkler similarity distance, a perfect response is equal to 0. The only distance metric that does not have values ranging between 0 and 1 is the Damerau–Levenshtein distance, which provides a count of the changes (e.g., deletions, insertions) required to make two strings equal. The algorithm required less than 1 s to calculate spelling scores for the whole database of words and nonwords.
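The preprocessing and transcription steps described above can be sketched as follows. The eSpeak call is an assumption: it presumes an `espeak` binary on PATH with an `--ipa` option, and flag names can differ between eSpeak and eSpeak NG versions, so treat that function as illustrative rather than the study's exact invocation.

```python
import subprocess

def preprocess(item: str) -> str:
    """Lowercase the string and strip leading/trailing spaces,
    mirroring the preprocessing described in the text."""
    return item.strip().lower()

def strip_marks(ipa: str) -> str:
    """Drop the stress symbol and the length symbol so that,
    e.g., a transcription like /fiːn/ is compared as /fin/."""
    return ipa.replace("\u02c8", "").replace("\u02d0", "")

def to_ipa(word: str) -> str:
    """Phonemic transcription via the eSpeak command line
    (hypothetical invocation; requires eSpeak to be installed)."""
    out = subprocess.run(["espeak", "-q", "--ipa", word],
                         capture_output=True, text=True)
    return strip_marks(out.stdout.strip())

print(preprocess("  FEEN "))        # "feen"
print(strip_marks("fi\u02d0n"))     # "fin"
```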

Method Comparison

Once all the distance metrics were calculated for the entire data set, we estimated their correlation to manual scoring. The output of the comparison was a Spearman's rank correlation coefficient, indicating the extent to which the word accuracy scores correlate. All distance metrics were calculated in Python 3. To calculate correlations and their corresponding significance tests (p values for the correlations were obtained with t tests against zero), we employed the Python packages SciPy (Jones et al., 2001), Pingouin (Vallat, 2018), and pandas (McKinney, 2010).
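The computation behind Spearman's rank correlation, the Pearson correlation of the ranks, can be sketched with the standard library alone (in practice one would simply call SciPy's `spearmanr`, as the authors did):

```python
def spearman(x, y):
    """Spearman's rho: rank both variables (average ranks for ties),
    then compute the Pearson correlation of the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1                      # extend the run of tied values
            avg = (i + j) / 2 + 1           # average 1-based rank for ties
            for k in order[i:j + 1]:
                r[k] = avg
            i = j + 1
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Toy scores: the automatic metric preserves the manual ranking exactly.
manual = [1.0, 0.8, 0.5, 0.9]
auto = [1.0, 0.75, 0.4, 0.95]
print(round(spearman(manual, auto), 2))  # -> 1.0
```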

Results

The results of the Spearman correlations between manual and automatic scorings for words and nonwords are shown in Figures 1 and 2 and in Tables 2 and 3, respectively. The first column of the correlation matrix shows the correlations of the manual evaluation with each of the estimated distance metrics. The other columns show the correlations of the automatic distance metrics with one another. For words, the manual scoring and the automated metric scoring provided correlations over .95. Specifically, there was a high correlation of the normalized Damerau–Levenshtein distance, the Damerau–Levenshtein distance, and the sequence matcher with the manual scoring of spelling productions, r(3538) = .99, p = .001, which indicates that distance metrics provide almost identical results to the manual evaluation. There was a slightly lower correlation of the Jaro–Winkler similarity distance with the manual scores, r(3538) = .96, p = .001. The Jaccard distance and the Masi distance had the lowest correlations with the manual scoring, r(3538) = .95, p = .001.
Figure 1.

Correlation matrix for words. JaccardD = Jaccard distance; JWSD = Jaro–Winkler similarity distance; Manual = manual spelling estimation; MasiD = Masi distance; Norm. RDLD = normalized Damerau–Levenshtein distance; RDLD = Damerau–Levenshtein distance; SM = sequence matcher ratio.

Figure 2.

Correlation matrix for nonwords. JaccardD = Jaccard distance; JWSD = Jaro–Winkler similarity distance; Manual = manual spelling estimation; MasiD = Masi distance; Norm. RDLD = normalized Damerau–Levenshtein distance; RDLD = Damerau–Levenshtein distance; SM = sequence matcher ratio.

Table 2.

Correlation statistics between the manual method and the automatic distance metrics for real words (N = 3,540).

                    Correct and incorrect spellings    Incorrect spellings
Distance metric     rs        95% CI                   rs       95% CI
SM                  .994*     [.99, .99]               .92*     [.91, .92]
RDLD                −.990*    [−.99, −.99]             −.86*    [−.87, −.84]
Norm. RDLD          .995*     [.99, .99]               .92*     [.91, .93]
JaccardD            −.952*    [−.96, −.95]             −.86*    [−.88, −.85]
MasiD               −.950*    [−.95, −.95]             −.83*    [−.84, −.81]

Note. The table provides correlations of the manual approach with the automated distance metrics on all stimuli (Correct and incorrect spellings; N = 3,540) and correlations of the manual approach and the automated distance metrics based only on spellings that were spelled incorrectly (Incorrect spellings; N = 1,327). Shown in the table are the correlation coefficient (r) and the parametric 95% confidence intervals (CIs) of the coefficient, while the asterisk signifies that p < .0001. In bold is the distance metric with the highest score overall. SM = sequence matcher ratio; RDLD = Damerau–Levenshtein distance; Norm. RDLD = normalized Damerau–Levenshtein distance; JaccardD = Jaccard distance; MasiD = Masi distance.

Table 3.

Correlation statistics between the manual method and the automatic distance metrics for nonwords (N = 987).

                    Correct and incorrect spellings    Incorrect spellings
Distance metric     rs        95% CI                   rs       95% CI
SM                  .945*     [.94, .95]               .799*    [.77, .81]
RDLD                −.936*    [−.94, −.93]             −.780*   [−.79, −.76]
Norm. RDLD          .947*     [.94, .95]               .821*    [.78, .86]
JaccardD            −.935*    [−.94, −.93]             −.786*   [−.79, −.76]
MasiD               −.933*    [−.94, −.92]             −.778*   [−.78, −.74]

Note. The table provides correlations of the manual approach with the automated distance metrics on all stimuli (Correct and incorrect spellings; N = 987) and correlations of the manual approach and of the automated distance metrics on spellings that were spelled incorrectly (Incorrect spellings; N = 520). Shown in the table are the correlation coefficient (r) and the parametric 95% confidence intervals (CIs) of the coefficient, while the asterisk signifies that p < .0001. In bold is the distance metric with the highest score overall. SM = sequence matcher ratio; RDLD = Damerau–Levenshtein distance; Norm. RDLD = normalized Damerau–Levenshtein distance; JaccardD = Jaccard distance; MasiD = Masi distance.

The normalized Damerau–Levenshtein distance outperformed all other distance metrics on nonwords, with r(985) = .95, p < .001, followed by the Damerau–Levenshtein distance and the sequence matcher ratio, both of which correlated with the manual estimate of spelling at r(985) = .94, p < .001. The Jaccard distance and the Masi distance had correlations of r(985) = .93, p < .001, and finally, the Jaro–Winkler similarity distance had the lowest correlation for nonwords, with a value of r(985) = .91, p < .001. Importantly, for the normalized Damerau–Levenshtein distance, which outperforms the other distance metrics, if we remove the correct cases from the data set (the items on which participants scored 100%), the correlation remains very high, with rs = .92 for words (see Table 2) and rs = .82 for nonwords (see Table 3).

Discussion

This study aimed to provide a tool for scoring spelling performance automatically by identifying a measure that corresponds closely to the current gold standard for spelling evaluation, the manual letter-by-letter scoring of spelling performance (Caramazza & Miceli, 1990). We compared six distance functions that measure the similarity of strings of letters, previously used in other scientific fields for estimating distances between different types of sequences (e.g., DNA sequencing and computational linguistics). The results showed that all six automated measures had very high correlations with the manual scoring for both words and nonwords. However, the normalized Damerau–Levenshtein distance, which had a .99 correlation with the manual scores for words and a .95 correlation with the manual scores for nonwords, outperformed the other distance metrics. An important reason for the high correlation between the normalized Damerau–Levenshtein distance and the manual method is that it considers all four types of errors, namely, deletions, insertions, substitutions, and transpositions, whereas the simple Damerau–Levenshtein distance calculates only deletions, insertions, and substitutions (see Appendix A for more details on the two methods). Therefore, the findings provide good support for using the normalized Damerau–Levenshtein distance as a substitute for the manual process for scoring spelling performance. An important advantage of this method over manual methods is that it provides objective measures of spelling errors. The present tool will facilitate the evaluation of written language impairments and their treatment in PPA and in other patient populations. A key characteristic of the approach employed here is that both words and nonwords are treated as strings of characters. The evaluation algorithm provides a generic distance that can be employed to score both words and nonwords. 
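As a concrete illustration of the winning metric, here is a sketch of a Damerau–Levenshtein distance in its common optimal-string-alignment formulation (deletions, insertions, substitutions, and adjacent transpositions) together with a length-normalized version scaled to [0, 1]. The exact operation sets and normalization used in the study are specified in its Appendix A; this code is an assumption about one standard formulation, not the authors' implementation.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal string alignment distance: minimum number of deletions,
    insertions, substitutions, and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def normalized_dl(a: str, b: str) -> float:
    """Distance divided by the longer string's length, so that 0 is a
    perfect response and 1 is a completely different string."""
    return damerau_levenshtein(a, b) / max(len(a), len(b), 1)

print(damerau_levenshtein("cat", "cta"))  # 1: a single transposition
print(normalized_dl("cat", "cap"))        # 1 substitution over 3 letters
```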
The only difference in evaluating spelling performance in words and nonwords is that nonwords do not have a standard orthographic representation; rather, they can be transcribed in multiple different ways that are all considered to be correct. As discussed earlier, this is because, in English orthography, different characters and sequences of characters may be used to represent the same sound (e.g., /s/ can be represented as 〈ps〉 in psychology, 〈s〉 in seen, and 〈sc〉 in scene). Also, as shown by the interrater reliability check, the correlations between the three manual scorers were high for real words but lower for nonwords. This further highlights the need for a more efficient and consistent way of scoring nonwords. With the innovative inclusion of IPA transcription to represent nonwords phonemically, we have provided a unitary algorithm that can handle both words and nonwords. For nonwords, once the targets and responses were phonemically transcribed, we estimated their distance in the same fashion as we estimated the distance between targets and responses for real words. For example, if the target word is KANTREE, eSpeak will provide the IPA form /kantɹi/. If the patient transcribed the word as KINTRA, the proposed approach will compare /kantɹi/ to /kɪntɹə/. An advantage of using IPA is that it provides consistent phonemic representations of nonwords. This approach contrasts with the challenges faced in manual scoring when estimating the errors for nonwords that can be orthographically represented in multiple ways. In these cases, clinicians have to identify the target representation that the patient probably had in mind so that the scoring is “fair.” For instance, in the manual approach, if the target word is FLOPE and the patient wrote PHLAP, a clinician may not compare the word to the target word FLOPE but to a presumed target PHLOAP, as this is orthographically closer to the response. This would give a score of 5/6. 
However, a different clinician may compare this response to PHLOPE and provide a different score, specifically, 4/6 (see also the Alternative Approaches Using “Distance Functions” section for discussion). As a result, each scorer's selection of a specific intended target can influence the scoring by enabling a different set of available options for the spelling of the targets. This clearly poses additional challenges for manual scoring of nonword spelling responses. The phonemic transcription of nonwords generates a single pronunciation of a spelled word, produced using the grapheme-to-pronunciation rules of English without accessing the lexicon. The pronunciation rules prioritize the most probable pronunciation given the context in which a letter appears (i.e., what letters precede and follow a given letter). What is novel about the proposed approach is that it does not require generating an “intended target.” Instead, the patient's actual response is converted to IPA and compared to the IPA of the target. If the IPA transcription of the target nonword and the transcription of the response match in their pronunciation, the response is considered correct. Since the IPA transcription always provides the same representation for a given item, without having to infer a participant's intended target, the algorithm produces a consistent score, which we consider a valuable benefit of the proposed approach. The pronunciation rules convert an orthographic item to the most probable pronunciation transcription. For example, the nonwords KANTREE, KANTRY, and KANTRI will all be transcribed by the program into /kæntɹi/, and all of these spellings would be scored as correct. However, a less straightforward example is when, for instance, for the stimulus /ɹaɪnt/, a patient writes RINT. A clinician might choose to score this as correct based on the writing of the existing word PINT /paɪnt/. 
The automated algorithm, on the other hand, will transcribe RINT into /ɹɪnt/ and mark it as an error. Such ambiguous cases could also create discrepancies between clinicians scoring manually. Because the automated algorithm provides a consistent phonemic representation for every item, it eliminates the discrepancies that might occur between different clinicians, and in human scoring in general. A problem that might arise is the generalizability of the algorithm to patients and clinicians with different English accents, which may lead to different spellings. Some of these cases can be addressed by selecting the corresponding eSpeak pronunciation when using the algorithm. For example, the system already includes pronunciations for Scottish English, Standard British English (Received Pronunciation), and so on, and further pronunciation dictionaries can be added. In future work, and especially when this automated method is used in areas of the world with distinct dialects, the target items can also be provided directly in IPA format, especially for the nonwords. Lastly, it is important to note that, for the purpose of demonstrating this new methodology, this study used spelling data from individuals with PPA. However, tests evaluating spelling performance are currently employed across a broad variety of patient populations, including children with language disorders as well as adults with stroke-induced aphasia and other acquired neurogenic language disorders. The proposed method can therefore be employed to estimate spelling performance across a range of populations, for a variety of purposes. A limitation of the automated method described here is that it provides item-specific scores but does not identify error types; for example, it does not distinguish semantic from phonologically plausible errors.
For instance, if the target is "lion" but the response is TIGER, this is a semantic substitution error; if the response is LAION, it is a phonologically plausible error. Although this labeling is not provided by the scoring algorithm as currently configured, it would be a useful feature to implement in both clinical and research work (Rapp & Fischer-Baum, 2015). Error-type labeling such as this can extend this work even further, adding to the value of this new tool, and it thus constitutes an important direction for future research.
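The paper does not include its implementation, but the core computation is straightforward to sketch. Below is a minimal Python sketch of the Damerau-Levenshtein distance in its restricted (optimal string alignment) variant — an assumption on our part about which variant the authors used — normalized by the length of the longer string, applied to the /kantɹi/ vs. /kɪntɹə/ example above:

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted (optimal string alignment) Damerau-Levenshtein distance:
    the minimum number of insertions, deletions, substitutions, and adjacent
    transpositions needed to turn string a into string b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                           # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j                           # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def normalized_dl(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the length of the longer string."""
    if not a and not b:
        return 0.0
    return damerau_levenshtein(a, b) / max(len(a), len(b))


# Target /kantɹi/ vs. response /kɪntɹə/: two substitutions over six phonemes.
print(normalized_dl("kantɹi", "kɪntɹə"))  # 2/6 ≈ 0.333
```

The same function handles both cases described in the article: for real words it is applied to the orthographic strings directly, and for nonwords to the IPA transcriptions of target and response.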

Conclusions

The aim of this study has been to provide an objective method for scoring spelling performance that can be used in both clinical and research settings to replace the current manual spelling scoring process, which is time-consuming and laborious. We obtained spelling scores using several automatic distance metrics and evaluated their efficiency by calculating their correlations with manual scoring. Our findings showed that the normalized Damerau–Levenshtein distance provides scores very similar to manual scoring for both words and nonwords, with correlations of .99 and .95, respectively, and can thus be employed to automate the scoring of spelling. For words, this distance is estimated by comparing the orthographic representations of the target and the response; for nonwords, it is calculated by comparing their phonemic, IPA-transcribed representations. Finally, it is important to note that, while manual scoring for a data set of the size discussed here can take many hours to complete, the automated scoring can be completed in less than 1 s. These results provide the basis for developing a useful tool for clinicians and researchers to evaluate spelling performance accurately and efficiently.
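For comparison, two of the other metrics evaluated in the study — the sequence matcher ratio and the Jaccard distance — can be sketched with Python's standard library. The character-set formulation of Jaccard shown here is one common variant; the exact formulation the authors used may differ:

```python
from difflib import SequenceMatcher


def sequence_matcher_ratio(target: str, response: str) -> float:
    """Similarity in [0, 1], computed as 2*M/T, where M is the number of
    matching characters and T the combined length of both strings."""
    return SequenceMatcher(None, target, response).ratio()


def jaccard_distance(target: str, response: str) -> float:
    """One minus the overlap between the two character sets."""
    s, t = set(target), set(response)
    return 1.0 - len(s & t) / len(s | t)


print(sequence_matcher_ratio("KANTREE", "KINTRA"))
print(jaccard_distance("KANTREE", "KINTRA"))  # 1 - 5/7 ≈ 0.286
```

Note that the set-based Jaccard distance ignores character order and repetition, which is one reason an edit-based metric such as the normalized Damerau–Levenshtein distance tracks letter-by-letter manual scoring more closely.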
Variant: nfvPPA = 10; lvPPA = 17; svPPA = 6; mixed = 9
Gender: F = 18; M = 24
Education (years): M = 13.4 (SD = 7)
Age: M = 67.1 (SD = 6.9)
Years from disease onset: M = 4.4 (SD = 2.9)
Language severity (a): M = 1.8 (SD = 0.9)
Overall severity (b): M = 6.6 (SD = 4.9)

Note. nfvPPA = nonfluent variant primary progressive aphasia; lvPPA = logopenic variant primary progressive aphasia; svPPA = semantic variant primary progressive aphasia.

(a) Language severity was measured with the Frontotemporal Dementia Clinical Dementia Rating, Language subscale (FTD-CDR Language; see Knopman et al., 2008; range: 0–3).

(b) Overall severity was measured with the FTD-CDR "sum of boxes" rating (see Knopman et al., 2008; range: 0–24).

184-set | 168-set | 138-set | 92-set | 62-set
(each column below lists a target item immediately followed by its type, word or nonword)
REMMUNnonwordMURNEEnonwordMURNEEnonwordREMMUNnonwordREMMUNnonword
MUSHRAMEnonwordHERMnonwordHERMnonwordMUSHRAMEnonwordMUSHRAMEnonword
SARCLEnonwordDONSEPTnonwordDONSEPTnonwordSARCLEnonwordSARCLEnonword
TEABULLnonwordMERBERnonwordMERBERnonwordTEABULLnonwordTEABULLnonword
HAYGRIDnonwordTROEnonwordTROEnonwordHAYGRIDnonwordHAYGRIDnonword
CHENCHnonwordPYTESnonwordPYTESnonwordCHENCHnonwordCHENCHnonword
MURNEEnonwordFOYSnonwordFOYSnonwordMURNEEnonwordMURNEEnonword
REESHnonwordWESSELnonwordWESSELnonwordREESHnonwordREESHnonword
BOKEnonwordFEENnonwordFEENnonwordBOKEnonwordBOKEnonword
HERMnonwordSNOYnonwordSNOYnonwordHERMnonwordHERMnonword
HANNEEnonwordPHLOKEnonwordPHLOKEnonwordHANNEEnonwordHANNEEnonword
DEWTnonwordDEWTnonwordDEWTnonwordDEWTnonwordDEWTnonword
KWINEnonwordGHURBnonwordGHURBnonwordKWINEnonwordKWINEnonword
DONSEPTnonwordPHOITnonwordPHOITnonwordDONSEPTnonwordDONSEPTnonword
PHOITnonwordHAYGRIDnonwordHAYGRIDnonwordPHOITnonwordPHOITnonword
KANTREEnonwordKROIDnonwordKROIDnonwordKANTREEnonwordKANTREEnonword
LORNnonwordKITTULnonwordKITTULnonwordLORNnonwordLORNnonword
FEENnonwordBRUTHnonwordBRUTHnonwordFEENnonwordFEENnonword
SKARTnonwordKANTREEnonwordKANTREEnonwordSKARTnonwordSKARTnonword
REMMUNnonwordBERKnonwordBERKnonwordSHOOTwordENOUGHword
MUSHRAMEnonwordWUNDOEnonwordWUNDOEnonwordHANGwordFRESHword
SARCLEnonwordSARCLEnonwordSARCLEnonwordPRAYwordQUAINTword
TEABULLnonwordSORTAINnonwordSORTAINnonwordSHAVEwordFABRICword
HAYGRIDnonwordLORNnonwordLORNnonwordKICKwordCRISPword
CHENCHnonwordBOKEnonwordBOKEnonwordCUTwordPURSUITword
MURNEEnonwordTEABULLnonwordTEABULLnonwordSPEAKwordSTREETword
REESHnonwordHANNEEnonwordHANNEEnonwordROPEwordPRIESTword
BOKEnonwordKWINEnonwordKWINEnonwordDEERwordSUSPENDword
HERMnonwordREMMUNnonwordREMMUNnonwordCANEwordBRISKword
HANNEEnonwordSUMEnonwordSUMEnonwordLEAFwordSPOILword
DEWTnonwordSKARTnonwordSKARTnonwordDOORwordRIGIDword
KWINEnonwordREESHnonwordREESHnonwordROADwordRATHERword
DONSEPTnonwordMUSHRAMEnonwordMUSHRAMEnonwordBOOKwordMOMENTword
PHOITnonwordCHENCHnonwordCHENCHnonwordBALLwordHUNGRYword
KANTREEnonwordSHOOTwordDECIDEwordSEWwordPIERCEword
LORNnonwordHANGwordGRIEFwordKNOCKwordLISTENword
FEENnonwordPRAYwordSPENDwordSIEVEwordGLOVEword
SKARTnonwordSHAVEwordLOYALwordSEIZEwordSINCEword
SHOOTwordKICKwordRATHERwordGAUGEwordBRINGword
HANGwordCUTwordPREACHwordSIGHwordARGUEword
PRAYwordSPEAKwordFRESHwordLAUGHwordBRIGHTword
SHAVEwordROPEwordCONQUERwordCHOIRwordSEVEREword
KICKwordDEERwordSHALLwordAISLEwordCARRYword
CUTwordCANEwordSOUGHTwordLIMBwordAFRAIDword
SPEAKwordLEAFwordTHREATwordHEIRwordWHATword
ROPEwordDOORwordABSENTwordTONGUEwordSTARVEword
DEERwordROADwordSPEAKwordSWORDwordMEMBERword
CANEwordBOOKwordENOUGHwordGHOSTwordTALENTword
LEAFwordBALLwordCAREERwordEARTHwordUNDERword
DOORwordSEWwordCHURCHwordENOUGHwordLENGTHword
ROADwordKNOCKwordBRIGHTwordFRESHwordBORROWword
BOOKwordSIEVEwordADOPTwordQUAINTwordSPEAKword
BALLwordSEIZEwordSTRICTwordFABRICwordFAITHword
SEWwordGAUGEwordLEARNwordCRISPwordSTRICTword
KNOCKwordSIGHwordQUAINTwordPURSUITwordHAPPYword
SIEVEwordLAUGHwordBECOMEwordSTREETwordGRIEFword
SEIZEwordCHOIRwordSTRONGwordPRIESTwordABSENTword
GAUGEwordAISLEwordBUGLEwordSUSPENDwordPOEMword
SIGHwordLIMBwordSTARVEwordBRISKwordTHOUGHword
LAUGHwordHEIRwordDENYwordSPOILwordGREETword
CHOIRwordTONGUEwordLENGTHwordRIGIDwordPROVIDEword
AISLEwordSWORDwordPILLOWwordRATHERwordWINDOWword
LIMBwordGHOSTwordCAUGHTwordMOMENTword
HEIRwordEARTHwordBEFOREwordHUNGRYword
TONGUEwordDECIDEwordAFRAIDwordPIERCEword
SWORDwordGRIEFwordBODYwordLISTENword
GHOSTwordSPENDwordHUNGRYwordGLOVEword
EARTHwordLOYALwordCOLUMNwordSINCEword
ENOUGHwordRATHERwordTHESEwordBRINGword
FRESHwordPREACHwordSTRIPEwordARGUEword
QUAINTwordFRESHwordMUSICwordBRIGHTword
FABRICwordCONQUERwordCRISPwordSEVEREword
CRISPwordSHALLwordHURRYwordCARRYword
PURSUITwordSOUGHTwordPIERCEwordAFRAIDword
STREETwordTHREATwordTINYwordWHATword
PRIESTwordABSENTwordARGUEwordSTARVEword
SUSPENDwordSPEAKwordDIGITwordMEMBERword
BRISKwordENOUGHwordCOMMONwordTALENTword
SPOILwordCAREERwordANNOYwordUNDERword
RIGIDwordCHURCHwordSTREETwordLENGTHword
RATHERwordBRIGHTwordCOULDwordBORROWword
MOMENTwordADOPTwordRIGIDwordSPEAKword
HUNGRYwordSTRICTwordOFTENwordFAITHword
PIERCEwordLEARNwordBRINGwordSTRICTword
LISTENwordQUAINTwordLISTENwordHAPPYword
GLOVEwordBECOMEwordFIERCEwordGRIEFword
SINCEwordSTRONGwordNATUREwordABSENTword
BRINGwordBUGLEwordUNDERwordPOEMword
ARGUEwordSTARVEwordMOTELwordTHOUGHword
BRIGHTwordDENYwordSHOULDwordGREETword
SEVEREwordLENGTHwordVULGARwordPROVIDEword
CARRYwordPILLOWwordCHEAPwordWINDOWword
AFRAIDwordCAUGHTwordSPOILword
WHATwordBEFOREwordCERTAINword
STARVEwordAFRAIDwordABOVEword
MEMBERwordBODYwordLOUDword
TALENTwordHUNGRYwordSINCEword
UNDERwordCOLUMNwordSLEEKword
LENGTHwordTHESEwordREVEALword
BORROWwordSTRIPEwordBOTHword
SPEAKwordMUSICwordNOISEword
FAITHwordCRISPwordINTOword
STRICTwordHURRYwordSTRANGEword
HAPPYwordPIERCEwordCARRYword
GRIEFwordTINYwordSOLVEword
ABSENTwordARGUEwordOCEANword
POEMwordDIGITwordDECENTword
THOUGHwordCOMMONwordTHOUGHword
GREETwordANNOYwordSHORTword
PROVIDEwordSTREETwordABOUTword
WINDOWwordCOULDwordBOTTOMword
SHOOTwordRIGIDwordFRIENDword
HANGwordOFTENwordLOBSTERword
PRAYwordBRINGwordVIVIDword
SHAVEwordLISTENwordBROADword
KICKwordFIERCEwordWHILEword
CUTwordNATUREwordCHILDword
SPEAKwordUNDERwordHAPPENword
ROPEwordMOTELwordSEVEREword
DEERwordSHOULDwordAFTERword
CANEwordVULGARwordPRIESTword
LEAFwordCHEAPwordMERGEword
DOORwordSPOILwordGLOVEword
ROADwordCERTAINwordONLYword
BOOKwordABOVEwordGREETword
BALLwordLOUDwordMEMBERword
SEWwordSINCEwordBEGINword
KNOCKwordSLEEKwordBRISKword
SIEVEwordREVEALwordWHATword
SEIZEwordBOTHwordBORROWword
GAUGEwordNOISEwordJURYword
SIGHwordINTOwordSLEEVEword
LAUGHwordSTRANGEwordBOUGHTword
CHOIRwordCARRYwordHAPPYword
AISLEwordSOLVEwordSPACEword
LIMBwordOCEANwordANGRYword
HEIRwordDECENTwordTHOSEword
TONGUEwordTHOUGHwordFAITHword
SWORDwordSHORTword
GHOSTwordABOUTword
EARTHwordBOTTOMword
ENOUGHwordFRIENDword
FRESHwordLOBSTERword
QUAINTwordVIVIDword
FABRICwordBROADword
CRISPwordWHILEword
PURSUITwordCHILDword
STREETwordHAPPENword
PRIESTwordSEVEREword
SUSPENDwordAFTERword
BRISKwordPRIESTword
SPOILwordMERGEword
RIGIDwordGLOVEword
RATHERwordONLYword
MOMENTwordGREETword
HUNGRYwordMEMBERword
PIERCEwordBEGINword
LISTENwordBRISKword
GLOVEwordWHATword
SINCEwordBORROWword
BRINGwordJURYword
ARGUEwordSLEEVEword
BRIGHTwordBOUGHTword
SEVEREwordHAPPYword
CARRYwordSPACEword
AFRAIDwordANGRYword
WHATwordTHOSEword
STARVEwordFAITHword
MEMBERword
TALENTword
UNDERword
LENGTHword
BORROWword
SPEAKword
FAITHword
STRICTword
HAPPYword
GRIEFword
ABSENTword
POEMword
THOUGHword
GREETword
PROVIDEword
WINDOWword

Note. Set 1: 184 items, Set 2: 168 items, Set 3: 138 items, Set 4: 92 items, and Set 5: 62 items.
