T Elizabeth Workman, Yijun Shao, Guy Divita, Qing Zeng-Treitler.
Abstract
OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that uses Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype to process two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output, using four classifications. We also analyzed the spelling errors in each corpus, using common error classifications.
Keywords: Clinical text; Spelling analysis; Spelling correction; Word embeddings; Word2Vec
Year: 2019 PMID: 30658682 PMCID: PMC6339425 DOI: 10.1186/s13104-019-4073-y
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Method pipeline
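The abstract names four ingredients: Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. The following Python sketch shows one way such signals could be combined. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `max_dist` and `min_ratio` thresholds, and the toy data are all hypothetical, and the embedding neighbors are passed in as a plain dictionary rather than queried from a trained Word2Vec model.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def suggest_corrections(neighbors, lexicon, freqs, max_dist=2, min_ratio=10):
    """Map each suspected misspelling to a plausible correction.

    neighbors: term -> list of embedding-space neighbors (assumed to come
               from a Word2Vec model trained on the corpus)
    lexicon:   set of known correctly spelled terms (the lexical resource)
    freqs:     corpus term frequencies
    max_dist, min_ratio: hypothetical thresholds, not values from the paper
    """
    suggestions = {}
    for term, candidates in neighbors.items():
        if term in lexicon:
            continue  # already a known word, not a misspelling candidate
        best = None
        for cand in candidates:
            if cand not in lexicon:
                continue  # a correction must be a known word
            if not 0 < levenshtein(term, cand) <= max_dist:
                continue  # edit distance constraint
            if freqs[cand] < min_ratio * freqs[term]:
                continue  # a correction should be far more frequent
            if best is None or freqs[cand] > freqs[best]:
                best = cand
        if best is not None:
            suggestions[term] = best
    return suggestions


# Toy data, invented for illustration only.
freqs = Counter({"carcinoma": 500, "carcinma": 3, "sarcoma": 120,
                 "biopsy": 800, "biospy": 5})
lexicon = {"carcinoma", "sarcoma", "biopsy"}
neighbors = {"carcinma": ["carcinoma", "sarcoma"],
             "biospy": ["biopsy"],
             "biopsy": ["biospy"]}
corrections = suggest_corrections(neighbors, lexicon, freqs)
```

In this sketch the embedding neighbors supply candidates that occur in similar contexts, while the edit distance, lexicon, and frequency checks filter out neighbors that are merely related words rather than corrections.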
False positives’ types and frequencies
| Error type | Surgical pathology notes | Emergency visit and progress notes |
|---|---|---|
| Different word, spelled correctly | 3 (60%) | 9 (37.5%) |
| Misspelling of different word | 0 (0%) | 12 (50%) |
| Alternative form of noisy term | 2 (40%) | 0 (0%) |
| Slang equivalent | 0 (0%) | 3 (12.5%) |
| Totals | 5 (100%) | 24 (100%) |
Spelling error types by corpus and frequency
| Misspelling type | Surgical pathology notes | Emergency visit and progress notes |
|---|---|---|
| Insertion | 10 (20.8%) | 32 (15.2%) |
| Omission | 25 (52.1%) | 103 (48.8%) |
| Transposition | 8 (16.7%) | 40 (19%) |
| Wrong letter | 2 (4.1%) | 27 (12.8%) |
| Multiple/mixed | 3 (6.3%) | 9 (4.2%) |
| Totals | 48 (100%) | 211 (100%) |
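The categories in the table above can be assigned mechanically once a misspelling is paired with its correction. The sketch below is a minimal illustration, assuming the usual single-edit definitions (an insertion adds one letter, an omission drops one, a transposition swaps two adjacent letters, a wrong letter substitutes one); the source does not spell out its exact criteria, and `classify_misspelling` and the example pairs are hypothetical.

```python
from collections import Counter


def classify_misspelling(misspelled: str, correct: str) -> str:
    """Label a (misspelling, correction) pair with one of five categories."""
    m, c = misspelled, correct
    if m == c:
        raise ValueError("terms are identical; nothing to classify")
    if len(m) == len(c) + 1:
        # One extra letter: deleting a single character from m yields c.
        if any(m[:i] + m[i + 1:] == c for i in range(len(m))):
            return "insertion"
    elif len(m) + 1 == len(c):
        # One missing letter: deleting a single character from c yields m.
        if any(c[:i] + c[i + 1:] == m for i in range(len(c))):
            return "omission"
    elif len(m) == len(c):
        diffs = [i for i in range(len(m)) if m[i] != c[i]]
        if len(diffs) == 1:
            return "wrong letter"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and m[diffs[0]] == c[diffs[1]]
                and m[diffs[1]] == c[diffs[0]]):
            return "transposition"
    # Anything needing more than one edit falls through to here.
    return "multiple/mixed"


# Invented example pairs, not drawn from the study corpora.
pairs = [("patiennt", "patient"), ("carcinma", "carcinoma"),
         ("biospy", "biopsy"), ("patiant", "patient"),
         ("pashent", "patient")]
tally = Counter(classify_misspelling(m, c) for m, c in pairs)
```

Dividing each count in `tally` by the number of pairs would give percentages in the style of the table.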