BACKGROUND: Errors in grammar, spelling, and usage are common in radiology reports. To automatically detect inappropriate insertions, deletions, and substitutions of words in radiology reports, we propose using a neural sequence-to-sequence (seq2seq) model.

METHODS: Head CT and chest radiograph reports from Mount Sinai Hospital (MSH) (n=61,722 and 818,978, respectively), Mount Sinai Queens (MSQ) (n=30,145 and 194,309, respectively), and MIMIC-III (n=32,259 and 54,685, respectively) were split into sentences. Insertions, substitutions, and deletions of words were randomly introduced into these sentences. Seq2seq models were trained to take corrupted sentences as input and predict the original, uncorrupted sentences. Three models were trained: one on head CTs from MSH, one on chest radiographs from MSH, and one on head CTs from all three collections. Model performance was assessed across sites and modalities. A sample of original, uncorrupted sentences was manually reviewed for errors in syntax, usage, or spelling to estimate the algorithm's real-world proofreading performance.

RESULTS: In same-site, same-modality test sets, seq2seq detected 90.3% of corrupted head CT sentences and 88.2% of corrupted chest radiograph sentences, with specificities of 97.7% and 98.8%, respectively. Manual review of original, uncorrupted same-site, same-modality head CT sentences demonstrated a seq2seq positive predictive value (PPV) of 0.393 (157/400; 95% CI, 0.346-0.441) and a negative predictive value (NPV) of 0.986 (789/800; 95% CI, 0.976-0.992) for detecting sentences containing real-world errors, with an estimated sensitivity of 0.389 (95% CI, 0.267-0.542) and specificity of 0.986 (95% CI, 0.985-0.987) over n=86,211 uncorrupted training examples.

CONCLUSIONS: Seq2seq models can be highly effective at detecting erroneous insertions, deletions, and substitutions of words in radiology reports. To achieve high performance, these models require site- and modality-specific training examples. Incorporating additional targeted training data could further improve performance in detecting real-world errors in reports.
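The METHODS describe building training data by randomly inserting, substituting, and deleting words, then training seq2seq to map each corrupted sentence back to its original. The minimal Python sketch below shows one way such (source, target) pairs could be generated. It is illustrative only: the names `corrupt_sentence`, `make_training_pairs`, and `is_flagged` are hypothetical; one edit per corrupted sentence and the 50% default corruption rate are assumptions, not values from the paper; and the flagging rule (a sentence is flagged when the model's output differs from its input) is inferred from how detection is reported rather than stated in the abstract. The seq2seq model itself is not shown.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers: the abstract does not describe the exact corruption
# scheme, so "one random edit per corrupted sentence" is an assumption.


def corrupt_sentence(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random) -> List[str]:
    """Return a copy of `tokens` with one random word insertion, substitution, or deletion."""
    out = list(tokens)
    # Substitution and deletion need at least one existing word.
    ops = ["insert", "substitute", "delete"] if out else ["insert"]
    op = rng.choice(ops)
    if op == "insert":
        # Insert a random vocabulary word at any position, including the end.
        out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
    elif op == "substitute":
        i = rng.randrange(len(out))
        # Choose a replacement that differs from the current word, when one exists.
        candidates = [w for w in vocab if w != out[i]]
        if candidates:
            out[i] = rng.choice(candidates)
    else:
        del out[rng.randrange(len(out))]
    return out


def make_training_pairs(
    sentences: Sequence[str], vocab: Sequence[str], corrupt_prob: float = 0.5, seed: int = 0
) -> List[Tuple[str, str]]:
    """Build (source, target) pairs; the target is always the original sentence."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        # Leave some sentences uncorrupted so the model also learns to copy correct text.
        source = corrupt_sentence(tokens, vocab, rng) if rng.random() < corrupt_prob else tokens
        pairs.append((" ".join(source), " ".join(tokens)))
    return pairs


def is_flagged(source: str, prediction: str) -> bool:
    """Assumed detection rule: flag a sentence when the model rewrites it."""
    return source.split() != prediction.split()
```

Under this assumed rule, keeping a share of uncorrupted pairs matters: a model that has only ever seen corrupted inputs may learn to rewrite every sentence, which would erode specificity.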
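The abstract does not say how the PPV and NPV confidence intervals were computed. Recomputing them suggests they match the Wilson score interval for a binomial proportion at z = 1.96; this is an inference from the numbers, not a statement by the authors. The Python sketch below reproduces the reported bounds (0.346-0.441 for 157/400 and 0.976-0.992 for 789/800); `wilson_interval` is a hypothetical helper name. The sensitivity and specificity intervals, which are extrapolated over the 86,211 uncorrupted training examples, depend on an estimation procedure the abstract does not describe and are not reproduced here.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # Center is shifted toward 0.5 relative to the raw proportion p.
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Counts taken from the RESULTS section of the abstract.
    for label, k, n in [("PPV", 157, 400), ("NPV", 789, 800)]:
        lo, hi = wilson_interval(k, n)
        print(f"{label} {k}/{n}: 95% CI {lo:.3f}-{hi:.3f}")
```

The plain normal-approximation (Wald) interval gives slightly different bounds for 157/400 (about 0.345-0.440), which is why the Wilson form appears to be the better match.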