
The impact of OCR accuracy on automated cancer classification of pathology reports.

Guido Zuccon, Anthony N Nguyen, Anton Bergheim, Sandra Wickman, Narelle Grayson.

Abstract

OBJECTIVE: To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports.
METHOD: Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. The classifications produced by MEDTEX on the OCR versions of the reports were compared with the classifications obtained from a human-amended version of the OCR reports.
RESULTS: The OCR system employed was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site and histological type.
CONCLUSIONS: The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
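
The character and word accuracy figures reported under RESULTS can be illustrated with a minimal sketch. The abstract does not state how the accuracies were computed, so the definition used here (one minus the edit distance divided by the reference length) is an assumption, as are the function names and the sample strings. The human-amended text plays the role of the reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / len(reference), floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(reference, ocr_text):
    # Compare character by character.
    return accuracy(reference, ocr_text)


def word_accuracy(reference, ocr_text):
    # Compare whitespace-separated tokens.
    return accuracy(reference.split(), ocr_text.split())


# Hypothetical example: the OCR output confuses the letter "l" with the digit "1".
amended = "malignant melanoma of skin"
ocr_out = "malignant me1anoma of skin"
print(f"character accuracy: {char_accuracy(amended, ocr_out):.4f}")
print(f"word accuracy:      {word_accuracy(amended, ocr_out):.4f}")
```

A single misread character costs one of four words here but only one of 26 characters. This is why word accuracy tends to sit below character accuracy, as it does in the reported figures.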

Year:  2012        PMID: 22797049

Source DB:  PubMed          Journal:  Stud Health Technol Inform        ISSN: 0926-9630


  3 in total

1.  Classification of cancer-related death certificates using machine learning.

Authors:  Luke Butt; Guido Zuccon; Anthony Nguyen; Anton Bergheim; Narelle Grayson
Journal:  Australas Med J       Date:  2013-05-30

2.  Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients.

Authors:  Sungrim Moon; Sijia Liu; David Chen; Yanshan Wang; Douglas L Wood; Rajeev Chaudhry; Hongfang Liu; Paul Kingsbury
Journal:  J Healthc Inform Res       Date:  2019-01-28

3.  Generating high-quality data abstractions from scanned clinical records: text-mining-assisted extraction of endometrial carcinoma pathology features as proof of principle.

Authors:  Anthony Nguyen; John O'Dwyer; Thanh Vu; Penelope M Webb; Sharon E Johnatty; Amanda B Spurdle
Journal:  BMJ Open       Date:  2020-06-11       Impact factor: 2.692
