Kangqi Luo, Jinyi Lu, Kenny Q Zhu, Weiguo Gao, Jia Wei, Meizhuo Zhang.
Abstract
Text embedded in medical images carries rich structured information about a patient's medical condition. This paper aims to extract structured textual information from semi-structured medical images. Given the text spans recognized in an image by optical character recognition (OCR), extraction is challenging because the spans are spatially discontinuous and OCR may introduce recognition errors. We propose a domain-specific language, called ODL, which allows users to describe the value and layout of the text data contained in the images. The ODL parser associates values found in the image with the data structure in the ODL description while conforming to the value and spatial constraints given in that description. In experiments on a dataset of real medical images, our ODL parser consistently outperforms existing approaches in extraction accuracy, showing better tolerance of incorrectly recognized text and of positional variance between images. This accuracy can be further improved by learning from a few manual corrections.
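The idea of pairing a value constraint with a spatial constraint can be illustrated with a minimal Python sketch. This is not ODL syntax and not the authors' parser: the `Span` and `Field` classes, the `extract` function, the 0.7 similarity threshold, and the "value lies to the right of its label on roughly the same line" rule are all assumptions made for illustration. Fuzzy label matching stands in for the tolerance of OCR errors that the abstract mentions.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Span:
    """One OCR text span with the top-left corner of its bounding box."""
    text: str
    x: float
    y: float


@dataclass
class Field:
    """Hypothetical field description: a label, a value pattern, a layout rule."""
    name: str
    label: str            # expected label text in the image
    value_pattern: str    # value constraint, as a regular expression
    max_dy: float = 10.0  # spatial constraint: same line within this tolerance


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], used to absorb OCR misreads."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def extract(field: Field, spans: list[Span], min_label_sim: float = 0.7):
    """Return the value text bound to `field`, or None if nothing fits."""
    # Pick the span whose text is closest to the expected label.
    label = max(spans, key=lambda s: similarity(s.text, field.label), default=None)
    if label is None or similarity(label.text, field.label) < min_label_sim:
        return None
    # Keep spans that satisfy both the spatial and the value constraint.
    candidates = [
        s for s in spans
        if s is not label
        and s.x > label.x
        and abs(s.y - label.y) <= field.max_dy
        and re.fullmatch(field.value_pattern, s.text)
    ]
    if not candidates:
        return None
    # Prefer the candidate nearest to the label.
    return min(candidates, key=lambda s: s.x - label.x).text


spans = [
    Span("Age", 10, 50), Span("45", 120, 51),
    Span("Hcart Rate", 10, 100),  # simulated OCR misread of "Heart Rate"
    Span("72", 120, 102), Span("bpm", 160, 102),
]
heart_rate = extract(Field("heart_rate", "Heart Rate", r"\d{2,3}"), spans)
age = extract(Field("age", "Age", r"\d{1,3}"), spans)
weight = extract(Field("weight", "Weight", r"\d+"), spans)
```

In this toy input the misread label "Hcart Rate" is still close enough to "Heart Rate" to be accepted, and the same-line rule keeps the value "45" from being bound to the wrong field. A real description language would need richer layout relations and a search over joint assignments, which this sketch does not attempt.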
Keywords: Domain-specific language; Electronic medical records; Information extraction; Medical images; Optical character recognition; Spatial layout
Year: 2019 PMID: 30856387 DOI: 10.1016/j.compbiomed.2019.02.016
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589