Julliette M Buckley, Suzanne B Coopey, John Sharko, Fernanda Polubriaginof, Brian Drohan, Ahmet K Belli, Elizabeth M H Kim, Judy E Garber, Barbara L Smith, Michele A Gadd, Michelle C Specht, Constance A Roche, Thomas M Gudewicz, Kevin S Hughes.
Abstract
OBJECTIVE: The opportunity to integrate clinical decision support systems into clinical practice is limited by the lack of structured, machine-readable data in the current format of the electronic health record. Natural language processing is designed to convert free text into machine-readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from more than 76,000 breast pathology reports.

APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded as a means of demonstrating the complexity of machine interpretation of free text.
Keywords: Breast pathology reports; clinical decision support; natural language processing
Year: 2012 PMID: 22934236 PMCID: PMC3424662 DOI: 10.4103/2153-3539.97788
Source DB: PubMed Journal: J Pathol Inform
Figure 1. Sample pathology report showing the fields extracted (highlighted in bold type). Each specimen was parsed separately and generated its own “final diagnosis.”
Figure 2. Sample datasheet displaying extracted diagnostic information from the sample report shown in Figure 1. As each specimen generated its own “final diagnosis,” a single row was created for each specimen by MRN, date, side, and specimen in the first of three databases created.
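The one-row-per-specimen layout described for Figure 2 can be sketched as follows. This is a minimal illustration, not the study's database schema: the `SpecimenRow` class, the `build_rows` helper, the dictionary keys, and every value in `example_report` are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRow:
    """One row per specimen, keyed by MRN, date, side, and specimen label."""
    mrn: str
    date: str
    side: str
    specimen: str
    final_diagnosis: str


def build_rows(report):
    """Create one SpecimenRow for each specimen in a parsed report dict."""
    return [
        SpecimenRow(
            mrn=report["mrn"],
            date=report["date"],
            side=spec["side"],
            specimen=spec["label"],
            final_diagnosis=spec["final_diagnosis"],
        )
        for spec in report["specimens"]
    ]


# Invented example data; not taken from any real pathology report.
example_report = {
    "mrn": "0000000",
    "date": "2000-01-01",
    "specimens": [
        {"label": "A", "side": "left",
         "final_diagnosis": "atypical ductal hyperplasia"},
        {"label": "B", "side": "right",
         "final_diagnosis": "benign breast tissue"},
    ],
}

rows = build_rows(example_report)
for row in rows:
    print(row)
```

Keeping the report-level fields (MRN, date) on every row means each specimen's diagnosis can be queried on its own without rejoining to the parent report.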
The number of ways in which each diagnosis was expressed in pathology reports
Different ways in which pathologists describe the presence of atypical ductal hyperplasia
Some examples of the 95 ways of saying “invasive lobular carcinoma”
Figure 3. Sample datasheet showing examples of missed diagnoses by the software. In row 1, “atypical hyperplasia” was not associated with either “ductal” or “lobular” and thus was not a pattern recognized by the software. In rows 2 and 3, the way in which “atypical ductal hyperplasia” was written was not a pattern recognized by the software. In row 3, typographical errors in the spelling of “carcinoma” meant the presence of DCIS was not detected by the processor.
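The kinds of misses described for Figure 3 can be reproduced with a small rule-based matcher. The sketch below is an assumption-laden illustration only: it uses plain regular expressions rather than the commercial software used in the study, and the pattern names, the `detect` function, and the test phrases are hypothetical.

```python
import re

# Hypothetical exact-phrase patterns; real reports use many more variants.
ADH_PATTERN = re.compile(r"\batypical\s+ductal\s+hyperplasia\b", re.IGNORECASE)
DCIS_PATTERN = re.compile(r"\bductal\s+carcinoma\s+in\s+situ\b", re.IGNORECASE)


def detect(text):
    """Return the set of diagnosis codes whose pattern matches the text."""
    found = set()
    if ADH_PATTERN.search(text):
        found.add("ADH")
    if DCIS_PATTERN.search(text):
        found.add("DCIS")
    return found


# Matches: the phrase is written exactly as the pattern expects.
print(detect("Atypical ductal hyperplasia"))
# Missed: "atypical hyperplasia" is not tied to "ductal" or "lobular".
print(detect("Focal atypical hyperplasia"))
# Missed: a misspelling of "carcinoma" breaks the exact-phrase pattern.
print(detect("Ductal carcinma in situ"))
```

Exact-phrase rules of this kind fail silently on unanticipated wording and on typographical errors, which is why recording the many ways each diagnosis is written matters when building the pattern set.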