The feasibility of using natural language processing to extract clinical information from breast pathology reports.

Julliette M Buckley, Suzanne B Coopey, John Sharko, Fernanda Polubriaginof, Brian Drohan, Ahmet K Belli, Elizabeth M H Kim, Judy E Garber, Barbara L Smith, Michele A Gadd, Michelle C Specht, Constance A Roche, Thomas M Gudewicz, Kevin S Hughes.

Abstract

OBJECTIVE: The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports.
APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text.
RESULTS: There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders.
CONCLUSION: We have demonstrated how a large body of free text medical information, such as that seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.

Keywords: Breast pathology reports; clinical decision support; natural language processing

Year: 2012 PMID: 22934236 PMCID: PMC3424662 DOI: 10.4103/2153-3539.97788

Source DB: PubMed Journal: J Pathol Inform

BACKGROUND AND SIGNIFICANCE

The promise that the Electronic Health Record (EHR) will increase quality while decreasing cost is largely dependent on widespread integration of computerized Clinical Decision Support (CDS). CDS systems apply algorithms and guidelines to the patient data to help determine the diagnosis and/or the best course of action, and then present that result to the clinician and the patient in a visualization that makes it easy to understand and that stimulates action.[1] The caveat is that CDS systems require data that are both structured and machine readable. As the vast majority of data in the EHR are free text, there is currently little opportunity to institute CDS systems into clinical practice. The simplest, but most time-consuming, approach to unlocking the data in free text is to have an expert read and interpret each report. While this approach works relatively well in the day-to-day care of individual patients, it is impractical when attempting real-time CDS on all patients seen at an institution or when undertaking a retrospective review of tens of thousands of cases. To consider one such situation, pathology reports contain tremendously valuable data regarding the clinical situation of the patient. These reports are almost always written in a free text format. While synoptic reporting in some anatomic pathology systems has been a step in the right direction by providing discrete data elements, there are still comment/note sections that allow results to be reported as free text. Natural language processing (NLP) software has been designed to convert free text into machine readable, structured data. While NLP has been touted as a solution to the problem, this approach is not nearly as simple or effective as it may sound. The inherent linguistic and structural variability within any body of free text poses a significant challenge to efficient retrieval of data.

OBJECTIVE

As a proof of principle of the utility, but also of the difficulty, of using NLP to decipher breast pathology reports, we undertook the creation of a database of results from breast pathology reports at the Massachusetts General Hospital (MGH), the Brigham and Women's Hospital (BWH), and Newton-Wellesley Hospital (NWH). Our goal was to identify which specimens had evidence of any or all of a number of diagnoses of interest.

APPROACH AND PROCEDURE

With the approval of the Partners Institutional Review Board (IRB), all electronically available pathology reports from MGH, BWH, and NWH between 1987 and 2010 that involved breast tissue were identified from the Research Data Repository, which holds pathology report data from all institutions. International Classification of Diseases, Ninth Revision (ICD-9) and Current Procedural Terminology (CPT) codes were used to identify those reports pertaining to breast. We determined that the most important diagnoses for our study that might be found, either alone or in combination, within a pathology report were invasive ductal cancer (IDC); invasive lobular cancer (ILC); invasive cancer NOS; ductal carcinoma in situ (DCIS); severe atypical ductal hyperplasia (severe ADH); lobular carcinoma in situ (LCIS); atypical lobular hyperplasia (ALH); atypical ductal hyperplasia (ADH); and benign. As a preparatory step, a folder or “bucket” was then created for each diagnosis within the NLP software (Clearforest, Waltham, MA). A “bucket” would eventually hold a set of words and/or phrases that denoted a specific diagnosis. Next, the layout of the pathology report was analyzed. The most important information pertaining to the diagnosis was contained in a section labeled “Final Diagnosis,” which was present for each distinct specimen (a report might have an excision and four shaved margins as five distinct specimens for the same side with the potential for different diagnoses by specimen) [Figure 1]. Thus there was often more than one final diagnosis in a single pathology report on a single day. We identified both the start and the end of the final diagnosis section for each specimen and these sections were parsed out and associated with a Medical Record Number (MRN), Date, and Side [Figures 1 and 2]. Parsing techniques varied by institution, due to the unique, institution-specific report layouts.
Using NLP software (Clearforest, Waltham, MA), the “Final Diagnosis” section of a test set of 500 reports from each institution was processed. The NLP software displayed all words and phrases in these reports, and the number of times each was used in this set of reports, and provided an interface to associate each word or phrase with one or more of the “buckets.” Each entity generated by the software was then associated with the “bucket” it represented. For example, the entities “infiltrating ductal carcinoma,” “invasive cancer with ductal features,” “invasive cancer, ductal type,” etc. all went into the “invasive ductal cancer” bucket. Some entities went into more than one bucket, such as “invasive carcinoma with both ductal and lobular features,” which was both IDC and ILC. This approach was then applied to the larger data set, to test its functionality and to identify words or phrases missed in the test set.
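The phrase-to-bucket association described above can be illustrated with a minimal Python sketch. This is not the Clearforest implementation, whose internals are not described here; the dictionary `PHRASE_BUCKETS` and the function `assign_buckets` are hypothetical names, and only the four example phrases quoted in the text are included.

```python
# Hypothetical phrase-to-bucket dictionary. The phrases are the examples
# quoted in the text; the data structure is an assumption for illustration.
PHRASE_BUCKETS = {
    "infiltrating ductal carcinoma": {"IDC"},
    "invasive cancer with ductal features": {"IDC"},
    "invasive cancer, ductal type": {"IDC"},
    # One phrase may belong to more than one bucket.
    "invasive carcinoma with both ductal and lobular features": {"IDC", "ILC"},
}


def assign_buckets(final_diagnosis: str) -> set[str]:
    """Return every bucket whose phrases occur in the final-diagnosis text."""
    # Normalize case and whitespace so layout differences do not block a match.
    text = " ".join(final_diagnosis.lower().split())
    found: set[str] = set()
    for phrase, buckets in PHRASE_BUCKETS.items():
        if phrase in text:
            found |= buckets
    return found
```

A lookup of this kind only recognizes phrasings that are already in the dictionary, which is why applying it to the larger data set was needed to surface words and phrases missed in the test set.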

Figure 1

Sample pathology report showing the fields extracted (highlighted in bold type). Each specimen was parsed separately and generated its own “final diagnosis”

Figure 2

Sample datasheet displaying extracted diagnostic information from the sample report shown in Figure 1. As each specimen generated its own “final diagnosis,” a single row was created for each specimen by MRN, date, side and specimen in the first of three databases created

We also found that an entity may be negated, and that the negation might lie either before or after the text. For example, a report may state that there was “no evidence of invasive carcinoma,” or “residual DCIS was not seen.” All words and phrases that denoted negation and their order in the sentence (before or after the diagnostic entity) were identified and placed in pre- and postnegation categories. A pattern was then created to recognize negation. If an entity was negated, it was not recorded in the final data set for that record. The multiple ways of saying each entity were counted as well as the multiple ways of stating negation. A single row in an Access (Microsoft) table was created for each specimen, where the presence of each entity in that specimen was recorded in the appropriate column. This initial table had a row for each MRN, date, side, and specimen and denoted all diagnoses present in that specimen on that date [Figure 2]. Each of these “final diagnoses” from a single date and side were amalgamated into a single row in a second table that denoted an MRN, date, side, and all diagnoses on that date. We then identified a “maximum diagnosis” on each date by establishing a trumping order (an “order of significance”), such that IDC, ILC, or invasive cancer NOS, would outweigh DCIS, which would outweigh severe ADH, which would outweigh LCIS, which would outweigh ALH, which would outweigh ADH, which would outweigh benign.
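As a rough illustration of the pre- and postnegation patterns described above, the following Python sketch flags an entity as negated when a negation cue appears just before or just after it. The cue lists contain only the two examples quoted in the text; the real lists were longer and are not enumerated here. The function name `is_negated` and the allowance of up to three intervening words are assumptions, not details of the study's software.

```python
import re

# Illustrative cue lists: only the two examples quoted in the text.
PRE_NEGATION = ["no evidence of"]
POST_NEGATION = ["was not seen"]


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if `entity` is preceded or followed by a negation cue."""
    s = " ".join(sentence.lower().split())
    e = re.escape(entity.lower())
    pre = "|".join(re.escape(c) for c in PRE_NEGATION)
    post = "|".join(re.escape(c) for c in POST_NEGATION)
    # Assumption: allow up to three words (e.g. "residual") between cue and entity.
    pre_pattern = rf"\b(?:{pre})\b(?:\s+\w+){{0,3}}?\s+{e}\b"
    post_pattern = rf"\b{e}\b(?:\s+\w+){{0,3}}?\s+(?:{post})\b"
    return bool(re.search(pre_pattern, s) or re.search(post_pattern, s))
```

An entity for which `is_negated` returns True would simply be left out of the row recorded for that specimen.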
Where multiple surgeries occurred in the course of treating the same problem on a given side, such as re-excisions for positive margins, we considered these a single “episode of care” for that patient; thus all pathology results from a single side within a 6-month time frame were amalgamated into a third table organized by MRN, Date, Side, and Episode. Pathology reports outside this 6-month period or from the opposite side were considered as separate episodes. The most significant or “maximum” diagnosis was taken as the primary diagnosis, with the others listed as secondary diagnoses in each table. Thus, NLP created three data tables: an “MRN, Date, Side, Specimen” table, which separately listed all diagnoses from each specimen on a given day; an “MRN, Date, Side, Summary Diagnoses” table, which summarized all diagnoses from a given day; and an “MRN, Side, Episode of Care, Diagnoses” table, which summarized all diagnoses from a given episode. As our first study was conducted to identify patients with high risk lesions, we opted to review a nonrandom sample of 6,711 pathology reports which were identified in patients who had a diagnosis of severe ADH, LCIS, ALH or ADH, without prior or concurrent cancer. These NLP results were reviewed by human coders who compared the result to the free text report to determine the accuracy of the NLP. The accuracy for the maximum diagnosis and the accuracy for all diagnoses were recorded separately.
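The trumping order and the grouping of reports into episodes of care can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' program: the names `RANK`, `maximum_diagnosis`, and `group_episodes` are hypothetical, “6 months” is approximated as 183 days, and the window is assumed to run from the first report of an episode, which the text does not specify.

```python
from datetime import date, timedelta

# Trumping order from the text; the three invasive categories share the top rank.
RANK = {"IDC": 6, "ILC": 6, "invasive cancer NOS": 6, "DCIS": 5,
        "severe ADH": 4, "LCIS": 3, "ALH": 2, "ADH": 1, "benign": 0}

EPISODE_WINDOW = timedelta(days=183)  # assumption: "6 months" taken as 183 days


def maximum_diagnosis(diagnoses):
    """Return the most significant diagnosis (ties broken alphabetically)."""
    return max(sorted(diagnoses), key=lambda d: RANK[d])


def group_episodes(rows):
    """Group (mrn, report_date, side, diagnoses) rows into episodes of care."""
    episodes = []
    latest = {}  # (mrn, side) -> most recent episode for that patient and side
    for mrn, report_date, side, diagnoses in sorted(
            rows, key=lambda r: (r[0], r[2], r[1])):
        ep = latest.get((mrn, side))
        # Assumption: the window is measured from the episode's first report.
        if ep is None or report_date - ep["start"] > EPISODE_WINDOW:
            ep = {"mrn": mrn, "side": side, "start": report_date,
                  "diagnoses": set()}
            episodes.append(ep)
            latest[(mrn, side)] = ep
        ep["diagnoses"] |= set(diagnoses)
    for ep in episodes:
        ep["primary"] = maximum_diagnosis(ep["diagnoses"])
        ep["secondary"] = sorted(ep["diagnoses"] - {ep["primary"]})
    return episodes
```

For example, two left-sided reports six weeks apart showing ADH and then DCIS with ADH would collapse into one episode with DCIS as the primary diagnosis and ADH as a secondary diagnosis, while a right-sided report on the same day would form its own episode.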

RESULTS

In 76,333 breast pathology reports, multiple entities were identified that represented each of the significant buckets. Excluding typographical errors and spacing errors, we identified 124 ways of saying invasive ductal cancer; 95 ways of saying invasive lobular cancer; 52 ways of saying DCIS; 14 ways of saying severe ADH; 53 ways of saying lobular carcinoma in situ; 17 ways of saying atypical lobular hyperplasia and 14 ways of saying atypical ductal hyperplasia [Table 1]. Examples of ways to describe ADH and invasive carcinoma are shown in Tables 2 and 3.

Table 1

The number of ways in which each diagnosis was said in pathology reports

Table 2

Different ways in which pathologists describe the presence of atypical ductal hyperplasia

Table 3

Some examples of the 95 ways of saying “invasive lobular carcinoma”

In addition, we identified 21 ways of negating a diagnosis when the words appeared before the diagnosis (e.g., No evidence of invasive ductal carcinoma), and an additional 12 ways of negating the diagnosis when the words fell after the diagnosis (e.g., ADH was not seen). As each entity can potentially be negated by a pre- or postnegation phrase, one must multiply the number of ways of stating the negation by the number of ways of describing that particular diagnostic entity. For example, for invasive ductal cancer, 124 ways of saying IDC multiplied by 33 ways of saying “not” gives a total of 4092 potential ways to say IDC was not present. When the processor output was compared to reports as reviewed by expert human coders, 97% of reports were correct for all diagnoses and 97.8% were correct for the maximum diagnosis. Figure 3 demonstrates examples of incorrect diagnoses, where the software did not identify diagnoses that were present in the report. Most commonly this occurred because the diagnosis was written in a pattern not recognized by the software, or simply because of a typographical error in the report.
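The count of negated phrasings quoted above follows from a simple product, which the short Python sketch below reproduces. The three counts come from the text; the helper `negated_variants` is a hypothetical illustration of how the combinations arise and is not part of the study's software.

```python
from itertools import product

ways_idc = 124       # phrasings of invasive ductal cancer (from the text)
pre_negations = 21   # negation phrasings that precede the diagnosis
post_negations = 12  # negation phrasings that follow the diagnosis

# Each phrasing can be combined with any pre- or postnegation cue.
negated_idc = ways_idc * (pre_negations + post_negations)
print(negated_idc)  # 4092


def negated_variants(entity_phrases, pre_cues, post_cues):
    """Enumerate every cue-plus-phrase combination (hypothetical helper)."""
    before = [f"{cue} {phrase}" for cue, phrase in product(pre_cues, entity_phrases)]
    after = [f"{phrase} {cue}" for phrase, cue in product(entity_phrases, post_cues)]
    return before + after
```

The same multiplication applies to every bucket, so the number of negated forms grows in proportion to the number of synonymous phrasings of each diagnosis.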

Figure 3

Sample datasheet showing examples of missed diagnoses by the software. In row 1, “atypical hyperplasia” was not associated with either “ductal” or “lobular” and thus was not a pattern recognized by the software. In rows 2 and 3, the way in which “atypical ductal hyperplasia” was written was not a pattern recognized by the software. In row 3, typographical errors in the spelling of “carcinoma” meant the presence of DCIS was not detected by the processor

To calculate the sensitivity, specificity, and predictive value of NLP, we considered “all diagnoses.” A true positive was defined as atypia present, correctly identified with NLP. A false positive was defined as atypia identified by NLP that was not present on the report. A true negative was defined as a benign diagnosis, correctly identified with NLP, and a false negative was defined as atypia present, but not identified by NLP. The sensitivity of NLP to correctly identify all diagnoses was 99.1%, with a specificity of 96.53%. The positive predictive value of NLP was 98.63% and the negative predictive value was 97.73%.
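For readers who want the definitions above in computable form, the following Python sketch derives the four measures from true and false positive and negative counts. The function name `diagnostic_metrics` is hypothetical, and the counts in the usage line are invented purely to exercise the formulas; they are not the study's data, whose underlying counts are not reported here.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # atypia present and found by NLP
        "specificity": tn / (tn + fp),  # benign and reported as benign by NLP
        "ppv": tp / (tp + fp),          # NLP-reported atypia that was truly present
        "npv": tn / (tn + fn),          # NLP-reported benign that was truly benign
    }


# Hypothetical counts for illustration only (not taken from the study).
example = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these made-up counts the sensitivity is 90/100 and the specificity is 95/100; the study's own figures would be obtained by substituting the counts from the human-coder comparison.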

DISCUSSION

This study highlights one of the principal difficulties encountered in utilizing electronic data beyond its specific context. While a breast pathology report written in free text is easily read and interpreted on an individual patient basis, it is thus far not feasible to use information in this format in conjunction with other computerized systems, such as CDS systems. Pathology reports are long and contain multiple sections, which are frequently a mixture of free text and tabular data. Relevant clinical data may be contained within any section or format. Before NLP could be utilized to extract meaningful data, we first had to write a program to enable us to extract the sections which contained the data we were interested in. Due to the complexity of the task, we had to consciously exclude some valuable parts of the report, such as the “Note” section, where the pathologist might elaborate about alternative diagnoses. Having extracted the text from the sections of interest to us, we then found that there was significant variability in the way that breast diagnoses can be expressed. Sentence structure and different descriptive phraseology accounted for the majority of differences. This is clearly illustrated by the large numbers of synonymous terms encountered (e.g., 124 phrases describing invasive ductal carcinoma, 95 phrases describing invasive lobular carcinoma). Previous studies have highlighted the issue of context or semantics as being one of the inherent difficulties with using NLP to derive specific clinical information from a large body of free text reports.[2] When medical notes or reports are read by medical personnel, the reader automatically applies their own knowledge of both medicine and the clinical condition of the patient to interpret the report correctly, even when the vocabulary and grammar differ between reporting physicians.
To overcome this obstacle, the approach of identifying all possible ways a given entity might be represented, creating patterns, and then using the NLP software to identify these entities in each report maximized the potential for NLP to correctly extract the information of interest. Even more striking is the very high number of potential ways of stating negation of each entity. The use of the negative, and its position either before or after the diagnosis of interest, required the creation of a further set of patterns to facilitate processor recognition of the negative, thereby excluding negated phrases from the final data set. For breast pathology reports, the frequent co-existence of several diagnoses within the same report such as “atypical ductal hyperplasia, ductal carcinoma in situ, and invasive ductal carcinoma” adds a further layer of complexity to the task of extracting interpretable data. We overcame this by assigning a level of importance to each diagnosis such that the final diagnosis was that which was most significant; however, each of the other diagnoses was also extracted and entered in the final database entry for that patient. In our study we used the concept of “episodes of care,” where several reports from a single surgery were compressed into a single episode of care, represented by a single entry on our final datasheet. Any pathology report from that patient outside of 6 months was considered as a “new episode.” Patients may then be followed over time, according to their “episodes” of care. Extraction and storage of data in this fashion facilitate future correlative and longitudinal studies examining the natural history of pathologic entities of interest such as LCIS and ADH. In this study we have used natural language processing to extract data from free text breast pathology reports and organize them into a format which could be more easily utilized for statistical analysis. 
NLP can be defined as a “theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications.”[3] The utility of NLP has previously been tested in a number of studies in medicine. Elkins et al. used NLP to code neuroradiology reports, and reported processor accuracy of 84% compared to 86% accuracy of human coders.[4] Hripcsak et al. developed a natural language processor, MedLEE, and used it to code over 800,000 chest radiograph reports, demonstrating a processor sensitivity of 0.81 and a specificity of 0.99 when compared to expert human coders.[5] This group has also examined the use of NLP systems to identify drug interactions and adverse drug events from the electronic medical record, as well as comparing current disease-specific drug prescribing with recommendations in the published medical literature.[2,6] Pathology reports present a unique field of study in natural language processing. They contain a large amount of valuable clinical information in a patient's care pathway. The coding of free text pathology reports has been attempted with varying degrees of success since the study done by Pratt et al. in 1978.[7] The potential difficulties in extracting information from surgical pathology reports were highlighted by Liu et al., who concluded that some variables are better “targets” for extraction than others. Staging and grading of cancer appeared to be particularly difficult to auto annotate.[8] We agreed with their assessment, and did not attempt to extract stage or grade in this study.
Friedman and Xu found that tabular data and a lack of punctuation, combined with information on multiple specimens, made breast pathology reports difficult to process directly using NLP.[9] We were able to work around this difficulty by a multistep process that looked at specimens individually, and then by episode. Subsequent work from this group integrated the use of a preprocessor with an existing NLP system to overcome these issues, and reported combined system sensitivity of 90.6% and specificity of 91.6% compared with human coders.[10] In our study, our data also had to be preprocessed using a specifically written program to identify the regions of interest in each pathology report. We have demonstrated a low overall error rate of 2.22%, when the processor was compared to human coders. Sensitivity of 99.1% and specificity of 96.53% are significantly better than those quoted in other studies using similar technologies. During the 1970s and 1980s, the use of computer technology to provide decision making support to clinicians was widely tested with generally favorable results.[11] Advances in technological capabilities, along with a switch to electronic health records, have seen a resurgence in interest in computer aided, clinical decision support (CDS). Osheroff et al. defined the goal of CDS as a tool “to provide the right information, to the right person, in the right format, through the right channels, at the right point in workflow to improve health and health care decisions and outcomes.”[1] The most common CDS systems currently in use are drug-drug interaction programs and allergy programs. CDS is potentially far more widely applicable, facilitating clinician and patient access to the latest scientific evidence and practice guidelines. However, a large amount of clinical information still exists in an unstructured, nonstandardized format, which significantly limits the utility of CDS.
In our current study we present an example of how a large body of free text medical information, in this instance breast pathology reports, can be converted to a machine readable format. Storing clinical data in this way facilitates access to CDS systems which can potentially provide up-to-date information to clinicians and patients on risk assessment, screening, and management.

CONCLUSION

We have created a large database of valuable clinical information from over 76,000 breast pathology reports. While we have demonstrated the utility of NLP, we have also been struck by the inherent complexity of using NLP in medical care. The time and effort required to use NLP for a single, well-defined problem should give pause to the idea that having data in any electronic format, even free text, will help us improve medical care. The design of Electronic Medical Records that use structured data and depend less and less on free text is critical.

10 in total

1. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review.

Authors: J S Elkins; C Friedman; B Boden-Albala; R L Sacco; G Hripcsak
Journal: Comput Biomed Res Date: 2000-02

2. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports.

Authors: George Hripcsak; John H M Austin; Philip O Alderson; Carol Friedman
Journal: Radiology Date: 2002-07 Impact factor: 11.105

3. Facilitating research in pathology using natural language processing.

Authors: Hua Xu; Carol Friedman
Journal: AMIA Annu Symp Proc Date: 2003

4. Does computer-aided clinical decision support improve the management of acute abdominal pain? A systematic review.

Authors: Jamie G Cooper; Robert M West; Susan E Clamp; Tajek B Hassan
Journal: Emerg Med J Date: 2010-11-02 Impact factor: 2.740

5. Automatic indexing of pathology data.

Authors: G S Dunham; M G Pacak; A W Pratt
Journal: J Am Soc Inf Sci Date: 1978-03

6. Automating tissue bank annotation from pathology reports - comparison to a gold standard expert annotation set.

Authors: Kaihong Liu; Kevin J Mitchell; Wendy W Chapman; Rebecca S Crowley
Journal: AMIA Annu Symp Proc Date: 2005

7. A roadmap for national action on clinical decision support.

Authors: Jerome A Osheroff; Jonathan M Teich; Blackford Middleton; Elaine B Steen; Adam Wright; Don E Detmer
Journal: J Am Med Inform Assoc Date: 2007-01-09 Impact factor: 4.497

8. Detection of practice pattern trends through Natural Language Processing of clinical narratives and biomedical literature.

Authors: Elizabeth S Chen; Peter D Stetson; Yves A Lussier; Marianthi Markatou; George Hripcsak; Carol Friedman
Journal: AMIA Annu Symp Proc Date: 2007-10-11

9. Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study.

Authors: Elizabeth S Chen; George Hripcsak; Hua Xu; Marianthi Markatou; Carol Friedman
Journal: J Am Med Inform Assoc Date: 2007-10-18 Impact factor: 4.497

10. Facilitating cancer research using natural language processing of pathology reports.

Authors: Hua Xu; Kristin Anderson; Victor R Grann; Carol Friedman
Journal: Stud Health Technol Inform Date: 2004
