Using machine learning to parse breast pathology reports.

Adam Yala¹, Regina Barzilay¹, Laura Salama², Molly Griffin³, Grace Sollender⁴, Aditya Bardia⁵, Constance Lehman⁶, Julliette M Buckley⁷, Suzanne B Coopey⁷, Fernanda Polubriaginof⁸, Judy E Garber⁹, Barbara L Smith⁷, Michele A Gadd⁷, Michelle C Specht⁷, Thomas M Gudewicz¹⁰, Anthony J Guidi¹¹, Alphonse Taghian², Kevin S Hughes⁷.

Abstract

PURPOSE: Extracting information from electronic medical records is a time-consuming and expensive process when done manually. Rule-based and machine learning techniques are two approaches to solving this problem. In this study, we trained a machine learning model on pathology reports to extract pertinent tumor characteristics, which enabled us to create a large database of attribute-searchable pathology reports. This database can be used to identify cohorts of patients with characteristics of interest.
METHODS: We collected a total of 91,505 breast pathology reports from three Partners hospitals: Massachusetts General Hospital, Brigham and Women's Hospital, and Newton-Wellesley Hospital, covering the period from 1978 to 2016. We trained our system with annotations from two datasets, consisting of 6,295 and 10,841 manually annotated reports. The system extracts 20 separate categories of information, including atypia types and various tumor characteristics such as receptors. We also report a learning curve analysis to show how much annotation our model needs to perform reasonably.
RESULTS: The model accuracy was tested on 500 reports that did not overlap with the training set. The model achieved an accuracy of 90% for correctly parsing all carcinoma and atypia categories for a given patient. The average accuracy for individual categories was 97%. Using this classifier, we created a database of 91,505 parsed pathology reports.
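The two figures reported above measure different things: the 97% is an average over individual categories, while the 90% requires every category for a patient to be correct at once, so it is the stricter measure. The sketch below illustrates that distinction on toy data. The category names (`DCIS`, `ADH`), the labels, and the helper functions are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
from typing import Dict, List

# One parsed report: category name -> predicted or annotated label.
Report = Dict[str, str]


def per_category_accuracy(gold: List[Report], pred: List[Report]) -> Dict[str, float]:
    """Fraction of reports labeled correctly, computed separately per category."""
    categories = list(gold[0].keys())
    return {
        c: sum(g[c] == p[c] for g, p in zip(gold, pred)) / len(gold)
        for c in categories
    }


def all_categories_accuracy(gold: List[Report], pred: List[Report]) -> float:
    """Fraction of reports for which every category is labeled correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy annotations and predictions (hypothetical categories and labels).
gold = [
    {"DCIS": "present", "ADH": "absent"},
    {"DCIS": "absent", "ADH": "present"},
    {"DCIS": "absent", "ADH": "absent"},
    {"DCIS": "present", "ADH": "present"},
]
pred = [
    {"DCIS": "present", "ADH": "absent"},   # fully correct
    {"DCIS": "absent", "ADH": "absent"},    # ADH wrong
    {"DCIS": "present", "ADH": "absent"},   # DCIS wrong
    {"DCIS": "present", "ADH": "present"},  # fully correct
]

by_category = per_category_accuracy(gold, pred)
average = sum(by_category.values()) / len(by_category)
strict = all_categories_accuracy(gold, pred)

print(by_category)  # {'DCIS': 0.75, 'ADH': 0.75}
print(average)      # 0.75
print(strict)       # 0.5
```

In this toy case each category is right on three of four reports, yet only two reports are right on both categories, which is why an all-categories figure is expected to sit below the per-category average.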
CONCLUSIONS: Our learning curve analysis shows that the model can achieve reasonable results even when trained on a few annotations. We developed a user-friendly interface to the database that allows physicians to easily identify patients with target characteristics and export the matching cohort. This model has the potential to reduce the effort required for analyzing large amounts of data from medical records, and to minimize the cost and time required to glean scientific insight from these data.
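To make the idea of an attribute-searchable database concrete, here is a minimal sketch of cohort selection and export over parsed reports. It assumes each parsed report is stored as a flat record of category labels; the field names (`patient_id`, `DCIS`, `ER`), the labels, and the functions `find_cohort` and `export_csv` are hypothetical and do not describe the interface the authors built.

```python
import csv
import io
from typing import Dict, List

Record = Dict[str, str]


def find_cohort(reports: List[Record], **criteria: str) -> List[Record]:
    """Return the reports whose fields match every given attribute value."""
    return [
        r for r in reports
        if all(r.get(field) == value for field, value in criteria.items())
    ]


def export_csv(rows: List[Record], fields: List[str]) -> str:
    """Serialize the selected fields of the matching reports as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy parsed reports (hypothetical identifiers, categories, and labels).
parsed_reports = [
    {"patient_id": "P1", "DCIS": "present", "ER": "positive"},
    {"patient_id": "P2", "DCIS": "absent", "ER": "positive"},
    {"patient_id": "P3", "DCIS": "present", "ER": "negative"},
]

cohort = find_cohort(parsed_reports, DCIS="present", ER="positive")
print(export_csv(cohort, ["patient_id", "DCIS", "ER"]))
```

Because every extracted category becomes a plain field, combining criteria is just a conjunction of equality checks, and the same records can be exported unchanged for downstream analysis.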

Keywords: Atypia; Breast pathology; Carcinoma in situ; Hyperplasia; Machine learning; Natural language processing; Pathology reports

Year: 2016 PMID: 27826755 DOI: 10.1007/s10549-016-4035-1

Source DB: PubMed Journal: Breast Cancer Res Treat ISSN: 0167-6806 Impact factor: 4.872

Cited

36 in total

1. Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics.

Authors: Tomasz Oliwa; Steven B Maron; Leah M Chase; Samantha Lomnicki; Daniel V T Catenacci; Brian Furner; Samuel L Volchenboum
Journal: JCO Clin Cancer Inform Date: 2019-08

2. Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance.

Authors: Zhengyi Deng; Kanhua Yin; Yujia Bao; Victor Diego Armengol; Cathy Wang; Ankur Tiwari; Regina Barzilay; Giovanni Parmigiani; Danielle Braun; Kevin S Hughes
Journal: JCO Clin Cancer Inform Date: 2019-08

3. Do Neural Information Extraction Algorithms Generalize Across Institutions?

Authors: Enrico Santus; Clara Li; Adam Yala; Donald Peck; Rufina Soomro; Naveen Faridi; Isra Mamshad; Rong Tang; Conor R Lanahan; Regina Barzilay; Kevin Hughes
Journal: JCO Clin Cancer Inform Date: 2019-07

Review 4. Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing.

Authors: A Névéol; P Zweigenbaum
Journal: Yearb Med Inform Date: 2017-09-11

Review 5. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text.

Authors: G Gonzalez-Hernandez; A Sarker; K O'Connor; G Savova
Journal: Yearb Med Inform Date: 2017-09-11

6. A Frame-Based NLP System for Cancer-Related Information Extraction.

Authors: Yuqi Si; Kirk Roberts
Journal: AMIA Annu Symp Proc Date: 2018-12-05

Review 7. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records.

Authors: Guergana K Savova; Ioana Danciu; Folami Alamudun; Timothy Miller; Chen Lin; Danielle S Bitterman; Georgia Tourassi; Jeremy L Warner
Journal: Cancer Res Date: 2019-08-08 Impact factor: 12.701

8. Kernel-Based Microfluidic Constriction Assay for Tumor Sample Identification.

Authors: Xiang Ren; Parham Ghassemi; Yasmine M Kanaan; Tammey Naab; Robert L Copeland; Robert L Dewitty; Inyoung Kim; Jeannine S Strobl; Masoud Agah
Journal: ACS Sens Date: 2018-07-18 Impact factor: 7.711

9. Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports.

Authors: Hong-Jun Yoon; Hilda B Klasky; John P Gounley; Mohammed Alawad; Shang Gao; Eric B Durbin; Xiao-Cheng Wu; Antoinette Stroup; Jennifer Doherty; Linda Coyle; Lynne Penberthy; J Blair Christian; Georgia D Tourassi
Journal: J Biomed Inform Date: 2020-09-09 Impact factor: 6.317

10. Automated NLP Extraction of Clinical Rationale for Treatment Discontinuation in Breast Cancer.

Authors: Matthew S Alkaitis; Monica N Agrawal; Gregory J Riely; Pedram Razavi; David Sontag
Journal: JCO Clin Cancer Inform Date: 2021-05