Machine learning to parse breast pathology reports in Chinese.

Rong Tang¹, Lizhi Ouyang², Clara Li³, Yue He², Molly Griffin¹, Alphonse Taghian⁴, Barbara Smith¹, Adam Yala³, Regina Barzilay³, Kevin Hughes¹.

Abstract

INTRODUCTION: Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages.
METHODS: We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital. Physicians with native Chinese proficiency reviewed the reports and annotated a variety of binary and numerical pathologic entities. After excluding 78 cases with a bilateral lesion in the same report, 1216 cases were used as a training set for the algorithm, which was then refined on 405 development cases. The natural language processing algorithm was then evaluated on the remaining 405 test cases. The model was used to extract 13 binary entities and 8 numerical entities.
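The case counts above can be checked with a short sketch: 2104 reports minus 78 exclusions leaves 2026, which matches 1216 + 405 + 405. The abstract does not say how cases were assigned to each partition, so the seeded random shuffle below is an assumption, and `split_reports` is a hypothetical helper rather than the authors' code.

```python
import random

# Counts taken from the METHODS paragraph.
TOTAL_REPORTS = 2104
EXCLUDED_BILATERAL = 78
TRAIN_SIZE, DEV_SIZE, TEST_SIZE = 1216, 405, 405


def split_reports(report_ids, seed=0):
    """Shuffle report IDs and cut them into train/dev/test partitions.

    Assumption: a seeded random shuffle; the abstract does not state
    how the partitions were actually drawn.
    """
    ids = list(report_ids)
    expected = TRAIN_SIZE + DEV_SIZE + TEST_SIZE
    if len(ids) != expected:
        raise ValueError(f"expected {expected} reports, got {len(ids)}")
    random.Random(seed).shuffle(ids)
    train = ids[:TRAIN_SIZE]
    dev = ids[TRAIN_SIZE:TRAIN_SIZE + DEV_SIZE]
    test = ids[TRAIN_SIZE + DEV_SIZE:]
    return train, dev, test


# Reports remaining after the bilateral-lesion exclusions.
usable = TOTAL_REPORTS - EXCLUDED_BILATERAL
train, dev, test = split_reports(range(usable))
print(usable, len(train), len(dev), len(test))
```

Fixing the seed keeps the partitions reproducible, so the development set can be used for refinement without leaking test cases.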
RESULTS: When compared to physicians with native Chinese proficiency, the model showed per-entity accuracy ranging from 91% to 100% for all common diagnoses on the test set. The overall accuracy was 98% for binary entities and 95% for numerical entities. In a per-report evaluation for binary entities with more than 100 training cases, 85% of all the testing reports were completely correct and 11% had an error in 1 out of 22 entities.
CONCLUSION: We have demonstrated that Chinese breast pathology reports can be automatically parsed into structured data using standard machine learning approaches. The results of our study demonstrate that techniques effective in parsing English reports can be scaled to other languages.
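The abstract reports two kinds of score: per-entity accuracy (how often one field is extracted correctly across reports) and a per-report evaluation (how many fields are wrong within each report). The sketch below shows one way these could be computed. The entity names, the toy data, and both function names are hypothetical, and exact equality is assumed as the match criterion, which the abstract does not specify.

```python
from collections import Counter


def per_entity_accuracy(gold, predicted):
    """Fraction of reports in which each entity was extracted correctly."""
    n_reports = len(gold)
    return {
        entity: sum(g[entity] == p[entity] for g, p in zip(gold, predicted)) / n_reports
        for entity in gold[0]
    }


def error_count_distribution(gold, predicted):
    """Share of reports with 0, 1, 2, ... incorrectly extracted entities."""
    counts = Counter(
        sum(g[entity] != p[entity] for entity in g)
        for g, p in zip(gold, predicted)
    )
    return {k: v / len(gold) for k, v in sorted(counts.items())}


# Toy annotations: two binary entities and one numerical entity (hypothetical names).
gold = [
    {"entity_a": 1, "entity_b": 0, "size_cm": 2.5},
    {"entity_a": 0, "entity_b": 0, "size_cm": 1.0},
    {"entity_a": 1, "entity_b": 1, "size_cm": 3.2},
    {"entity_a": 0, "entity_b": 1, "size_cm": 0.8},
]
# Model output: identical to gold except entity_b in the third report.
predicted = [dict(report) for report in gold]
predicted[2]["entity_b"] = 0

print(per_entity_accuracy(gold, predicted))
print(error_count_distribution(gold, predicted))
```

In this toy case, key 0 of the distribution corresponds to the "completely correct" share of reports and key 1 to the share with a single wrong entity.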

Entities: Disease

Keywords: Chinese; Electronic health record (EHR); Machine learning; Natural language processing (NLP); Pathology reports

Year: 2018 PMID: 29380208 DOI: 10.1007/s10549-018-4668-3

Source DB: PubMed Journal: Breast Cancer Res Treat ISSN: 0167-6806 Impact factor: 4.872

Cited

8 in total

1. Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance.

Authors: Zhengyi Deng; Kanhua Yin; Yujia Bao; Victor Diego Armengol; Cathy Wang; Ankur Tiwari; Regina Barzilay; Giovanni Parmigiani; Danielle Braun; Kevin S Hughes
Journal: JCO Clin Cancer Inform Date: 2019-08

2. Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system.

Authors: Yifu Chen; Lucy Hao; Vito Z Zou; Zsuzsanna Hollander; Raymond T Ng; Kathryn V Isaac
Journal: BMC Med Res Methodol Date: 2022-05-12 Impact factor: 4.612

3. Findings from the 2019 International Medical Informatics Association Yearbook Section on Health Information Management.

Authors: Meryl Bloomrosen; Eta S Berner
Journal: Yearb Med Inform Date: 2019-08-16

4. Clinical Data Extraction and Normalization of Cyrillic Electronic Health Records Via Deep-Learning Natural Language Processing.

Authors: Boyang Zhao
Journal: JCO Clin Cancer Inform Date: 2019-09

5. Automated Classification of Online Sources for Infectious Disease Occurrences Using Machine-Learning-Based Natural Language Processing Approaches.

Authors: Mira Kim; Kyunghee Chae; Seungwoo Lee; Hong-Jun Jang; Sukil Kim
Journal: Int J Environ Res Public Health Date: 2020-12-17 Impact factor: 3.390

6. A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.

Authors: Joseph Ross Mitchell; Phillip Szepietowski; Rachel Howard; Phillip Reisman; Jennie D Jones; Patricia Lewis; Brooke L Fridley; Dana E Rollison
Journal: J Med Internet Res Date: 2022-03-23 Impact factor: 7.076

7. Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review. (Review)

Authors: Xinyu Yang; Dongmei Mu; Hao Peng; Hua Li; Ying Wang; Ping Wang; Yue Wang; Siqi Han
Journal: JMIR Med Inform Date: 2022-04-20

8. Validation of an algorithm to evaluate the appropriateness of outpatient antibiotic prescribing using big data of Chinese diagnosis text.

Authors: Houyu Zhao; Jiaming Bian; Li Wei; Liuyi Li; Yingqiu Ying; Zeyu Zhang; Xiaoying Yao; Lin Zhuo; Bin Cao; Mei Zhang; Siyan Zhan
Journal: BMJ Open Date: 2020-03-19 Impact factor: 2.692
