Ergin Soysal, Jeremy L Warner, Jingqi Wang, Min Jiang, Krysten Harvey, Sandeep Kumar Jain, Xiao Dong, Hsing-Yi Song, Harish Siddhanamatha, Liwei Wang, Qi Dai, Qingxia Chen, Xianglin Du, Cui Tao, Ping Yang, Joshua Charles Denny, Hongfang Liu, Hua Xu.
Abstract
Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated extraction of phenotypic information from narratives in electronic health records (EHRs), such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we developed a set of customizable modules for extracting comprehensive types of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers) by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data from Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that a customized NLP system could be built quickly, with performance comparable to that of an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.
Keywords: Electronic Health Records; Information Storage and Retrieval; Natural Language Processing
Year: 2019 PMID: 31438083 PMCID: PMC7359882 DOI: 10.3233/SHTI190383
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1. The CLAMP information model is based on the data element suggestions of the College of American Pathologists.
Figure 2. (a) The CLAMP-Cancer module implements several cancer-specific components (A), which can be used to build customized pipelines (B). (b) The pipeline annotates pathology reports to extract entities as well as their relationships.
Evaluation results of CLAMP-Cancer modules

| Type of entity | # of entities | Entity only: Precision | Entity only: Recall | Entity only: F-measure | Entity and relation: Precision | Entity and relation: Recall | Entity and relation: F-measure |
|---|---|---|---|---|---|---|---|
| | 310 | 0.99/0.99 | 0.99/0.99 | 0.99/0.99 | 0.97 | 0.98 | 0.98 |
| | 351 | 0.98/0.99 | 0.98/0.99 | 0.98/0.99 | 0.98 | 0.98 | 0.98 |
| | 187 | 0.96/0.98 | 0.82/0.83 | 0.89/0.90 | 0.88 | 0.78 | 0.83 |
| | 339 | 0.98/0.99 | 0.98/0.99 | 0.98/0.99 | 0.97 | 0.97 | 0.97 |
| | 553 | 0.91/1.00 | 0.85/0.93 | 0.88/0.97 | 0.90 | 0.85 | 0.86 |
| | 92 | 0.96/1.00 | 0.88/0.91 | 0.92/0.96 | 0.91 | 0.83 | 0.86 |
| | 60 | 0.96/0.96 | 0.90/0.90 | 0.93/0.93 | 0.88 | 0.83 | 0.85 |
| | 93 | 0.92/0.99 | 0.91/0.98 | 0.92/0.98 | 0.80 | 0.79 | 0.80 |
| | 71 | 0.92/1.00 | 0.83/0.90 | 0.87/0.95 | 0.86 | 0.78 | 0.82 |
| | 107 | 0.95/0.99 | 0.90/0.94 | 0.92/0.96 | 0.88 | 0.84 | 0.86 |

| | CLAMP-Cancer: Precision | CLAMP-Cancer: Recall | CLAMP-Cancer: F-measure | MedKAT: Precision | MedKAT: Recall | MedKAT: F-measure |
|---|---|---|---|---|---|---|
| | 1.00 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 |
| | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
| | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0.94 | 0.89 | 0.92 | 0.96 | 0.95 | 0.96 |
| | 0.91 | 0.92 | 0.92 | 0.96 | 0.98 | 0.97 |
| | 1.00 | 0.88 | 0.94 | 0.93 | 0.97 | 0.99 |
| | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Number of entities of each type in the test corpus of 200 notes.
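
For readers who want to relate the precision, recall, and F-measure columns, the following is a minimal sketch. It assumes that the reported F-measure is the balanced harmonic mean of precision and recall (F1); the record does not state the formula. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 (assumed here)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def scores_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts: 88 correct extractions, 0 spurious, 12 missed.
p, r, f = scores_from_counts(tp=88, fp=0, fn=12)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Because published precision and recall values are rounded to two decimals, recomputing F1 from them can differ from a reported F-measure in the last digit.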