Wee-Ming Tan, Kean-Hooi Teoh, Mogana Darshini Ganggayah, Nur Aishah Taib, Hana Salwani Zaini, Sarinder Kaur Dhillon.
Abstract
Pathology reports are a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists, so narrative reporting remains important there. However, unstructured free-text reports make information extraction tedious for clinical audits and data analysis-related research. This study aims to develop an automated natural language processing (NLP) algorithm that summarizes existing narrative breast pathology reports from UMMC into narrower, structured synoptic pathology reports, using a checklist-style report template to ease the creation of pathology reports. The rule-based NLP algorithm was developed in the R programming language using 593 pathology specimens from 174 patients, provided by the Department of Pathology, UMMC. A pathologist provided specific keywords for each data element to define the semantic rules of the NLP algorithm. The system was evaluated by calculating precision, recall, and F1-score. The proposed NLP algorithm achieved a micro-F1 score of 99.50% and a macro-F1 score of 98.97% on 178 specimens with 25 data elements. This performance aligns with clinicians' needs and could improve communication between pathologists and clinicians. The study is significant because structured data is easily minable and could generate important insights.
Keywords: information extraction; natural language processing; pathology reporting; rule based; synoptic reporting; text mining
Year: 2022 PMID: 35453927 PMCID: PMC9027647 DOI: 10.3390/diagnostics12040879
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Representation of a pathology report’s composition.
Figure 2. Overview of the study.
Key elements identified by the pathologist in UMMC.
| Data Elements | Description |
|---|---|
| Date | Examination date. |
| Patient’s register number | Unique ID for a patient. |
| Report’s reference number | Unique ID for a report. |
| Type of procedure | Procedure used to obtain the specimen. |
| Specimen laterality | Side of the breast involved. |
| Histologic type | Description of how a tumor looks under a microscope. |
| Histologic grade | Nottingham modification of Bloom-Richardson (NSBR) grading system, based on tumor tubule formation, number of mitotic figures in the most active areas, and nuclear pleomorphism. |
| Lesion | Type of lesion and its size, observed macroscopically and microscopically. |
| Margins | Distance of the lesion from the different margins. |
| Ductal Carcinoma In Situ (DCIS) grades | Grading that describes how closely the cancer cells resemble normal breast cells and how fast they grow. |
| DCIS appearances | Architectural growth pattern of DCIS. |
| Lymphovascular invasion | Presence of tumor cells in lymphatics or blood vessels. |
| Skin change involvement | Presence of skin changes, including puckering, dimpling, a rash, or redness of the skin of the breast. |
| Paget disease | Presence of eczema-like changes to the skin of the nipple and the area of darker skin surrounding the nipple. |
| Regional lymph nodes | Number of lymph nodes examined and number of lymph nodes involved by tumor cells. |
| Ancillary studies | Breast biomarker testing results for estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) by IHC. |
Figure 3. Input for the NLP algorithm.
Variables extracted with their regular expressions.
| Data Elements | Regular Expression |
|---|---|
| Type of procedure | Needle biopsy; hook-wire localization biopsy; excision; mastectomy |
| Type of lesion | Tumor; fibrotic lesion; fibrosis; cyst; mass; nodule |
| Size of lesion | [numeric][units]? × [numeric][units]? × [numeric][units]; [numeric][units]? × [numeric][units]; [numeric][units]? * |
| Margins distance | [numeric][units]; <[numeric][units]; >[numeric][units]; [numeric][units]? − [numeric][units] |
| Margins involved | Anterior; deep; superior; inferior; medial; lateral; posterior; superficial; peripheral; axis |
| Presence of lymphovascular invasion | Lymphovascular invasion; lymphovascular permeation. Terms indicating presence: seen; noted; presented; observed; detected. Terms indicating absence: no; not detected; absent |
| Skin change involvement | Skin change; skin lesion; skin. Terms indicating presence: changes are seen; seen; noted; presented; observed; detected. Terms indicating absence: no; not detected; absent |
| Presence of Paget’s disease | Paget disease; Paget cell; Pagetoid spread. Terms indicating presence: seen; noted; presented; observed; detected. Terms indicating absence: no; not detected; absent |
| NSBR grade | Grade 1; grade 2; grade 3 |
| DCIS grade | Low; intermediate; low to intermediate; high |
| DCIS appearance | Cribriform; micropapillary; papillary; solid; flat or clinging; comedo |
| Histologic type | No residual invasive carcinoma; invasive lobular carcinoma; invasive cribriform carcinoma; papillary carcinoma with invasion; papillary carcinoma |
| Total number of lymph nodes examined | identified [numeric] lymph node(s); all [numeric] lymph node(s); identified a total of [numeric] lymph node(s); [numeric] out of the [numeric] lymph nodes examined |
| Number of lymph nodes showing malignancy | metastatic carcinoma in [numeric] out of the [numeric] lymph nodes; [numeric] lymph nodes are effaced and replaced by malignant cells |
| ER test result | Positive; negative; weak staining; strong staining; less than [numeric]% staining; more than [numeric]% staining |
| PR test result | Positive; negative; weak staining; strong staining; less than [numeric]% staining; more than [numeric]% staining |
| HER2 test result | Positive; negative; equivocal; overexpressed; not overexpressed; score 0 to 3+ |

* The symbol “?” indicates that the preceding value is optional.
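To illustrate how keyword and pattern rules of this kind could be applied, the following is a minimal sketch in Python (the study itself used R, and its actual rules are not reproduced here). The names `SIZE_RE`, `CONCEPTS`, `find_sizes`, and `concept_status` are hypothetical, the unit list (`mm`, `cm`) is an assumption, and only two of the data elements are covered.

```python
import re

NUM = r"\d+(?:\.\d+)?"   # integer or decimal value
UNIT = r"(?:mm|cm)"      # assumed unit vocabulary

# Up to three dimensions separated by "x" or "×"; units are optional on
# every dimension except the last one.
SIZE_RE = re.compile(
    rf"(?:{NUM}\s*{UNIT}?\s*[x×]\s*){{0,2}}{NUM}\s*{UNIT}\b",
    re.IGNORECASE,
)

# Concept keywords and presence/absence terms taken from the table above.
CONCEPTS = {
    "lymphovascular_invasion": r"lymphovascular (?:invasion|permeation)",
    "paget_disease": r"paget(?:['’]s)? (?:disease|cell)|pagetoid spread",
}
ABSENT_RE = re.compile(r"\b(?:no|not detected|absent)\b", re.IGNORECASE)
PRESENT_RE = re.compile(
    r"\b(?:seen|noted|presented|observed|detected)\b", re.IGNORECASE
)


def find_sizes(text):
    """Return every size expression found in the text."""
    return [m.group(0) for m in SIZE_RE.finditer(text)]


def concept_status(sentence, concept):
    """Return 'absent', 'present', or None if the concept is not mentioned."""
    if not re.search(CONCEPTS[concept], sentence, re.IGNORECASE):
        return None
    # Absence terms are checked first so that "no ... is observed" is
    # not misread as present.
    if ABSENT_RE.search(sentence):
        return "absent"
    if PRESENT_RE.search(sentence):
        return "present"
    return None
```

Checking absence terms before presence terms is a design choice of this sketch; negations outside the listed vocabulary (for example "not seen") would need additional rules.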
Figure 4. ERD of database for pathology reporting.
Evaluation results for the training and testing data.
| Dataset | Averaging | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Training | Micro | 0.9958 | 0.9960 | 0.9959 |
| Training | Macro | 0.9926 | 0.9936 | 0.9931 |
| Testing | Micro | 0.9942 | 0.9959 | 0.9950 |
| Testing | Macro | 0.9820 | 0.9914 | 0.9897 |
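The difference between micro- and macro-averaged scores can be made concrete with a short Python sketch. It assumes per-data-element counts of true positives, false positives, and false negatives are available; the function names and the `(tp, fp, fn)` tuple layout are illustrative and not taken from the study.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


def micro_f1(counts):
    """Pool the counts over all data elements, then compute one F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)[2]


def macro_f1(counts):
    """Compute F1 per data element, then take the unweighted mean."""
    scores = [prf(*c)[2] for c in counts]
    return sum(scores) / len(scores)
```

Micro-averaging weights every extracted value equally, while macro-averaging weights every data element equally, so rarely occurring elements influence the macro score more. As a consistency check, `f1_from_pr(0.9942, 0.9959)` rounds to 0.9950, the micro F1 reported for the testing data.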
Examples of pathology report information in narrative and synoptic format, respectively.
| Narrative | Synoptic |
|---|---|
| Received a mastectomy specimen weighing 790 g and measuring 17.5 cm × 15.5 cm × 4.5 cm. | Procedure: mastectomy |
| Specimen labelled as right breast retro-areolar. | Specimen laterality: right |
| Interpretation: Invasive carcinoma, nst | Histologic type: invasive carcinoma of no special type |
| Interpretation: Bloom-richardson grade 2 | Histologic grade: Grade 2 (Bloom and Richardson) |
| There is no lymphovascular invasion | Lymphovascular invasion: absent |
| They are focally positive for PR and ER. The HER-2 expression is negative (0). | Ancillary studies: ER biomarker result: positive; PR biomarker result: positive; HER2 biomarker result: negative, score 0 |
| Sections show a total of 10 lymph nodes with no evidence of tumour metastasis. | Regional lymph nodes: Lymph node(s) examined: 10; Lymph node(s) show(s) malignancy: 0 |
| A small bit of skin tissue is seen and no Paget’s disease is observed. | Paget’s disease: absent |
| The specimen measures 13.5 cm × 14.5 cm × 4 cm, a serial section shows a tumour bed measures 3 cm × 2 cm × 1.5 cm located at upper outer quadrant. | Lesion size (macroscopy): Tumor bed size: 3 cm × 2 cm × 1.5 cm |
| It is 3.5 cm from superior margin, 7.5 cm from inferior and medial margins, 4.5 cm from lateral margin, 0.5 cm from anterior margin and 1 cm from deep margin. | Margin (macroscopy): Tumor Bed margin: superior margin = 3.5 cm, inferior margin = 7.5 cm, medial margin = 7.5 cm, lateral margin = 4.5 cm, anterior margin = 0.5 cm, deep margin = 1 cm |
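The margin example in the last row can be reproduced with a small Python sketch that turns the narrative sentence into the synoptic "margin = distance" form. This is an illustration under assumptions, not the study's R implementation: `MARGIN_RE`, `extract_margins`, and `to_synoptic` are hypothetical names, the margin vocabulary comes from the "Margins involved" keywords, and the units are assumed to be `mm` or `cm`.

```python
import re

# Margin names from the "Margins involved" keyword list.
MARGIN = (
    r"(?:anterior|deep|superior|inferior|medial|lateral|"
    r"posterior|superficial|peripheral|axis)"
)

# "<value> <unit> from <margin>[ and <margin>...] margin(s)"
MARGIN_RE = re.compile(
    rf"(\d+(?:\.\d+)?)\s*(mm|cm)\s+from\s+(?:the\s+)?"
    rf"({MARGIN}(?:(?:\s*,\s*|\s+and\s+){MARGIN})*)\s+margins?",
    re.IGNORECASE,
)


def extract_margins(sentence):
    """Map each margin name mentioned in the sentence to its distance."""
    distances = {}
    for value, unit, names in MARGIN_RE.findall(sentence):
        # One distance may be shared by several margins
        # ("inferior and medial margins").
        for name in re.findall(MARGIN, names, re.IGNORECASE):
            distances[name.lower()] = f"{value} {unit}"
    return distances


def to_synoptic(distances):
    """Render the distances in the checklist-style form used in the table."""
    return ", ".join(f"{name} margin = {dist}" for name, dist in distances.items())


narrative = (
    "It is 3.5 cm from superior margin, 7.5 cm from inferior and medial "
    "margins, 4.5 cm from lateral margin, 0.5 cm from anterior margin and "
    "1 cm from deep margin."
)
print(to_synoptic(extract_margins(narrative)))
```

Restricting the captured names to a fixed vocabulary keeps the rule from picking up unrelated words that happen to follow "from"; the trade-off is that margin names outside the list are silently skipped.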