Yifu Chen, Lucy Hao, Vito Z Zou, Zsuzsanna Hollander, Raymond T Ng, Kathryn V Isaac.
Abstract
BACKGROUND: Manually extracted data points from health records are collated on an institutional, provincial, and national level to facilitate clinical research. However, the labour-intensive clinical chart review process puts an increasing burden on healthcare system budgets. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data.
Keywords: Breast cancer; Health data; Natural language processing
Year: 2022 PMID: 35549854 PMCID: PMC9101856 DOI: 10.1186/s12874-022-01583-z
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.612
Fig. 1 Overview of the NLP system design, input, and outputs
Diagnostic and treatment characteristics in operative and pathology report cohorts. SD; standard deviation, LN; lymph node
| Characteristic | Training Cohort | Validation Cohort |
|---|---|---|
| Laterality | | |
| Unilateral | 45 (90%) | 49 (98%) |
| Bilateral | 5 (10%) | 1 (2%) |
| Cancer Type | | |
| Invasive | 42 (76.4%) | 41 (80.4%) |
| Non-Invasive | 13 (23.6%) | 10 (19.6%) |
| Margins | | |
| Positive | 10 (18.2%) | 9 (17.6%) |
| Negative | 45 (81.8%) | 42 (82.4%) |
| Lymph Nodes | | |
| Avg. LNs Examined (SD) | 4.6 (4.3) | 4.3 (3.4) |
| Micro/Macro Metastasis | 16 (29.1%) | 13 (25.5%) |
| Extranodal Extension | 8 (14.5%) | 3 (5.9%) |
| Pathologic Diagnosis | | |
| Avg. Number of Foci (SD) | 1.9 (2.0) | 2.0 (2.2) |
| Avg. Nottingham Score (SD) | 6.3 (1.6) | 6.7 (1.6) |
| Avg. Tumour Size in mm (SD) | 28.2 (28.8) | 27.9 (20.0) |
| Lymphovascular Invasion | 11 (20%) | 10 (19.6%) |
| Characteristic | Training Cohort | Validation Cohort |
|---|---|---|
| Laterality | | |
| Unilateral | 48 (96%) | 49 (98%) |
| Bilateral | 2 (4%) | 1 (2%) |
| Procedure Type | | |
| Lumpectomy | 19 (36.5%) | 16 (31.4%) |
| Nipple-Sparing Mastectomy | 15 (28.9%) | 21 (41.2%) |
| Skin-Sparing Mastectomy | 16 (30.7%) | 14 (27.5%) |
| Total Mastectomy | 2 (3.8%) | 0 |
| Neoadjuvant Treatment | | |
| Chemotherapy | 4 (7.7%) | 14 (27.5%) |
| None | 48 (92.3%) | 37 (72.5%) |
| Immediate Reconstruction | | |
| Mentioned | 50 (96.2%) | 45 (88.2%) |
| Not Mentioned | 2 (3.8%) | 6 (11.8%) |
| Axillary Surgery | | |
| Sentinel LN Biopsy | 40 (76.9%) | 38 (74.5%) |
| Axillary LN Dissection | 4 (7.7%) | 5 (9.8%) |
| None | 8 (15.4%) | 8 (15.6%) |
Detailed accuracy metrics for the NLP system with respect to the ground truth (GT) in operative reports
| Outcome variable | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| Laterality | 0.90 | 0.94 | 0.96 | 0.95 |
| Surgical Indication | 0.96 | 0.98 | 0.98 | 0.98 |
| Pre-Operative Biopsy | 0.96 | 1.00 | 0.96 | 0.98 |
| Pre-Operative Diagnosis | 0.96 | 0.98 | 0.98 | 0.98 |
| Neoadjuvant Treatment | 0.98 | 1.00 | 0.98 | 0.99 |
| Breast Procedure | 0.92 | 0.94 | 0.98 | 0.96 |
| Immediate Reconstruction | 0.92 | 0.94 | 0.98 | 0.96 |
| Immediate Reconstruction Type | 0.86 | 0.90 | 0.96 | 0.92 |
| Wire Localization | 0.88 | 0.92 | 0.96 | 0.94 |
| Breast Incision Type | 0.87 | 0.89 | 0.96 | 0.93 |
| Axillary Surgery | 0.90 | 0.96 | 0.94 | 0.95 |
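The F-Score column in the table above can be related to the Precision and Recall columns with a short calculation. The sketch below assumes the F-Score is the balanced F1 measure (the harmonic mean of precision and recall); the record does not state which F-measure was used, and the function name `f_score` is illustrative only. Because the tabulated precision and recall are already rounded to two decimals, recomputed values may differ from a reported F-Score in the last digit for some rows.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1 (assumed definition): harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when no positives were predicted or found.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Laterality row of the operative-report table: precision 0.94, recall 0.96.
laterality_f = f_score(0.94, 0.96)
print(round(laterality_f, 2))  # 0.95, matching the reported F-Score
```

The harmonic mean penalizes an imbalance between precision and recall more strongly than an arithmetic mean would, which is why a row with one low value ends up with a noticeably lower F-Score.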
Detailed accuracy metrics for the NLP system with respect to the ground truth (GT) in pathology reports. DCIS; ductal carcinoma in situ, LN; lymph nodes
| Outcome Variable | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| Invasive Carcinoma | 1.00 | 1.00 | 1.00 | 1.00 |
| Invasive Histologic Type | 0.94 | 0.94 | 1.00 | 0.97 |
| Nottingham Score | 0.98 | 1.00 | 0.97 | 0.99 |
| Glandular Differentiation | 0.96 | 0.95 | 1.00 | 0.98 |
| Nuclear Pleomorphism | 0.98 | 0.98 | 1.00 | 0.99 |
| Mitotic Rate | 0.98 | 0.98 | 1.00 | 0.99 |
| Histologic Grade | 0.96 | 0.97 | 0.97 | 0.97 |
| Tumour Size (mm) | 0.92 | 0.95 | 0.95 | 0.95 |
| Tumour Focality | 1.00 | 1.00 | 1.00 | 1.00 |
| # of Foci | 0.98 | 1.00 | 0.97 | 0.99 |
| Tumour Site | 1.00 | 1.00 | 1.00 | 1.00 |
| Lymphovascular Invasion | 0.96 | 0.95 | 1.00 | 0.98 |
| In situ Component | 0.98 | 0.98 | 1.00 | 0.99 |
| In situ Type | 0.98 | 1.00 | 0.98 | 0.99 |
| In situ Nuclear Grade | 1.00 | 1.00 | 1.00 | 1.00 |
| Necrosis | 0.98 | 0.97 | 1.00 | 0.99 |
| DCIS Extent | 0.82 | 0.86 | 0.79 | 0.83 |
| Architectural Patterns | 1.00 | 1.00 | 1.00 | 1.00 |
| Invasive Carcinoma Margins | 0.94 | 0.93 | 1.00 | 0.97 |
| Distance from Closest Margin | 0.84 | 0.97 | 0.81 | 0.88 |
| Closest Margin | 0.90 | 0.97 | 0.89 | 0.93 |
| DCIS Margins | 0.78 | 0.78 | 1.00 | 0.88 |
| Distance of DCIS from Closest Margin (mm) | 0.86 | 0.92 | 0.81 | 0.86 |
| Closest Margin DCIS | 0.83 | 0.90 | 0.76 | 0.83 |
| Total LN Examined | 0.98 | 1.00 | 0.98 | 0.99 |
| # Sentinel LN Examined | 1.00 | 1.00 | 1.00 | 1.00 |
| Micro/macro metastasis | 0.88 | 0.87 | 1.00 | 0.93 |
| # LN with Micro-metastasis | 1.00 | 1.00 | 1.00 | 1.00 |
| # LN with Macro-metastasis | 1.00 | 1.00 | 1.00 | 1.00 |
| Size of largest Macro-metastasis Deposit | 0.98 | 1.00 | 0.91 | 0.95 |
| Extranodal Extension | 1.00 | 1.00 | 1.00 | 1.00 |
| Extent (mm) | 1.00 | 1.00 | 1.00 | 1.00 |
| Invasive Tumour Size (mm) | 0.94 | 0.95 | 0.97 | 0.96 |
| # Sentinel Nodes Examined | 0.96 | 0.95 | 1.00 | 0.97 |
| # Micro-metastatic Nodes | 1.00 | 1.00 | 1.00 | 1.00 |
| # Macro-metastatic Nodes | 1.00 | 1.00 | 1.00 | 1.00 |
| Pathologic Stage | 0.98 | 1.00 | 0.98 | 0.99 |
Detailed accuracy metrics for the human annotator with respect to the ground truth (GT) in operative reports. All scores were computed by averaging the metrics across training and test cohorts
| Outcome variable | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| Laterality | 0.99 | 0.99 | 1.00 | 1.00 |
| Surgical Indication | 0.97 | 0.97 | 1.00 | 0.99 |
| Pre-Operative Biopsy | 0.98 | 0.98 | 1.00 | 0.99 |
| Pre-Operative Diagnosis | 0.93 | 0.93 | 1.00 | 0.97 |
| Neoadjuvant Treatment | 0.98 | 0.98 | 1.00 | 0.99 |
| Breast Procedure | 0.97 | 0.97 | 1.00 | 0.99 |
| Immediate Reconstruction | 0.89 | 0.70 | 1.00 | 0.80 |
| Immediate Reconstruction Type | 0.60 | 0.66 | 0.99 | 0.78 |
| Wire Localization | 0.90 | 0.76 | 0.99 | 0.84 |
| Breast Incision Type | 0.81 | 0.89 | 0.84 | 0.85 |
| Axillary Surgery | 0.96 | 0.96 | 1.00 | 0.98 |
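To read the NLP and human-annotator operative-report tables side by side, the per-variable F-Score gap can be tabulated. The sketch below copies a handful of F-Scores from those two tables; the dictionary layout and the helper name `f_score_gaps` are illustrative assumptions rather than anything described in the source. Note that the human-annotator scores are stated to be averaged across training and test cohorts, so the comparison is indicative only.

```python
# (NLP F-Score, human annotator F-Score), copied from the operative-report tables.
operative_f_scores = {
    "Laterality": (0.95, 1.00),
    "Immediate Reconstruction": (0.96, 0.80),
    "Immediate Reconstruction Type": (0.92, 0.78),
    "Wire Localization": (0.94, 0.84),
    "Breast Incision Type": (0.93, 0.85),
}


def f_score_gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return NLP minus human F-Score per variable, rounded to two decimals."""
    return {name: round(nlp - human, 2) for name, (nlp, human) in scores.items()}


gaps = f_score_gaps(operative_f_scores)
for name, gap in gaps.items():
    # A positive gap means the NLP system scored higher than the human annotator.
    print(f"{name}: {gap:+.2f}")
```

A positive value indicates a variable where the NLP system's F-Score exceeds the human annotator's, and a negative value the reverse.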
Detailed accuracy metrics for the human annotator with respect to the ground truth (GT) in pathology reports. DCIS; ductal carcinoma in situ, LN; lymph nodes. All scores were computed by averaging the metrics across training and test cohorts
| Outcome Variable | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| Invasive Carcinoma | 0.98 | 1.00 | 0.98 | 0.99 |
| Invasive Histologic Type | 0.95 | 0.97 | 0.98 | 0.97 |
| Nottingham Score | 0.62 | 1.00 | 1.00 | 1.00 |
| Glandular Differentiation | 0.97 | 0.99 | 0.98 | 0.98 |
| Nuclear Pleomorphism | 0.97 | 0.98 | 0.98 | 0.98 |
| Mitotic Rate | 0.96 | 0.95 | 0.98 | 0.97 |
| Histologic Grade | 0.96 | 0.99 | 0.95 | 0.97 |
| Tumour Size (mm) | 0.98 | 0.96 | 0.98 | 0.97 |
| Tumour Focality | 0.96 | 0.97 | 0.98 | 0.97 |
| # of Foci | 0.96 | 0.98 | 0.97 | 0.97 |
| Tumour Site | 0.95 | 0.74 | 0.94 | 0.81 |
| Lymphovascular Invasion | 0.97 | 0.98 | 0.98 | 0.98 |
| In situ Component | 0.95 | 0.99 | 0.94 | 0.97 |
| In situ Type | 0.97 | 0.99 | 0.97 | 0.98 |
| In situ Nuclear Grade | 0.96 | 0.98 | 0.96 | 0.97 |
| Necrosis | 0.96 | 0.96 | 0.96 | 0.96 |
| DCIS Extent | 0.98 | 0.97 | 0.95 | 0.96 |
| Architectural Patterns | 0.96 | 0.93 | 0.95 | 0.94 |
| Invasive Carcinoma Margins | 0.96 | 0.97 | 0.98 | 0.97 |
| Distance from Closest Margin | 0.97 | 0.99 | 0.96 | 0.97 |
| Closest Margin | 0.97 | 1.00 | 0.96 | 0.98 |
| DCIS Margins | 0.94 | 0.94 | 0.97 | 0.95 |
| Distance of DCIS from Closest Margin (mm) | 0.95 | 0.99 | 0.94 | 0.96 |
| Closest Margin DCIS | 0.97 | 1.00 | 0.94 | 0.97 |
| Total LN Examined | 0.98 | 1.00 | 0.98 | 0.99 |
| # Sentinel LN Examined | 0.98 | 1.00 | 0.98 | 0.99 |
| Micro/macro metastasis | 0.98 | 1.00 | 0.98 | 0.99 |
| # LN with Micro-metastasis | 0.98 | 1.00 | 0.96 | 0.98 |
| # LN with Macro-metastasis | 0.98 | 1.00 | 0.96 | 0.98 |
| Size of largest Macro-metastasis Deposit | 0.98 | 1.00 | 0.95 | 0.98 |
| Extranodal Extension | 0.98 | 1.00 | 0.94 | 0.97 |
| Extent (mm) | 0.98 | 1.00 | 0.90 | 0.95 |
| Invasive Tumour Size (mm) | 0.97 | 1.00 | 0.97 | 0.98 |
| # Sentinel Nodes Examined | 0.96 | 0.96 | 0.96 | 0.96 |
| # Micro-metastatic Nodes | 0.98 | 1.00 | 0.95 | 0.98 |
| # Macro-metastatic Nodes | 0.97 | 1.00 | 0.93 | 0.96 |
| Pathologic Stage | 0.98 | 1.00 | 0.98 | 0.99 |
Fig. 2 (a) NLP versus the second human reviewer as compared to the GT in operative reports. (b) NLP versus the second human reviewer as compared to the GT in pathology reports