Briton Park1, Nicholas Altieri1, John DeNero2, Anobel Y Odisho3,4,5, Bin Yu1,2,6.
Abstract
OBJECTIVE: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer-to-cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations, which provide both location-based information and document-level labels for each pathology report.
Keywords: cancer; natural language processing; pathology
Year: 2021 PMID: 34604711 PMCID: PMC8484934 DOI: 10.1093/jamiaopen/ooab085
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
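The zero-shot string similarity (ZSS) idea can be illustrated with a minimal sketch: a report span is assigned the class whose label string it most resembles, so no labeled training examples are needed and labels unseen during training can still be predicted. This is not the paper's implementation; the function name, the use of `difflib.SequenceMatcher` as the similarity measure, and the example labels are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def zero_shot_classify(span_text, candidate_labels):
    """Assign the candidate label whose string is most similar to the span.

    Hypothetical stand-in for string-similarity zero-shot classification:
    similarity is computed against the label strings themselves, so no
    labeled examples of each class are required.
    """
    def similarity(a, b):
        # Ratio of matching characters to total length, case-insensitive.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    return max(candidate_labels, key=lambda label: similarity(span_text, label))


# Hypothetical tumor-attribute example: match a report span to a histology label.
labels = ["adenocarcinoma", "squamous cell carcinoma", "small cell carcinoma"]
print(zero_shot_classify("invasive adenocarcinoma, moderately differentiated", labels))
```

Any string similarity (e.g. character n-gram overlap) could be substituted; the key property is that prediction depends only on the class name, not on seen training labels.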
Average macro-F1 and micro-F1 performance as a function of 10, 20, and 40 labeled examples on colon, kidney, and lung cancer pathology reports (shared labels case)
| Method | Macro-F1 (10) | Macro-F1 (20) | Macro-F1 (40) | Micro-F1 (10) | Micro-F1 (20) | Micro-F1 (40) |
|---|---|---|---|---|---|---|
| Hierarchical attention network | 0.298 | 0.287 | 0.355 | 0.580 | 0.574 | 0.718 |
| Logistic | 0.344 (0.055) | 0.441 (0.059) | 0.467 (0.073) | 0.634 (0.039) | 0.676 (0.037) | 0.708 (0.047) |
| Random forest | 0.276 (0.025) | 0.307 (0.034) | 0.340 (0.044) | 0.586 (0.039) | 0.614 (0.030) | 0.641 (0.036) |
| SVM | 0.221 (0.048) | 0.269 (0.034) | 0.310 (0.034) | 0.519 (0.102) | 0.560 (0.051) | 0.570 (0.052) |
| Boost | 0.436 (0.036) | 0.468 (0.044) | 0.548 (0.052) | 0.704 (0.049) | 0.732 (0.037) | 0.789 (0.038) |
| SLA | 0.211 (0.024) | 0.338 (0.037) | 0.466 (0.043) | 0.579 (0.031) | 0.700 (0.029) | 0.790 (0.026) |
| HCTC | 0.461 (0.038) | 0.508 (0.034) | 0.544 (0.028) | 0.797 (0.023) | 0.832 (0.022) | 0.858 (0.018) |
| HCTC-final | 0.421 (0.034) | 0.502 (0.047) | 0.584 (0.048) | 0.776 (0.027) | 0.842 (0.030) | 0.882 (0.024) |
| HCTC-line | 0.205 (0.013) | 0.341 (0.035) | 0.473 (0.040) | 0.579 (0.044) | 0.700 (0.041) | 0.800 (0.025) |
Note: Results are the mean performance and standard deviation across 10 random splits of the data for the shared labels case. Parenthesized values are standard deviations; column headers give the in-domain training size. Methods marked with are trained on 8, 17, and 33 reports to adjust for annotation time. Due to computational reasons, HAN was run only once for all experiments.
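The two averaging schemes reported in the tables differ in how per-class scores are combined: macro-F1 averages F1 over classes (so rare classes count equally), while micro-F1 pools true/false positives and negatives across classes (for single-label multiclass prediction it equals accuracy). A minimal stdlib-only sketch, with a made-up toy example, not the paper's evaluation code:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label multiclass predictions."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p predicted but wrong
            fn[t] += 1          # t missed

    def f1(t, f_p, f_n):
        return 2 * t / (2 * t + f_p + f_n) if t else 0.0

    # Macro: unweighted mean of per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool counts across classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy example: class "c" is never predicted, which drags macro-F1 down
# while micro-F1 stays at the overall accuracy (2 of 4 correct).
micro, macro = f1_scores(["a", "a", "b", "c"], ["a", "b", "b", "b"])
print(micro, macro)
```

This gap between the two averages explains why, in the tables, a method can have a high micro-F1 (dominated by frequent labels) alongside a much lower macro-F1 (penalized for rare labels it misses).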
Average macro-F1 and micro-F1 performance as a function of 10, 20, and 40 labeled examples on colon, kidney, and lung cancer pathology reports (unique labels case)
| Method | Macro-F1 (10) | Macro-F1 (20) | Macro-F1 (40) | Micro-F1 (10) | Micro-F1 (20) | Micro-F1 (40) |
|---|---|---|---|---|---|---|
| Hierarchical attention network | 0.051 | 0.026 | 0.079 | 0.255 | 0.118 | 0.264 |
| Logistic | 0.208 (0.029) | 0.277 (0.047) | 0.354 (0.048) | 0.473 (0.033) | 0.578 (0.053) | 0.651 (0.033) |
| Random forest | 0.177 (0.045) | 0.223 (0.024) | 0.323 (0.043) | 0.438 (0.044) | 0.516 (0.042) | 0.618 (0.028) |
| SVM | 0.152 (0.029) | 0.172 (0.031) | 0.239 (0.030) | 0.387 (0.066) | 0.425 (0.036) | 0.517 (0.044) |
| Boost | 0.155 (0.021) | 0.288 (0.040) | 0.382 (0.034) | 0.421 (0.029) | 0.608 (0.051) | 0.715 (0.028) |
| SLA | 0.095 (0.015) | 0.178 (0.012) | 0.219 (0.016) | 0.472 (0.036) | 0.651 (0.023) | 0.736 (0.016) |
| ZSS | 0.442 (0.024) | 0.436 (0.017) | 0.428 (0.028) | 0.743 (0.016) | 0.737 (0.011) | 0.742 (0.009) |
| ZSS-doc | 0.359 (0.023) | 0.356 (0.022) | 0.341 (0.019) | 0.546 (0.024) | 0.540 (0.021) | 0.528 (0.007) |
| ZSS-thresholding | 0.441 (0.024) | 0.447 (0.022) | 0.449 (0.029) | 0.739 (0.017) | 0.765 (0.018) | 0.780 (0.015) |
| Oracle | 0.454 (0.031) | 0.501 (0.029) | 0.529 (0.024) | 0.775 (0.019) | 0.829 (0.017) | 0.862 (0.011) |
Note: Results are the mean performance and standard deviation across 10 random splits of the data for the unique labels case. Parenthesized values are standard deviations; column headers give the in-domain training size. Methods marked with are trained on 8, 17, and 33 reports to adjust for annotation time. Due to computational reasons, HAN was run only once for all experiments.
Figure 1. Average macro-F1 (A) and micro-F1 (B) performance for test instances whose label is not seen during training, as a function of 10, 20, and 40 labeled examples on colon, kidney, and lung cancer pathology reports. Results are the mean performance using ZSS across 10 random splits of the data, with 95% confidence intervals, for the unique labels case. Note that the number of zero-shot test instances decreases as the number of training instances increases.
Figure 2. Average macro-F1 (A) and micro-F1 (B) performance for test instances whose label is not seen during training, as a function of 10, 20, and 40 labeled examples on colon, kidney, and lung cancer pathology reports. Results are the mean performance using ZSS-thresholding across 10 random splits of the data, with 95% confidence intervals, for the unique labels case. Note that the number of zero-shot test instances decreases as the number of training instances increases.