Shang Gao1, Michael T Young1, John X Qiu1, Hong-Jun Yoon1, James B Christian1, Paul A Fearn2, Georgia D Tourassi1, Arvind Ramanathan1.
Abstract
OBJECTIVE: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents.
Keywords: attention networks; classification; clinical pathology reports; information retrieval; recurrent neural nets
Year: 2018 PMID: 29155996 PMCID: PMC7282502 DOI: 10.1093/jamia/ocx131
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Distribution of labeled reports for the primary site classification task and the histological grade classification task
| Code | Count | Description |
|---|---|---|
| Primary site ICD-O-3 topographical codes | | |
| C34.0 | 26 | Main bronchus |
| C34.1 | 139 | Upper lobe, lung |
| C34.2 | 11 | Middle lobe, lung |
| C34.3 | 78 | Lower lobe, lung |
| C34.9 | 191 | Lung, NOS |
| C50.1 | 13 | Central portion of breast |
| C50.2 | 36 | Upper-inner quadrant of breast |
| C50.3 | 10 | Lower-inner quadrant of breast |
| C50.4 | 63 | Upper-outer quadrant of breast |
| C50.5 | 21 | Lower-outer quadrant of breast |
| C50.8 | 62 | Overlapping lesion of breast |
| C50.9 | 292 | Breast, NOS |
| Histological grades | | |
| G1 | 124 | Well differentiated (low grade) |
| G2 | 233 | Moderately differentiated (intermediate grade) |
| G3 | 271 | Poorly differentiated (high grade) |
| G4 | 17 | Undifferentiated (high grade) |
NOS, not otherwise specified.
Figure 1. Architecture for our hierarchical attention network (HAN). The HAN first produces line embeddings by processing the word embeddings in each line. The HAN then produces a document embedding by processing the line embeddings in the document. The final document embedding can then be used for classification or pretraining purposes.
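The two-level pooling described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the recurrent layers that the HAN uses to encode context, and the names `attention_pool`, `embed_document`, `word_context`, and `line_context`, as well as the toy embedding values, are hypothetical. It only shows how attention weights turn word embeddings into line embeddings, and line embeddings into one document embedding.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum the result.

    Returns the pooled vector and the attention weights.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def embed_document(document, word_context, line_context):
    """document: list of lines; each line is a list of word-embedding vectors."""
    line_embeddings, word_weights = [], []
    for line in document:
        embedding, weights = attention_pool(line, word_context)
        line_embeddings.append(embedding)
        word_weights.append(weights)
    doc_embedding, line_weights = attention_pool(line_embeddings, line_context)
    return doc_embedding, word_weights, line_weights

# Toy 2-dimensional word embeddings (hypothetical values): two lines of text.
document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]],
]
# In a trained model these context vectors would be learned; here they are fixed.
word_context = [1.0, 0.0]
line_context = [0.0, 1.0]

doc_embedding, word_weights, line_weights = embed_document(
    document, word_context, line_context
)
```

The returned `word_weights` and `line_weights` are the kind of per-word and per-line importance scores that can be rendered as highlighting on a report.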
Final test performance of classification models on each classification task
| Classifier | Primary site (micro) | Primary site (macro) | Histological grade (micro) | Histological grade (macro) |
|---|---|---|---|---|
| Traditional machine learning classifiers | | | | |
| Naive Bayes | 0.554 (0.521, 0.586) | 0.161 (0.152, 0.170) | 0.481 (0.442, 0.519) | 0.264 (0.244, 0.283) |
| Logistic regression (penalty = l1, solver = liblinear, C = 3.3, multiclass = ovr) | 0.708 (0.678, 0.737) | 0.400 (0.361, 0.437) | 0.657 (0.622, 0.693) | 0.507 (0.433, 0.584) |
| Support vector machine (C = 25, kernel = sigmoid, gamma = 0.5, shrinking = true) | 0.673 (0.643, 0.702) | 0.396 (0.353, 0.435) | 0.595 (0.558, 0.634) | 0.472 (0.413, 0.540) |
| Random forest (num trees = 400, max features = 0.9) | 0.701 (0.673, 0.730) | 0.437 (0.406, 0.467) | 0.694 (0.657, 0.727) | 0.579 (0.503, 0.650) |
| XGBoost (max depth = 5, num trees = 300, learning rate = 0.3) | 0.712 (0.683, 0.740) | 0.431 (0.395, 0.466) | 0.681 (0.643, 0.716) | 0.612 (0.539, 0.673) |
| Deep learning classifiers | | | | |
| Convolutional neural network | 0.712 (0.680, 0.736) | 0.398 (0.359, 0.434) | 0.716 (0.681, 0.750) | 0.521 (0.493, 0.548) |
| Recurrent neural network (without attention mechanism) | 0.617 (0.586, 0.648) | 0.327 (0.292, 0.363) | 0.393 (0.353, 0.431) | 0.275 (0.245, 0.304) |
| Recurrent neural network (with attention mechanism) | 0.694 (0.666, 0.722) | 0.468 (0.432, 0.502) | 0.580 (0.541, 0.617) | 0.474 (0.416, 0.536) |
| Hierarchical attention network (no pretraining, word attention only) | 0.695 (0.666, 0.725) | 0.405 (0.367, 0.443) | 0.473 (0.437, 0.512) | 0.341 (0.302, 0.390) |
| Hierarchical attention network (no pretraining, line attention only) | 0.731 (0.704, 0.760) | 0.464 (0.425, 0.503) | 0.473 (0.434, 0.512) | 0.340 (0.301, 0.388) |
| Hierarchical attention network (no pretraining, word and line attention) | 0.784 (0.759, 0.810) | 0.566 (0.525, 0.607) | | |
| Hierarchical attention network (with pretraining, word and line attention) | 0.904 (0.881, 0.927) | 0.822 (0.744, 0.883) | | |
Classifier performance is reported for each individual task, with confidence intervals shown in parentheses.
The bolded values in the respective columns highlight the best performing classifier.
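The gap between the micro and macro columns in the table above follows from how the two averages treat rare classes. The sketch below assumes the columns are micro- and macro-averaged F-scores (the table itself labels them only "micro" and "macro"); the function name `micro_macro_f1` and the toy labels are hypothetical and are not data from the study.

```python
def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Toy imbalanced example: a classifier that always predicts the majority grade.
y_true = ["G3"] * 8 + ["G4"] * 2
y_pred = ["G3"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case the micro score stays high because most reports belong to the majority class, while the macro score is pulled down by the class that is never predicted. With classes as small as C50.3 (10 reports) or G4 (17 reports), the same effect would be expected in any macro-averaged score.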
Figure 2. HAN training and validation accuracies with and without pretraining during the first 10 epochs for (A) the primary site classification task and (B) the histological grade classification task.
Figure 3. HAN annotations on sample pathology report for each classification task. The most important words in each line are highlighted in blue, with darker blue indicating higher importance. The most important lines in the report are highlighted in red, with darker red indicating higher importance. For each task, the HAN can successfully locate the specific line(s) within a document and text within the line(s) that identify the primary site (eg, lower lobe) or histological grade (eg, poorly differentiated). The RNN structure utilized by the HAN allows it to take into account word and line context to better locate the correct text segments.
Figure 4. HAN document embeddings reduced to 2 dimensions via principal component analysis for (A) primary site train reports, (B) histological grade train reports, (C) primary site test reports, and (D) histological grade test reports.
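As a rough illustration of the reduction step named in the Figure 4 caption, the sketch below projects a few toy vectors onto their first 2 principal components. It is a pure-Python approximation using power iteration with deflation, not the procedure the authors necessarily used; the function names and the 3-dimensional toy "embeddings" are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(matrix, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]

def top_eigenvector(matrix, iterations=200):
    """Power iteration from a fixed, asymmetric start vector (deterministic).

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no remaining variance to capture
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project rows onto their first 2 principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        v = top_eigenvector(cov)
        eigenvalue = sum(v_i * w_i for v_i, w_i in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r_j * c_j for r_j, c_j in zip(r, c)) for c in components]
            for r in centered]

# Toy 3-dimensional "document embeddings" (hypothetical values).
embeddings = [[4.0, 1.0, 2.0], [-2.0, 1.0, 2.0], [1.0, 2.0, 2.0], [1.0, 0.0, 2.0]]
points = pca_2d(embeddings)
```

Each row of `points` is a 2-dimensional coordinate that could be plotted and colored by class label to inspect how well the embeddings separate the classes.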
Figure 5. Confusion matrix for (A) HAN with pretraining on the primary site classification task and (B) HAN without pretraining on the histological grade classification task.
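For readers unfamiliar with the layout of a confusion matrix such as the one in Figure 5, the following minimal sketch builds one from true and predicted labels. The function name and the toy labels are hypothetical and are not results from the study; rows index the true class and columns index the predicted class.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count how often each true label (row) is predicted as each label (column)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix

# Toy histological grade labels (hypothetical predictions).
labels = ["G1", "G2", "G3", "G4"]
y_true = ["G1", "G2", "G2", "G3", "G3", "G3", "G4"]
y_pred = ["G1", "G2", "G3", "G3", "G3", "G2", "G3"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal counts show which classes are confused with each other, for example adjacent grades or neighboring anatomical subsites.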