Mohammed Alawad, Shang Gao, John X Qiu, Hong Jun Yoon, J Blair Christian, Lynne Penberthy, Brent Mumphrey, Xiao-Cheng Wu, Linda Coyle, Georgia Tourassi.
Abstract
OBJECTIVE: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks through representations shared across the tasks, achieving state-of-the-art classification accuracy and computational efficiency.
Keywords: cancer pathology reports; convolutional neural network; deep learning; information extraction; multitask learning; natural language processing
Year: 2020 PMID: 31710668 PMCID: PMC7489089 DOI: 10.1093/jamia/ocz153
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Louisiana Tumor Registry (LA) data preparation flow chart.
Figure 2. The number of occurrences per label of all cancer characteristics.
Figure 3. Architecture diagram of the hard parameter sharing multitask convolutional neural network model. Colors differentiate convolution layers, in which each set of filters uses a different filter size.
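In hard parameter sharing, all tasks reuse one shared encoder and only the small output heads are task specific. The sketch below illustrates this structure only; the paper's model is a word-level CNN, and the layer shapes, function names, and toy numbers here are assumptions, not the authors' implementation.

```python
# Minimal sketch of hard parameter sharing (illustrative only).

def shared_encoder(x):
    # Stand-in for the shared convolution + pooling layers:
    # every task sees the same representation of the document.
    return [xi * 2.0 for xi in x]

def task_head(features, weights):
    # Each task keeps its own small output layer on top of the
    # shared features (one head per cancer characteristic).
    return sum(f * w for f, w in zip(features, weights))

doc = [0.5, 1.0, 1.5]              # toy document representation
features = shared_encoder(doc)     # computed once, reused by all tasks
site_score = task_head(features, [0.1, 0.2, 0.3])
laterality_score = task_head(features, [0.3, 0.2, 0.1])
```

Because the encoder is computed once and shared, the parameter count and training cost grow only by one small head per added task, which is consistent with the training-time summary at the end of this record.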
Figure 4. Architecture diagram of the cross-stitch multitask convolutional neural network model.
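A cross-stitch unit instead keeps one network stream per task and learns a small mixing matrix that linearly combines the streams' activations at matching depths. The following is a minimal sketch of that mixing step for two streams; the 2x2 weights are learned during training in practice, and the values below are illustrative assumptions only.

```python
# Minimal sketch of a cross-stitch unit mixing two task streams.

def cross_stitch(a, b, alpha):
    # alpha is a 2x2 matrix that linearly mixes the activations of
    # two task-specific streams at the same network depth.
    mixed_a = [alpha[0][0] * ai + alpha[0][1] * bi for ai, bi in zip(a, b)]
    mixed_b = [alpha[1][0] * ai + alpha[1][1] * bi for ai, bi in zip(a, b)]
    return mixed_a, mixed_b

stream_a = [1.0, 2.0]   # toy activations from task A's stream
stream_b = [3.0, 4.0]   # toy activations from task B's stream

# Near-diagonal alpha: each stream mostly keeps its own features
# but borrows a little from the other task.
a_out, b_out = cross_stitch(stream_a, stream_b, [[0.9, 0.1], [0.1, 0.9]])
```

Since every task keeps a full stream, this design has more trainable parameters and a higher training cost than hard parameter sharing, which the parameter/training-time table at the end of this record reflects.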
Retrospective evaluation performance (with 95% confidence interval) of classification models on each classification task
| Classifier | micro F | macro F | Precision | Recall |
|---|---|---|---|---|
| Cancer primary site – 65 classes | ||||
| Traditional machine learning classifiers | ||||
| Support vector machine | 0.857 (0.854-0.860) | 0.390 (0.382-0.398) | 0.475 (0.460-0.492) | 0.364 (0.358-0.371) |
| Random forest classifier | 0.886 (0.883-0.888) | 0.392 (0.385-0.399) | 0.494 (0.447-0.505) | 0.382 (0.377-0.388) |
| Deep learning classifiers | ||||
| Single-task CNN | 0.915 (0.913-0.917) | 0.491 (0.481-0.500) | 0.611 (0.578-0.628) | 0.472 (0.464-0.479) |
| Multitask CNN cross-stitch | 0.944 (0.942-0.946) | 0.592 (0.582-0.602) | 0.678 (0.653-0.700) | 0.573 (0.565-0.583) |
| Multitask CNN hard parameter sharing | 0.941 (0.939-0.943) | 0.575 (0.565-0.586) | 0.652 (0.621-0.666) | 0.560 (0.553-0.572) |
| Laterality – 4 classes | ||||
| Traditional machine learning classifiers | ||||
| Support vector machine | 0.887 (0.884-0.890) | 0.714 (0.706-0.722) | 0.792 (0.775-0.808) | 0.692 (0.687-0.697) |
| Random forest classifier | 0.910 (0.908-0.912) | 0.770 (0.761-0.778) | 0.805 (0.794-0.816) | 0.749 (0.741-0.757) |
| Deep learning classifiers | ||||
| Single-task CNN | 0.921 (0.919-0.923) | 0.758 (0.750-0.767) | 0.831 (0.816-0.846) | 0.736 (0.730-0.743) |
| Multitask CNN cross-stitch | 0.930 (0.928-0.932) | 0.812 (0.804-0.820) | 0.830 (0.820-0.840) | 0.799 (0.791-0.807) |
| Multitask CNN hard parameter sharing | 0.933 (0.931-0.935) | 0.822 (0.814-0.831) | 0.848 (0.838-0.858) | 0.804 (0.796-0.813) |
| Behavior – 3 classes | ||||
| Traditional machine learning classifiers | ||||
| Support vector machine | 0.935 (0.933-0.937) | 0.845 (0.839-0.851) | 0.886 (0.879-0.892) | 0.812 (0.804-0.820) |
| Random forest classifier | 0.945 (0.943-0.947) | 0.842 (0.835-0.848) | 0.908 (0.902-0.915) | 0.793 (0.784-0.801) |
| Deep learning classifiers | ||||
| Single-task CNN | 0.958 (0.956-0.959) | 0.911 (0.907-0.915) | 0.943 (0.939-0.946) | 0.883 (0.877-0.889) |
| Multitask CNN cross-stitch | 0.973 (0.972-0.974) | 0.946 (0.943-0.950) | 0.951 (0.947-0.954) | 0.942 (0.938-0.947) |
| Multitask CNN hard parameter sharing | 0.975 (0.973-0.976) | 0.952 (0.949-0.955) | 0.954 (0.950-0.958) | 0.950 (0.946-0.954) |
| Histological type – 63 classes | ||||
| Traditional machine learning classifiers | ||||
| Support vector machine | 0.664 (0.660-0.667) | 0.298 (0.292-0.304) | 0.457 (0.426-0.475) | 0.268 (0.264-0.273) |
| Random forest classifier | 0.722 (0.719-0.726) | 0.373 (0.366-0.378) | 0.565 (0.530-0.594) | 0.344 (0.339-0.349) |
| Deep learning classifiers | ||||
| Single-task CNN | 0.776 (0.773-0.779) | 0.540 (0.532-0.547) | 0.688 (0.675-0.700) | 0.510 (0.503-0.516) |
| Multitask CNN cross-stitch | 0.811 (0.808-0.814) | 0.650 (0.643-0.656) | 0.730 (0.720-0.741) | 0.623 (0.617-0.630) |
| Multitask CNN hard parameter sharing | 0.811 (0.807-0.814) | 0.656 (0.649-0.662) | 0.750 (0.704-0.724) | 0.621 (0.633-0.646) |
| Histological grade – 5 classes | ||||
| Traditional machine learning classifiers | ||||
| Support vector machine | 0.659 (0.655-0.663) | 0.592 (0.586-0.597) | 0.664 (0.657-0.671) | 0.563 (0.559-0.569) |
| Random forest classifier | 0.754 (0.751-0.758) | 0.699 (0.694-0.704) | 0.729 (0.723-0.734) | 0.680 (0.675-0.685) |
| Deep learning classifiers | ||||
| Single-task CNN | 0.797 (0.794-0.800) | 0.754 (0.749-0.759) | 0.775 (0.770-0.780) | 0.738 (0.734-0.743) |
| Multitask CNN cross-stitch | 0.796 (0.792-0.799) | 0.753 (0.748-0.758) | 0.768 (0.763-0.773) | 0.742 (0.737-0.747) |
| Multitask CNN hard parameter sharing | 0.802 (0.799-0.806) | 0.766 (0.761-0.770) | 0.771 (0.767-0.777) | 0.761 (0.756-0.766) |
Support vector machine hyperparameters: C = 4.0, kernel = linear. Random forest classifier hyperparameters: num trees = 500, max features = 0.6.
CNN: convolutional neural network.
a Best-performing classifier.
b Statistically significant difference between a multitask learning model and all baseline models.
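The table reports both micro- and macro-averaged F scores because the tasks are heavily class imbalanced (eg, 65 primary-site classes): micro F pools counts across classes and so is dominated by frequent classes, while macro F averages per-class F1 equally and so is sensitive to rare classes. The toy counts below (not the paper's data) illustrate the gap.

```python
# Illustration of the micro vs macro F-score distinction (toy counts).

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Per-class (tp, fp, fn): one frequent class, one rare class.
counts = [(90, 5, 5), (1, 4, 4)]

# Macro F: average per-class F1 equally, so the rare class weighs heavily.
macro_f = sum(f1(*c) for c in counts) / len(counts)

# Micro F: pool the counts first, so the frequent class dominates.
tp = sum(c[0] for c in counts)
fp = sum(c[1] for c in counts)
fn = sum(c[2] for c in counts)
micro_f = f1(tp, fp, fn)
```

This is why micro F in the table is uniformly higher than macro F on the many-class tasks (primary site, histological type), where rare classes drag the macro average down.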
Figure 5. Prospective evaluation micro- and macro-averaged F scores comparing the multitask convolutional neural network (MTCNN) models and the baseline models. CS: cross-stitch; HS: hard parameter sharing; RFC: random forest classifier; SVM: support vector machine.
Figure 6. Comparison of the multitask convolutional neural network (MTCNN) models and the baseline models in terms of the number of correctly classified tasks per document: (A) retrospective evaluation; (B) prospective evaluation. CS: cross-stitch; HS: hard parameter sharing; RFC: random forest classifier; SVM: support vector machine.
Summary of training time and number of trainable parameters for deep CNN–based models.
| Model | Trainable parameters | Training time |
|---|---|---|
| Single-task CNN, for cancer primary site | 10 355 465 | 1 h 50 min |
| Single-task CNN, for laterality | 10 300 504 | 2 h 35 min |
| Single-task CNN, for behavior | 10 299 603 | 2 h 50 min |
| Single-task CNN, for histological type | 10 353 663 | 1 h 50 min |
| Single-task CNN, for histological grade | 10 301 405 | 2 h 25 min |
| Multitask CNN, cross-stitch | 14 746 715 | 12 h |
| Multitask CNN, hard parameter sharing | 10 423 040 | 2 h |
For each model, 9 216 000 parameters are associated with the word embeddings.
CNN: convolutional neural network.