Isabel Segura-Bedmar, Pablo Raez.
Abstract
OBJECTIVE: The goal of the 2018 n2c2 shared task on cohort selection for clinical trials (track 1) is to identify which patients meet the selection criteria for clinical trials. Cohort selection is a particularly demanding task to which natural language processing and deep learning can make a valuable contribution. Our goal is to evaluate several deep learning architectures for this task.
Keywords: cohort selection; convolutional neural network; deep learning; multilabel text classification; recurrent neural network
Year: 2019 PMID: 31532478 PMCID: PMC6798560 DOI: 10.1093/jamia/ocz139
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Criteria distribution.
Figure 2. Convolutional neural network-recurrent neural network hybrid architecture. FFF: fully connected feedforward.
Summary of best parameters for our deep learning models
| Method | Hyperparameters |
|---|---|
| CNN | Number of filters: 128; filter size: 10; dropout: 0.1; batch size: 32 |
| Deep CNN | Number of filters for the 3 convolutional blocks: 64, 128, 256; filter size: 3; dropout: 0.46; batch size: 64 |
| RNN (GRU) | Dropout: 0.5; batch size: 128 |
| CNN+RNN | Random initialization; two convolutions with 128 filters; filter size: 5; dropout: 0.5; batch size: 128 |
CNN: convolutional neural network; FFF: fully connected feedforward; GRU: gated recurrent unit; RNN: recurrent neural network.
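For readers who want to reuse these settings, the table can be captured as a small configuration mapping. This is only a sketch: the `ModelConfig` class, its field names, and the `BEST_CONFIGS` dictionary are hypothetical and not from the paper; only the numeric values come from the table above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    dropout: float
    batch_size: int
    filters: Tuple[int, ...] = ()         # filters per convolutional layer/block
    filter_size: Optional[int] = None     # None when the model has no convolution
    embedding_init: Optional[str] = None  # only stated for CNN+RNN in the table


# Values copied from the table; the names are illustrative assumptions.
BEST_CONFIGS = {
    "CNN": ModelConfig(dropout=0.1, batch_size=32, filters=(128,), filter_size=10),
    "Deep CNN": ModelConfig(dropout=0.46, batch_size=64,
                            filters=(64, 128, 256), filter_size=3),
    "RNN (GRU)": ModelConfig(dropout=0.5, batch_size=128),
    "CNN+RNN": ModelConfig(dropout=0.5, batch_size=128, filters=(128, 128),
                           filter_size=5, embedding_init="random"),
}

for name, cfg in BEST_CONFIGS.items():
    print(name, cfg)
```

Keeping the settings in one frozen structure makes it harder to mix up, for example, the dropout of the simple CNN (0.1) with that of the deep CNN (0.46) when rerunning experiments.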
F1 scores for the simple and deep CNN architectures.
| Criterion | CNN, without FFF, random | CNN, without FFF, pretrained | CNN, with FFF, random | CNN, with FFF, pretrained | Deep CNN, without FFF, random | Deep CNN, without FFF, pretrained | Deep CNN, with FFF, random | Deep CNN, with FFF, pretrained |
|---|---|---|---|---|---|---|---|---|
| Abdominal | 0.5486 | 0.5764 | 0.5764 | 0.4444 | 0.3617 | 0.4886 | 0.3878 | 0.3878 |
| Advanced-cad | 0.3844 | 0.3182 | 0.6411 | 0.3333 | 0.4223 | 0.3182 | 0.3478 | 0.3478 |
| Alcohol-abuse | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 |
| Asp-for-mi | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 |
| Creatinine | 0.5111 | 0.5238 | 0.5111 | 0.4976 | 0.3878 | 0.4118 | 0.4118 | 0.4118 |
| Dietsupp-2mos | 0.4643 | 0.3973 | 0.4258 | 0.4327 | 0.457 | 0.4643 | 0.4712 | 0.4667 |
| Drug-abuse | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| English | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 |
| Hba1c | 0.5581 | 0.4991 | 0.4 | 0.5342 | 0.4 | 0.4 | 0.4 | 0.4 |
| Keto-1yr | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Major-diabetes | 0.4505 | 0.5694 | 0.55 | 0.4444 | 0.4171 | 0.4976 | 0.3478 | 0.3478 |
| Makes-decisions | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| Mi-6mos | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| Overall (micro) | 0.7527 | 0.7515 | 0.7721 | 0.7372 | 0.7347 | 0.7514 | 0.7611 | 0.76 |
| Overall (macro) | 0.4819 | 0.4794 | 0.4963 | 0.4642 | 0.4456 | 0.456 | 0.4395 | 0.4392 |
Random means that random initialization was used. Pretrained means that pretrained word embeddings were used.
CNN: convolutional neural network; FFF: fully connected feedforward.
Best score.
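The overall micro and macro rows differ sharply, which is typical when some classes are rare. The sketch below assumes the standard definitions (micro-averaging pools the counts of all classes before computing F1, macro-averaging takes the unweighted mean of per-class F1 scores); this record does not state how the task's scores were computed. The counts are invented for illustration and are not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per class."""
    macro = sum(f1(*c) for c in counts) / len(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f1(tp, fp, fn)
    return micro, macro


# Toy data: 100 samples, 90 in a frequent class and 10 in a rare class,
# with a classifier that always predicts the frequent class.
toy_counts = [
    (90, 10, 0),  # frequent class: all 90 found, 10 rare samples mislabeled as it
    (0, 0, 10),   # rare class: never predicted
]

micro, macro = micro_macro_f1(toy_counts)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.9000 macro=0.4737
```

In this toy case the micro score stays high because the frequent class dominates the pooled counts, while the macro score falls below 0.5 because the rare class contributes an F1 of 0.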
F1 scores for the RNN and hybrid architectures.
| Criterion | RNN, without FFF, random | RNN, without FFF, pretrained | RNN, with FFF, random | RNN, with FFF, pretrained | CNN+RNN, without FFF, random | CNN+RNN, without FFF, pretrained | CNN+RNN, with FFF, random | CNN+RNN, with FFF, pretrained |
|---|---|---|---|---|---|---|---|---|
| Abdominal | 0.3878 | 0.3878 | 0.4792 | 0.3878 | 0.3878 | 0.4792 | 0.3878 | 0.3878 |
| Advanced-cad | 0.3478 | 0.4222 | 0.3478 | 0.3478 | 0.4994 | 0.3478 | 0.4034 | 0.3844 |
| Alcohol-abuse | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 | 0.4915 |
| Asp-for-mi | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.434 | 0.5238 |
| Creatinine | 0.5249 | 0.4118 | 0.6104 | 0.4118 | 0.6889 | 0.5581 | 0.5833 | 0.5833 |
| Dietsupp-2mos | 0.4886 | 0.4857 | 0.6296 | 0.7333 | 0.5662 | 0.6703 | 0.5982 | 0.4976 |
| Drug-abuse | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| English | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 |
| Hba1c | 0.4 | 0.4 | 0.4 | 0.4 | 0.4792 | 0.4 | 0.4 | 0.4 |
| Keto-1yr | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Major-diabetes | 0.3478 | 0.5543 | 0.4665 | 0.3478 | 0.6122 | 0.4665 | 0.6296 | 0.4994 |
| Makes-decisions | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| Mi-6mos | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 | 0.4828 |
| Overall (micro) | 0.7623 | 0.7639 | 0.7843 | 0.7811 | 0.7827 | 0.7856 | 0.779 | 0.7233 |
| Overall (macro) | 0.4496 | 0.4622 | 0.4831 | 0.4597 | 0.5062 | 0.4823 | 0.4884 | 0.4761 |
Random means that random initialization was used. Pretrained means that pretrained word embeddings were used.
CNN: convolutional neural network; FFF: fully connected feedforward; RNN: recurrent neural network.
Best score.
P-values for all the deep learning models
| | CNN + pretrained | CNN + random + FFF | CNN + pretrained + FFF | DeepCNN + random | DeepCNN + pretrained | DeepCNN + random + FFF | DeepCNN + pretrained + FFF | RNN + random | RNN + pretrained | RNN + random + FFF | RNN + pretrained + FFF | Hybrid + random | Hybrid + pretrained | Hybrid + random + FFF | Hybrid + pretrained + FFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN + random | .18924 | .07019 | .25179 | .03824 | .02272 | .34299 | .03767 | .7925 | .26414 | .68716 | .83793 | .53839 | .5492 | .2787 | .72477 |
| CNN + pretrained | | .09341 | .06587 | .16563 | .09075 | .55969 | .43064 | .33058 | .43149 | .12645 | .08502 | .1452 | .26052 | .15295 | .63016 |
| CNN + random + FFF | | | .30045 | .08186 | .17892 | .85835 | .48875 | .34678 | .47327 | .36786 | .19774 | .55937 | .29144 | .44913 | .90165 |
| CNN + pretrained + FFF | | | | .45383 | .24682 | .59507 | .25383 | .62239 | .76786 | .57572 | .4946 | .16991 | .45395 | .07955 | .40593 |
| DeepCNN + random | | | | | .03394 | .74826 | .38392 | .7515 | .68382 | .85001 | .77073 | .67344 | .4473 | .74103 | .68865 |
| DeepCNN + pretrained | | | | | | .2581 | .02051 | .60709 | .47757 | .23961 | .29475 | .73865 | .85756 | .58568 | .69499 |
| DeepCNN + random + FFF | | | | | | | .02415 | .30862 | .16687 | .23987 | .5289 | .33919 | .27965 | .43141 | .38983 |
| DeepCNN + pretrained + FFF | | | | | | | | .73404 | .22673 | .34101 | .37431 | .41331 | .52991 | .26371 | .48985 |
| RNN + random | | | | | | | | | .14826 | .04582 | .22634 | .09698 | .39048 | .22868 | .23598 |
| RNN + pretrained | | | | | | | | | | .17518 | .39877 | .22112 | .33279 | .12925 | .21353 |
| RNN + random + FFF | | | | | | | | | | | .03359 | .46653 | .444 | .22266 | .26722 |
| RNN + pretrained + FFF | | | | | | | | | | | | .50063 | .27616 | .50124 | .28188 |
| Hybrid + random | | | | | | | | | | | | | .15912 | .03474 | .06604 |
| Hybrid + pretrained | | | | | | | | | | | | | | .08122 | .1824 |
| Hybrid + random + FFF | | | | | | | | | | | | | | | .13658 |
CNN: convolutional neural network; FFF: fully connected feedforward; RNN: recurrent neural network.
P-values below .05 indicate model pairs that are statistically different at a significance level of 0.05.
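Reading the significant pairs off a table of this size is error-prone, so the following sketch shows how they could be filtered programmatically at the 0.05 level. Only a handful of entries are transcribed here (the first three values below .05 in the "CNN + random" row, one non-significant value from that row, and two from the "Hybrid + random" row); the dictionary layout and the `significant_pairs` helper are illustrative assumptions, not part of the paper.

```python
ALPHA = 0.05  # significance level stated in the table footnote

# A partial transcription of the P-value table: (model A, model B) -> p.
p_values = {
    ("CNN + random", "CNN + pretrained"): 0.18924,
    ("CNN + random", "DeepCNN + random"): 0.03824,
    ("CNN + random", "DeepCNN + pretrained"): 0.02272,
    ("CNN + random", "DeepCNN + pretrained + FFF"): 0.03767,
    ("Hybrid + random", "Hybrid + random + FFF"): 0.03474,
    ("Hybrid + random", "Hybrid + pretrained + FFF"): 0.06604,
}


def significant_pairs(table, alpha=ALPHA):
    """Return the pairs whose p-value is below alpha, smallest p first."""
    hits = [pair for pair, p in table.items() if p < alpha]
    return sorted(hits, key=lambda pair: table[pair])


for a, b in significant_pairs(p_values):
    print(f"{a} vs {b}: p = {p_values[(a, b)]}")
```

Extending `p_values` to all 120 pairs in the table would give the complete list of significant comparisons.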