Abstract
We explored several approaches to incorporating context information into a deep learning framework for text classification, including designing different attention mechanisms on top of different neural networks and extracting additional features from the text with traditional methods as part of the representation. We propose two kinds of classification algorithms: one based on a convolutional neural network (CNN) that fuses context information and the other based on a bidirectional long short-term memory (BLSTM) network. We integrate the context information into the final feature representation by designing attention structures at the sentence level and the word level, which increases the diversity of the feature information. Our experimental results on two datasets validate the advantages of the two models, in both time efficiency and accuracy, over models with fundamental attention mechanism (AM) architectures.
Year: 2019 PMID: 31467518 PMCID: PMC6701294 DOI: 10.1155/2019/8320316
Source DB: PubMed Journal: Comput Intell Neurosci
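The word-level attention described in the abstract can be illustrated with a short sketch. This is a minimal version assuming the common additive (tanh) scoring; the paper describes its attention structures only at a high level, so `WordAttention` and its parameter names are illustrative rather than the authors' code.

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Additive attention that pools BLSTM hidden states into one vector."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) BLSTM outputs
        u = torch.tanh(self.proj(h))                               # (batch, seq_len, hidden_dim)
        alpha = torch.softmax(self.score(u).squeeze(-1), dim=-1)   # per-word weights
        return (alpha.unsqueeze(-1) * h).sum(dim=1)                # attended sentence vector
```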
Figure 1. The two basic models in our experiments: one based on CNN and the other on BLSTM. (a) The basic CNN model; the convolution filters have width 3, and pooling is max pooling. (b) The basic BLSTM model; the output of the LSTM network is a matrix concatenating the hidden vectors from the forward and backward LSTMs.
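Figure 1(a) is concrete enough to sketch in code: a single layer of width-3 convolution filters over word embeddings, max pooling over time, and a linear softmax classifier. A minimal sketch in PyTorch, where `vocab_size` and `n_classes` are placeholders and the default dimensions follow the hyperparameter table below:

```python
import torch
import torch.nn as nn

class BasicCNN(nn.Module):
    """Baseline CNN text classifier in the spirit of Figure 1(a)."""
    def __init__(self, vocab_size: int, n_classes: int,
                 emb_dim: int = 200, n_filters: int = 100, width: int = 3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=width)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) word ids
        x = self.emb(tokens).transpose(1, 2)   # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))           # (batch, n_filters, seq_len - 2)
        x = x.max(dim=-1).values               # max pooling over time
        return self.fc(x)                      # class logits
```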
Figure 2Text classification model with attention mechanism and additional features based on CNNs and consecutive sentences.
Figure 3. Text classification model with attention mechanism and additional features, based on LSTM and consecutive sentences. (a) The overall text classification model based on LSTM when the inputs are sequential sentences; the length of the sentence sequence is 3. (b) The detailed implementation of the AM module.
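The sentence-level attention of Figure 3(b) can be sketched as the current utterance attending over the representations of the three consecutive context sentences. Dot-product scoring is an assumption here; the figure only indicates that the AM module produces a weighted context vector that is fused with the current representation.

```python
import torch

def attend_over_context(query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    """Fuse a sentence vector with an attention-weighted context vector."""
    # query:   (batch, dim)     current-sentence representation
    # context: (batch, 3, dim)  representations of the 3 consecutive sentences
    scores = torch.einsum('bd,bkd->bk', query, context)  # similarity scores
    alpha = torch.softmax(scores, dim=-1)                # per-sentence weights
    ctx = torch.einsum('bk,bkd->bd', alpha, context)     # weighted context vector
    return torch.cat([query, ctx], dim=-1)               # fused representation
```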
Dataset summary statistics.
| Dataset | C | V (k) | Train (k) | Validation (k) | Test (k) |
|---|---|---|---|---|---|
| SWDA | 43 | 20 | 193 | 23 | 5 |
| MRDA | 5 | 12 | 78 | 16 | 15 |
C is the number of classes; V is the vocabulary size. Train/validation/test give the number of utterances across all dialogs.
Chosen values and explored values for the main hyperparameters.
| Hyperparameter | Chosen value | Values explored |
|---|---|---|
| Learning rate | 0.01 | 0.1, 0.01, 0.001 |
| LSTM output dimension | 100 | 50, 100, 150 |
| LSTM direction | Bidir | Unidir, bidir |
| LSTM pooling | Last | Mean, last |
| CNN filter numbers | 100 | 100, 300, 500, 700 |
| CNN filter height | 3 | 1, 2, 3, 4, 5 |
| Word vector dimension | 200 | 100, 200, 300 |
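For reference, the chosen settings above can be collected into a plain configuration mapping; the key names are illustrative, not taken from the authors' code.

```python
# Chosen hyperparameters from the table above (key names are assumptions).
config = {
    "learning_rate": 0.01,
    "lstm_output_dim": 100,
    "lstm_bidirectional": True,   # bidir rather than unidir
    "lstm_pooling": "last",       # last hidden state rather than mean pooling
    "cnn_num_filters": 100,
    "cnn_filter_height": 3,
    "word_vector_dim": 200,
}
```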
Classification accuracy (%) and epoch time (s) for the basic models based on CNN and BLSTM. Paired values are reported as SWDA/MRDA.
| Models | Accuracy | Time |
|---|---|---|
| CNN with no pretrained embeddings | 68.9/75.4 | 121/100 |
| CNN with pretrained embeddings | — | 110/89 |
| BLSTM with no pretrained embeddings | 68.2/75.0 | 587/450 |
| BLSTM with pretrained embeddings | 70.6/77.1 | 571/423 |
Results of models with different context-processing methods.
| Models | Accuracy | Time |
|---|---|---|
| CNN with contextual info | 72.3/81.8 | 310/260 |
| CNN with contextual info and AM | 72.6/82.4 | 346/287 |
| LSTM with contextual info | 72.5/82.1 | 923/700 |
| LSTM with contextual info and AM | — | 1050/794 |
Results of models using additional features in different ways.
| Models | Accuracy | Time |
|---|---|---|
| Attended representation with length + TF-IDF (CNN) | 72.7/83.1 | 379/310 |
| Attended representation with probs (CNN) | 72.8/83.0 | 396/321 |
| Attended representation with probs + length + TF-IDF (CNN) | 73.0/83.4 | 412/335 |
| Attended representation with length + TF-IDF (BLSTM) | 73.7/85.6 | 1084/905 |
| Attended representation with probs (BLSTM) | 73.6/85.7 | 1107/918 |
| Attended representation with probs + length + TF-IDF (BLSTM) | — | 1141/924 |
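The "additional features" rows above correspond to concatenating handcrafted signals (utterance length, TF-IDF statistics, and optionally class-probability features, the "probs" rows) onto the attended representation before the final classifier. A minimal sketch, where the feature shapes are assumptions:

```python
import torch

def fuse_features(attended, length, tfidf, probs=None):
    """Concatenate handcrafted features onto the attended representation."""
    # attended: (batch, dim); length: (batch, 1)
    # tfidf: (batch, f) TF-IDF-derived features; probs: (batch, c) or None
    parts = [attended, length, tfidf]
    if probs is not None:
        parts.append(probs)
    return torch.cat(parts, dim=-1)  # input to the final softmax layer
```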
Results of our models and other methods from the literature: HMM [16]; CNN-FF and LSTM-FF [36]; CA-LSTM: contextual attentive LSTM [40]; CR-attention: CNN + RNN + attention [38].
| Models | Accuracy | Time |
|---|---|---|
| Our best model based on CNN | 73.0/83.4 | 412/335 |
| Our best model based on LSTM | — | 1141/924 |
| CR-attention | 73.8/84.3 | 1320/1182 |
| CNN-FF | 73.1/84.6 | 732/620 |
| LSTM-FF | 69.6/84.3 | 940/762 |
| HMM | 71.0/— | 210/— |
| CA-LSTM | 72.6/— | 1026/— |