| Literature DB >> 32565772 |
Atif Khan, Muhammad Adnan Gul, Mahdi Zareei, R R Biswal, Asim Zeb, Muhammad Naeem, Yousaf Saeed, Naomie Salim.
Abstract
With the growing volume of information on the web, online movie reviews have become a significant information resource for Internet users. However, online users post thousands of movie reviews on a daily basis, and it is hard for them to summarize the reviews manually. Movie review mining and summarization is one of the challenging tasks in natural language processing. An automatic approach is therefore desirable for summarizing lengthy movie reviews, allowing users to quickly recognize the positive and negative aspects of a movie. This study employs a feature extraction technique called bag of words (BoW) to extract features from movie reviews and represent each review as a vector space model, or feature vector. The next phase uses the Naïve Bayes machine learning algorithm to classify the movie reviews (represented as feature vectors) into positive and negative. Next, an undirected weighted graph is constructed from the pairwise semantic similarities between classified review sentences, such that the graph nodes represent review sentences while the edges indicate semantic similarity weights. A weighted graph-based ranking algorithm (WGRA) is applied to compute a rank score for each review sentence in the graph. Finally, the top-ranked sentences (graph nodes) are chosen based on the highest rank scores to produce the extractive summary. Experimental results reveal that the proposed approach is superior to other state-of-the-art approaches.
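The classification stage described in the abstract (BoW features fed to a Naïve Bayes classifier) can be sketched with scikit-learn; the tiny training set and labels below are illustrative, not from the paper:

```python
# Minimal sketch of the review-classification stage: bag-of-words unigram
# counts fed to a multinomial Naive Bayes classifier.
# The toy training reviews and labels are illustrative, not the paper's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_reviews = [
    "loved this movie",          # positive
    "hated this movie",          # negative
    "great acting good movie",   # positive
    "terrible boring plot",      # negative
]
train_labels = ["+ve", "-ve", "+ve", "-ve"]

# Bag of words: each review becomes a vector of unigram counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)

clf = MultinomialNB()  # Laplace smoothing (alpha=1) by default
clf.fit(X_train, train_labels)

def classify(review: str) -> str:
    """Return '+ve' or '-ve' for a new review."""
    return clf.predict(vectorizer.transform([review]))[0]
```

In practice the paper reports results with unigrams, bigrams, and their combination as features; only the vectorizer configuration changes.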
Year: 2020 PMID: 32565772 PMCID: PMC7288188 DOI: 10.1155/2020/7526580
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1 Proposed approach for movie review classification and summarization.
BoW vector space model for unigrams.
| Review documents | Acting | Good | Great | Hated | Loved | Movie | This | Class |
|---|---|---|---|---|---|---|---|---|
| Review Doc1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | +ve |
| Review Doc2 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | −ve |
| Review Doc3 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | +ve |
Bag of bigram vector space model.
| Review documents | Acting good | Good movie | Great acting | Hated this | Loved this | This movie | Class |
|---|---|---|---|---|---|---|---|
| Review Doc1 | 0 | 0 | 0 | 0 | 1 | 1 | +ve |
| Review Doc2 | 0 | 0 | 0 | 1 | 0 | 1 | −ve |
| Review Doc3 | 1 | 1 | 1 | 0 | 0 | 0 | +ve |
Vector space model of bag of unigrams and bigrams.
| Review documents | Acting | Acting good | Good | Good movie | … | Loved this | Movie | This | This movie | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| Review Doc1 | 0 | 0 | 0 | 0 | … | 1 | 1 | 1 | 1 | +ve |
| Review Doc2 | 0 | 0 | 0 | 0 | … | 0 | 1 | 1 | 1 | −ve |
| Review Doc3 | 1 | 1 | 1 | 1 | … | 0 | 1 | 0 | 0 | +ve |
Figure 2 Undirected weighted graph.
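The graph stage can be sketched as follows, using TF-IDF cosine similarity as a stand-in for the paper's semantic similarity measure and a PageRank-style iteration for the weighted graph-based ranking; the example sentences, damping factor, and iteration count are illustrative:

```python
# Sketch of the graph-ranking stage: sentences are nodes, pairwise cosine
# similarities are edge weights, and a PageRank-style weighted iteration
# scores each node. TF-IDF cosine similarity stands in for the paper's
# semantic similarity; d and iters are conventional illustrative values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_sentences(sentences, d=0.85, iters=50):
    """Return a rank score per sentence from the weighted similarity graph."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = (tfidf @ tfidf.T).toarray()  # rows are l2-normalized -> cosine sim
    np.fill_diagonal(sim, 0.0)         # no self-loops in the graph
    n = len(sentences)
    weight_sums = sim.sum(axis=0)
    weight_sums[weight_sums == 0] = 1.0  # isolated nodes keep the base score
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        # each node receives score from neighbours, proportional to edge weight
        scores = (1 - d) / n + d * (sim @ (scores / weight_sums))
    return scores

sentences = [
    "the acting in this movie was great",
    "great acting and a good story make this movie worth watching",
    "i bought popcorn at the cinema",
]
scores = rank_sentences(sentences)
# Top-ranked sentences (highest scores) form the extractive summary.
summary = [sentences[i] for i in np.argsort(scores)[::-1][:1]]
```

The off-topic third sentence has almost no similarity edges into the graph, so it receives the lowest rank and is excluded from the summary.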
Movie review classification accuracy (%) on three tasks.

| | Features | PL04 | Full IMDB | Subjectivity |
|---|---|---|---|---|
| 1 | Unigrams with NB | 81.5 | 86.66 | 90.75 |
| 2 | Bigrams with NB | 77.7 | 88.29 | 76.03 |
| 3 | Unigrams + bigrams with NB | 82.4 | 88.91 | |
| 4 | Unigram frequency + smoothed IDF + cosine normalization | 82.1 | 87.36 | 90.7 |
| 5 | Bigram frequency + smoothed IDF + cosine normalization | 81.15 | 88.31 | 76.72 |
| 6 | Unigrams + bigrams + smoothed IDF + cosine normalization | | | 90.91 |
| 10 | Benchmark model [ | 88.90 | 88.89 | 88.13 |
PL04 refers to a collection of 2,000 movie reviews often used as a benchmark dataset for sentiment classification [61]; the full IMDB dataset is a collection of 50,000 reviews; and the sentence subjectivity dataset is a collection of 1,000 movie reviews [61].
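The "frequency + smoothed IDF + cosine normalization" feature variants in rows 4-6 correspond to standard TF-IDF weighting. A minimal sketch with scikit-learn, whose TfidfVectorizer applies smoothed IDF and L2 (cosine) normalization by default; the toy documents are illustrative:

```python
# Sketch of the "frequency + smoothed IDF + cosine normalization" features.
# smooth_idf=True adds one to document frequencies before taking the log;
# norm='l2' scales each row to unit length, so dot products between rows
# are cosine similarities. Both are TfidfVectorizer defaults.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["loved this movie", "hated this movie"]  # illustrative documents

vec = TfidfVectorizer(smooth_idf=True, norm="l2", ngram_range=(1, 1))
X = vec.fit_transform(docs).toarray()

# Cosine normalization means every document vector has unit L2 norm.
row_norms = np.linalg.norm(X, axis=1)
```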
Comparison of the proposed summarization technique with other summarization models based on different measures obtained with ROUGE-1.
| Techniques | Average precision | Average recall | Average F-measure |
|---|---|---|---|
| Proposed technique | | | |
| LexRank [ | 0.39215 | 0.3997 | 0.3959 |
| TextRank [ | 0.24515 | 0.25535 | 0.25015 |
Comparison of the proposed summarization technique with other summarization models based on different measures obtained with ROUGE-2.
| Techniques | Average precision | Average recall | Average F-measure |
|---|---|---|---|
| Proposed technique | | | |
| LexRank [ | 0.30195 | 0.30805 | 0.305 |
| TextRank [ | 0.13595 | 0.14215 | 0.139 |
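The ROUGE-1 and ROUGE-2 measures reported above are n-gram overlap scores between a system summary and a reference summary, expressed as precision, recall, and F-measure. A simplified single-reference sketch (not the official ROUGE toolkit; the example sentences are made up):

```python
# Simplified ROUGE-N: overlap of n-grams between a system summary and a
# single reference summary. The official ROUGE toolkit adds stemming,
# multi-reference handling, and other options omitted here.
from collections import Counter

def ngrams(text, n):
    toks = text.lower().split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def rouge_n(system, reference, n):
    """Return (precision, recall, F-measure) for n-gram overlap."""
    sys_ngrams, ref_ngrams = ngrams(system, n), ngrams(reference, n)
    overlap = sum((sys_ngrams & ref_ngrams).values())  # clipped counts
    precision = overlap / max(sum(sys_ngrams.values()), 1)
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f

# Illustrative pair: every system unigram appears in the reference.
p, r, f = rouge_n("the movie was great", "the movie was really great", 1)
```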
Figure 3 Comparison of summarization models in terms of ROUGE-1 measures.
Figure 4 Comparison of summarization models in terms of ROUGE-2 measures.