
Persian sentiment analysis of an online store independent of pre-processing using convolutional neural network with fastText embeddings.

Sajjad Shumaly, Mohsen Yazdinejad, Yanhui Guo.

Abstract

Sentiment analysis plays a key role for companies, especially retailers, and determining customers' opinions about products more accurately helps them stay competitive. We analyze users' opinions on the website of Digikala, the largest online store in Iran. Persian, however, is unstructured, which makes the pre-processing stage very difficult; this is the main problem of sentiment analysis in Persian. The problem is exacerbated by the lack of available libraries for Persian pre-processing, since most libraries focus on English. To tackle this, approximately 3 million Persian reviews were gathered from the Digikala website using web-mining techniques, and the fastText method was used to create a word embedding. We assumed that this would dramatically cut down on the need for text pre-processing, because the skip-gram method considers the position of the words in the sentence and the words' relations to each other. Another word embedding was created using TF-IDF, in parallel with fastText, to compare their performance. In addition, the results of Convolutional Neural Network (CNN), BiLSTM, Logistic Regression, and Naïve Bayes models were compared. As a significant result, we obtained 0.996 AUC and 0.956 F-score using fastText and CNN. This article not only demonstrates the extent to which it is possible to be independent of pre-processing, but also reports accuracy better than that of other studies on Persian. Avoiding complex text pre-processing is also important for other languages, since most text pre-processing algorithms have been developed for English and cannot be used for other languages. Because of its high accuracy and independence of pre-processing, the created word embedding has other applications in Persian besides sentiment analysis.
© 2021 Shumaly et al.
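
The skip-gram idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `skipgram_pairs`, the window size, and the sample review are assumptions made for illustration. The sketch only shows how (center, context) training pairs are drawn from raw, whitespace-split text without stemming or normalization; an actual fastText model would then learn vectors from such pairs.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token and its neighbours.

    Each token is paired with the tokens at most `window` positions
    to its left and right, which is the training signal of skip-gram.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical review ("the quality of this phone is excellent"),
# split on whitespace only, with no other pre-processing.
review = "کیفیت این گوشی عالی است".split()
pairs = skipgram_pairs(review, window=1)

for center, context in pairs:
    print(center, "->", context)
```

With a window of 1, each interior word contributes two pairs and each edge word one, so a five-word review yields eight pairs.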


Keywords: Convolutional neural network; FastText; Natural language processing; Pseudo labeling; Sentiment analysis; Skip gram; Text mining; Web mining; Web scraping; Word embedding

Year:  2021        PMID: 33817057      PMCID: PMC7959661          DOI: 10.7717/peerj-cs.422

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992



