Nabanita Das¹, Bikash Sadhukhan¹, Tanusree Chatterjee¹, Satyajit Chakrabarti².
Abstract
Forecasting the stock market is one of the most difficult undertakings in the financial industry owing to its complex, volatile, noisy, and nonparametric character. However, as computer science advances, an intelligent model can help investors and analysts minimize investment risk. Public opinion on social media and other online portals is an important factor in stock market prediction. The COVID-19 pandemic stimulated online activity, since individuals were compelled to remain at home, producing a massive quantity of public opinion and emotion. This research focuses on stock market movement prediction from public sentiment using a long short-term memory (LSTM) network during the COVID-19 outbreak. Seven sentiment analysis tools (VADER, logistic regression, Loughran-McDonald, Henry, TextBlob, Linear SVC, and Stanford) are applied to data web-scraped from four online sources: stock-related article headlines, tweets, financial news from the "Economic Times", and Facebook comments. Predictions are made using both sentiment scores and historical stock data for each of the 28 sentiment measures computed. An accuracy of 98.11% is achieved using Linear SVC sentiment scores from Facebook comments. Thereafter, the four sentiment scores from each of the seven tools are combined with stock data in a stepwise fashion to determine their overall influence on the stock market. When all four sentiment scores are paired with stock data, the forecast accuracy for five of the seven tools is at its highest, with Linear SVC scores helping stock data reach the top accuracy of 98.32%.
Keywords: Deep learning; Sentiment analysis; Stock market prediction; Web scraping
Year: 2022 PMID: 35911484 PMCID: PMC9325657 DOI: 10.1007/s13278-022-00919-3
Source DB: PubMed Journal: Soc Netw Anal Min
Summary of related work
| Web scraping source | Related work | Source data particulars | Sentiment analysis (SA) tools and other working algorithms | Result |
|---|---|---|---|---|
| | (Hajhmida and Oueslati) | Web crawling apptracs.com. Comments, no. of shares, likes and other relevant information stored in MongoDB | Lexicon of user-created words, nearest neighbours, RBF SVM, decision tree, random forest, neural nets, AdaBoost, Naïve Bayes, logistic regression | Polarity of highly positive, positive, highly negative and negative. Best prediction accuracy 84.73% from random forest |
| | (Akter and Aziz) | Bangladesh-based Facebook group "FOODBANK" | Dictionary-based, lexicon-based, Naïve Bayes | Dictionary-based accuracy 73%; lexicon-based analysis outperforms Naïve Bayes |
| | (Marengo et al.) | 603 user-generated languages | Data mining through the LIWC closed-vocabulary method, using random forests to predict QoL dimensions | Highest accuracy achieved in psychological and general QoL dimensions |
| | (Rase) | 1452 comments from the Oromo Democratic Party's official site | Document-level sentiment by multinomial Naïve Bayes, LSTM and CNN | Although MNB outperforms both LSTM (87.6% accuracy) and CNN (89% accuracy), it faces problems with indirect comments |
| | (Chou et al.) | StockTwits and Twitter for SA, Yahoo Finance for stock data | LSTM and GloVe | LSTM + sentiment score + attention model outperforms |
| | (S.-U. Hassan et al.) | Tweets for SA and Altmetric.com for publications | SentiStrength, linear regression with one additional indicator: "number of unique Twitter users" | Positive correlation between tweets and literature's early impact |
| | (Lu and Zheng) | Tweets | LDA model, Kullback–Leibler (KL) divergence | Eagerness for niche cruises rather than mass cruises |
| | (Mehta et al.) | Apache Flume used for Bitcoin tweets, news articles | XGBoost, LSTM | Positive sentiment towards prediction of cryptocurrency |
| | (Singh et al.) | tweepy API from 20 Jan 2020 to 25 Apr 2020 for world and Indian tweet data | BERT for classification, VADER for intensity, and TextBlob for polarity and subjectivity | Indians communicated positively towards government activity, with 94% accuracy |
| | [18] | 100,000 tweets via API, 19/10/2020–29/10/2020 | TextBlob rule-based, LDA, SARIMAX | 76% accuracy, RMSE 0.196 |
| | (Chauhan et al.) | Tweets | Machine learning, lexicon-based and deep learning | Machine learning methods dominated in election result prediction |
| Newspaper | (Ghasiya and Okamura) | Headlines of three Japanese newspapers | Topic modelling approach NMF, NLP and ML algorithms | 50.37% negative and 49.63% positive results |
| | (Gite et al.) | "The Pulse" for SA and Yahoo Finance for stock data | LSTM-CNN for SA and LSTM for stock price prediction | 93.15% accuracy |
| | (Mehta et al.) | BSE Sensex-Infosys for stock data; Moneycontrol, IIFL, Economic Times, Business Standard, Reuters, and Live Mint data for SA | Support vector machine, MNB classifier, linear regression, Naïve Bayes and long short-term memory | LSTM outperforms with 92.45% accuracy |
| Online financial news and other news articles | (Shi et al.) | Snowball financial online community in China for investors' financial comments for SA; the Shanghai exchanges, the top 50 stocks on the Shenzhen exchange and the top 30 American stocks | CNN, GRU for SA; SVM, LR for stock prediction | 9% improvement over LR for SA; stock prediction improved 1.25% over LR |
| | (Ly and Nguyen) | Five different datasets: first 3, 5, 10, 20, and 30 days' prices | EDGAR package and the Loughran–McDonald sentiment word lists for SA; baseline model, random forests, decision tree, Naïve Bayes and logistic regression for price movement forecasting | Logistic regression performs best, followed by Naïve Bayes and the baseline model |
| | (Wu et al.) | Stock posts and financial news for SA, and China Shanghai A-share market data | CNN for SA and LSTM for stock closing price prediction | Accuracy is very close to the actual price |
| | (Elena) | OMXS30 stock data and financial news for SA | VADER, Loughran–McDonald for SA and a tree-based ensemble model (XGBoost) for stock price movement prediction | Hyperparameters extracted using cross-validation and grid search for better performance |
| | (Arif et al.) | Web data and a Kaggle dataset | Naïve Bayes, RCNN and random forest | The Kaggle dataset outperforms web-scraped data (96.13% vs 86.5%). Naïve Bayes performs least efficiently and RCNN most efficiently |
| | (Turner et al. 2021) | Stock data and web data with a domain-specific lexicon | Henry's lexicon, Loughran's lexicon and SentiWordNet | Domain-specific lexicon is most accurate |
| | (Z. Huang and Tanaka) | Historical prices from US stock markets and asset-related news from media | Deep Q-Network | EAM-enabled SAM performed best |
| | (X. Huang et al.) | Weibo social media posts for SA and cryptocurrency prices | LSTM | LSTM outperforms the AR approach by 18.5% in precision and 15.4% in recall |
Fig. 1 Positions of the sentiment analysis tools used in this work in the sentiment analysis classification
Fig. 2 LSTM architecture
Fig. 3 System model of the current research work
Details of web scraping implementation
| Data | Sources of web scraping (01/07/2020 to 29/12/2020) | Web scraping methods | Scraped raw data size (items) |
|---|---|---|---|
| DS-1 | Stock related articles headlines from Economic Times | Selenium and pandas | 1266 |
| DS-2 | Tweets from Twitter with the keyword "nifty50" | Twint (rather than Tweepy, which can only extract tweets from the last 7 days) | 79,908 |
| DS-3 | Financial news from Economic Times | Eclipse, Jdk 8 or above, Maven, Selenium framework (findElements(By.tagname()), Chrome Webdriver | 295 |
| DS-4 | Facebook comments with keywords such as nifty finance, nifty stocks, nifty prediction, nifty analysis, nifty advice, nifty trend, nifty 50 | Since Facebook is a dynamically loaded website, Python's popular "beautifulsoup" library cannot be used for crawling; the web automation tool "selenium" is used instead | 341 |
Details of data representation after web scraping and sentiment analysis implementation
| Data | After web scraping | After sentiment analysis, average sentiment scores per day |
|---|---|---|
| Data-1 | DS-1 | ADS-1 |
| Data-2 | DS-2 | ADS-2 |
| Data-3 | DS-3 | ADS-3 |
| Data-4 | DS-4 | ADS-4 |
Details of data representation after toolwise sentiment analysis implementation
| VADER | Logistic regression | Loughran–McDonald | Henry | TextBlob | Linear SVC | Stanford |
|---|---|---|---|---|---|---|
| ADS-1_V | ADS-1_LR | ADS-1_LM | ADS-1_H | ADS-1_TB | ADS-1_SVC | ADS-1_STF |
| ADS-2_V | ADS-2_LR | ADS-2_LM | ADS-2_H | ADS-2_TB | ADS-2_SVC | ADS-2_STF |
| ADS-3_V | ADS-3_LR | ADS-3_LM | ADS-3_H | ADS-3_TB | ADS-3_SVC | ADS-3_STF |
| ADS-4_V | ADS-4_LR | ADS-4_LM | ADS-4_H | ADS-4_TB | ADS-4_SVC | ADS-4_STF |
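The ADS-i values named above are per-day averages of the item-level scores a sentiment tool assigns to scraped texts. A minimal stdlib sketch of that daily aggregation, using toy dates and scores rather than the paper's data:

```python
# Sketch: turning per-item sentiment scores into average daily scores (ADS),
# matching the ADS-i naming in the tables above. Values are illustrative.
from collections import defaultdict
from statistics import mean

# (date, score) pairs as a tool such as VADER might emit for scraped items
scored = [
    ("01-07-2020", 0.5), ("01-07-2020", 1.0),
    ("02-07-2020", -0.25), ("02-07-2020", 0.75),
]

# Group the item scores by calendar day
by_day = defaultdict(list)
for date, score in scored:
    by_day[date].append(score)

# One averaged score per day, rounded as in the published tables
ads = {date: round(mean(vals), 3) for date, vals in by_day.items()}
```

Running this yields one averaged score per trading day, which is then aligned with the daily stock data.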
Details of the experimental set-up
| Machine set-up | Programming platform and corresponding tools | Sentiment analysis tools | Data sources (time range: 1 July 2020 to 31 December 2020) |
|---|---|---|---|
| Windows 10 | Anaconda3 | Logistic regression, linear support vector classifier | Nifty50 stock data from Yahoo Finance |
| Intel Core i5 | Jupyter Notebook 5.7.8 | VADER, Stanford CoreNLP, TextBlob | Web-scraped data from Facebook and Twitter |
| 8 GB RAM | Keras 2.2.4, Java 8 | Henry, Loughran–McDonald | Stock headlines and financial news articles from the "Economic Times" |
Web-scraped data from four online sources: DS-1, DS-2, DS-3 and DS-4
| Web scraping from four online sources | | |
|---|---|---|
| Data | Date | Web-scraped data |
| DS-1: Stock Market Related Articles' Headlines | 30-12-2020 | Trade Setup: Nifty prone to profit booking at current level, consolidation overdue |
| DS-2: Tweets from Twitter | 31-12-2020 23:57:00 | The Nifty50 has finally hit the ₹14,000 mark for the first time ever on the last day of 2020. I think the bull run is likely to continue in the year 2021, Nifty50 may hit 15,000 and Sensex to cross 50,000 by December #2021 #cryptocurrency #stock #indianstockmarket #intraday #india |
| DS-3: Financial News | Oct 28, 2020, 04:12 PM IST | Financial conditions in India have recovered significantly after hitting the abyss in April: Crisil |
| DS-4: Facebook Comments | 15–12-2020 | sensex inch fresh high hdfc twin sparkle bajaj finance top gainer sensex hdfc |
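The DS-4 sample above appears lowercased with punctuation and stopwords stripped. A rough sketch of that kind of cleaning step, with an illustrative stopword list (the paper's exact preprocessing pipeline is not detailed here):

```python
# Sketch: cleaning a raw scraped comment into the lowercased, stopword-free
# form shown in the DS-4 sample. The stopword list is a small illustrative
# subset, not the one used in the paper.
import re

STOPWORDS = {"the", "a", "an", "to", "and", "of", "as", "are"}

def clean(text):
    # Replace everything except letters/whitespace, then lowercase
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    # Drop stopwords and collapse whitespace
    return " ".join(w for w in text.split() if w not in STOPWORDS)

cleaned = clean("Sensex inches to a fresh high; HDFC twins sparkle!")
```

With a fuller stopword list (e.g. NLTK's) the output converges toward the DS-4 sample's style.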
Fig. 4 Confusion matrix
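The accuracy, recall, and F1 scores reported in the result tables follow from the binary confusion matrix of Fig. 4 in the standard way; a sketch with toy counts (not the paper's):

```python
# Sketch: deriving the reported classification metrics from a binary
# confusion matrix. The counts are toy values for illustration.
tp, fp, fn, tn = 90, 5, 8, 97  # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of correct predictions
precision = tp / (tp + fp)                   # correctness of "up" predictions
recall = tp / (tp + fn)                      # coverage of actual "up" days
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

These are the same three classification metrics listed in the LSTM model specification below.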
Details of Data Combination for input to the LSTM Model
| Tool[j]-generated sentiment score combinations with stock data for the LSTM model | | | | |
|---|---|---|---|---|
| Stock Data | Stock Data + ADS-1_Tool[j] | Stock Data + ADS-1_Tool[j] + ADS-2_Tool[j] | Stock Data + ADS-1_Tool[j] + ADS-2_Tool[j] + ADS-3_Tool[j] | Stock Data + ADS-1_Tool[j] + ADS-2_Tool[j] + ADS-3_Tool[j] + ADS-4_Tool[j] |
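The stepwise combinations in this table can be generated programmatically; a sketch using the ADS-i_Tool column-name convention, with Linear SVC chosen only as an example tool:

```python
# Sketch: building the stepwise feature sets fed to the LSTM for one tool.
# Column names follow the ADS-i_Tool convention from the tables above.
stock_cols = ["Open", "High", "Low", "Close"]
ads_cols = ["ADS-1_SVC", "ADS-2_SVC", "ADS-3_SVC", "ADS-4_SVC"]

# k = 0 is stock data only; k = 4 adds all four sentiment scores
combinations = [stock_cols + ads_cols[:k] for k in range(len(ads_cols) + 1)]
```

Each element of `combinations` is one column set from the table, trained and evaluated separately.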
Details of the LSTM model specification
| Model | Layers | Optimizer | Loss function | Classification metrics | Epochs |
|---|---|---|---|---|---|
| LSTM | 3 | Adam | MSE, MAE, R-squared | Accuracy, Recall, F1 score | 100 |
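The table specifies a 3-layer LSTM trained with Adam. The gating inside each LSTM cell can be illustrated with a scalar, stdlib-only forward step; the shared toy weights below are for illustration only, not the trained model:

```python
# Sketch: one forward step of a single LSTM cell with scalar states.
# Real cells use separate weight matrices per gate; one shared scalar
# weight is used here purely for brevity.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=0.5, u=0.5, b=0.0):
    f = sigmoid(w * x + u * h_prev + b)          # forget gate
    i = sigmoid(w * x + u * h_prev + b)          # input gate
    o = sigmoid(w * x + u * h_prev + b)          # output gate
    c_tilde = math.tanh(w * x + u * h_prev + b)  # candidate cell state
    c = f * c_prev + i * c_tilde                 # updated cell state
    h = o * math.tanh(c)                         # updated hidden state
    return h, c

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0)
```

Stacking three such layers (vectorized, in Keras) and minimizing MSE with Adam over 100 epochs reproduces the configuration in the table.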
Average sentiment scores per day from different tools
| Date | ADS-1_V | ADS-2_V | ADS-3_V | ADS-4_V |
|---|---|---|---|---|
| 01-07-2020 | 0.75 | 0.737 | 1 | 1 |
| 02-07-2020 | 1 | 0.698 | -0.25 | 0.5 |
| 03-07-2020 | 0.143 | 0.678 | 0.333 | 1 |
| 06-07-2020 | 0 | 0.713 | 1 | 1 |
| 07-07-2020 | 0.333 | 0.74 | 0.5 | 0 |

| Date | ADS-1_SVC | ADS-2_SVC | ADS-3_SVC | ADS-4_SVC |
|---|---|---|---|---|
| 01-07-2020 | 0.167 | 0.194 | 1 | 0.114 |
| 02-07-2020 | 0.111 | 0.132 | 0.333 | 0.129 |
| 03-07-2020 | 0.143 | 0.233 | 0.5 | 0.199 |
| 06-07-2020 | 0.5 | 0.217 | 1 | 0.177 |
| 07-07-2020 | -0.167 | 0.204 | 0.286 | 0.183 |

| Date | ADS-1_H | ADS-2_H | ADS-3_H | ADS-4_H |
|---|---|---|---|---|
| 01-07-2020 | 0.583 | 0.192 | 0.139 | 0.66 |
| 02-07-2020 | 0.8 | 0.049 | 0.068 | 1 |
| 03-07-2020 | 0.078 | 0.069 | 0.078 | 0.74 |
| 06-07-2020 | 0.139 | 0.139 | 1 | 0 |
| 07-07-2020 | 0.154 | 0.106 | 0.333 | 0.154 |
Sample Nifty50 stock data
| Date | Open | High | Low | Close |
|---|---|---|---|---|
| 01-07-2020 | 10,323.79 | 10,447.04 | 10,299.59 | 10,430.04 |
| 02-07-2020 | 10,493.04 | 10,598.20 | 10,485.54 | 10,551.70 |
| 03-07-2020 | 10,614.95 | 10,631.29 | 10,562.65 | 10,607.34 |
| 06-07-2020 | 10,723.84 | 10,811.40 | 10,695.09 | 10,763.65 |
| 07-07-2020 | 10,802.84 | 10,813.79 | 10,689.70 | 10,799.65 |
Fig. 5 Stock price history
First category experimental results in terms of cost functions
| Tool | Metric | Stock data with ADS-1 | Stock data with ADS-2 | Stock data with ADS-3 | Stock data with ADS-4 |
|---|---|---|---|---|---|
| Henry | R-squared | 0.2098 | 0.1423 | 0.3523 | 0.2566 |
| | MSE | 0.0365 | 0.0441 | 0.0141 | 0.0263 |
| | MAE | 0.0929 | 0.0986 | 0.0855 | 0.0881 |
| Logistic regression | R-squared | 0.4015 | 0.1715 | 0.3237 | 0.2012 |
| | MSE | 0.0111 | 0.0422 | 0.0164 | 0.0316 |
| | MAE | 0.0795 | 0.0944 | 0.0859 | 0.0917 |
| Loughran–McDonald | R-squared | 0.2112 | 0.1689 | 0.1988 | 0.2714 |
| | MSE | 0.0301 | 0.0353 | 0.0321 | 0.0238 |
| | MAE | 0.0807 | 0.0934 | 0.0925 | 0.0853 |
| VADER | R-squared | 0.1864 | 0.2068 | 0.3037 | 0.3748 |
| | MSE | 0.0362 | 0.0326 | 0.0222 | 0.0133 |
| | MAE | 0.0986 | 0.09 | 0.0855 | 0.0823 |
| TextBlob | R-squared | 0.2282 | 0.2309 | 0.2008 | 0.2266 |
| | MSE | 0.0286 | 0.0282 | 0.0329 | 0.0299 |
| | MAE | 0.0912 | 0.0893 | 0.092 | 0.0913 |
| Linear SVC | R-squared | 0.3823 | 0.3311 | 0.3266 | 0.4176 |
| | MSE | 0.0115 | 0.0126 | 0.0133 | 0.0108 |
| | MAE | 0.0809 | 0.0829 | 0.0848 | 0.0781 |
| Stanford | R-squared | 0.1731 | 0.3201 | 0.1857 | 0.1666 |
| | MSE | 0.0398 | 0.0152 | 0.0382 | 0.0425 |
| | MAE | 0.0964 | 0.0863 | 0.0947 | 0.0951 |
First category experimental results in terms of accuracy, recall, and F1 score
| Tool | Stock data with ADS-1 | | | Stock data with ADS-2 | | | Stock data with ADS-3 | | | Stock data with ADS-4 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Accuracy | Recall | F1 score | Accuracy | Recall | F1 score | Accuracy | Recall | F1 score | Accuracy | Recall | F1 score |
| VADER | 95.68% | 82.14% | 90.16% | 94.82% | 78.57% | 88% | 96.55% | 89.28% | 92.59% | 97.23% | 90.78% | 94.12% |
| Logistic Regression | 97.44% | 91.28% | 94.78% | 93.48% | 77.72% | 87.65% | 96.97% | 89.75% | 90.19% | 95.96% | 83.33% | 91.26% |
| Loughran–McDonald | 95.08% | 80.65% | 89.17% | 95.59% | 81.45% | 89.76% | 95.99% | 82.81% | 90.77% | 96.35% | 85.71% | 92.3% |
| Henry | 94.57% | 78.20% | 87.68% | 93.39% | 77.34% | 87.29% | 97.08% | 90.48% | 93.87% | 96.23% | 85.62% | 92.21% |
| TextBlob | 96.09% | 85.35% | 91.72% | 96.13% | 85.52% | 92.11% | 95.92% | 83.27% | 91.41% | 96.02% | 85.11% | 91.34% |
| Linear SVC | 97.29% | 90.98% | 94.43% | 96.80% | 90.12% | 93.02% | 96.73% | 89.77% | 92.98% | 98.11% | 91.62% | 95.18% |
| Stanford | 94.13% | 77.68% | 86.89% | 96.72% | 89.62% | 92.88% | 94.3% | 77.51% | 87.12% | 93.75% | 78.52% | 88.61% |
Second category experimental results in terms of cost functions
| Tool | Metric | Stock data with ADS-1 and ADS-2 | Stock data with ADS-1, ADS-2 and ADS-3 | Stock data with ADS-1, ADS-2, ADS-3 and ADS-4 |
|---|---|---|---|---|
| Henry | R-squared | 0.2361 | 0.1285 | 0.2749 |
| | MSE | 0.0215 | 0.0391 | 0.0252 |
| | MAE | 0.0894 | 0.0966 | 0.0877 |
| Logistic regression | R-squared | 0.2159 | 0.1834 | 0.3798 |
| | MSE | 0.0242 | 0.031 | 0.0123 |
| | MAE | 0.0907 | 0.0925 | 0.0809 |
| Loughran–McDonald | R-squared | 0.2107 | 0.2091 | 0.2118 |
| | MSE | 0.0253 | 0.0272 | 0.0243 |
| | MAE | 0.0901 | 0.0911 | 0.0899 |
| VADER | R-squared | 0.1767 | 0.2363 | 0.3375 |
| | MSE | 0.0379 | 0.0204 | 0.0193 |
| | MAE | 0.0968 | 0.0897 | 0.0813 |
| TextBlob | R-squared | 0.2982 | 0.2818 | 0.2472 |
| | MSE | 0.0215 | 0.0243 | 0.0259 |
| | MAE | 0.0847 | 0.09 | 0.0825 |
| Linear SVC | R-squared | 0.2233 | 0.2603 | 0.435 |
| | MSE | 0.0329 | 0.0271 | 0.0103 |
| | MAE | 0.0902 | 0.089 | 0.0789 |
| Stanford | R-squared | 0.1881 | 0.2461 | 0.2739 |
| | MSE | 0.0322 | 0.0272 | 0.0259 |
| | MAE | 0.0943 | 0.0869 | 0.0852 |
Second category experimental results in terms of accuracy, recall, and F1 score
| Tools for calculating sentiment scores | Stock data with ADS-1 and ADS-2 | | | Stock data with ADS-1, ADS-2 and ADS-3 | | | Stock data with ADS-1, ADS-2, ADS-3 and ADS-4 | | |
|---|---|---|---|---|---|---|---|---|---|
| | Accuracy | Recall | F1 score | Accuracy | Recall | F1 score | Accuracy | Recall | F1 score |
| VADER | 95.87% | 83.36% | 91.78% | 94.87% | 79.5% | 87.84% | 96.85% | 88.76% | 91.69% |
| Logistic Regression | 94.67% | 81.57% | 88.51% | 94.25% | 78.87% | 87.45% | 97.67% | 91.13% | 93.72% |
| Loughran–McDonald | 94.75% | 81.81% | 88.97% | 94.63% | 79.27% | 87.48% | 94.78% | 81.98% | 89.02% |
| Henry | 94.8% | 82.11% | 89.62% | 92.99% | 76.88% | 84.69% | 96.36% | 87.53% | 90.94% |
| TextBlob | 96.76% | 88.42% | 90.83% | 96.28% | 87.05% | 90.21% | 96.48% | 88.15% | 91% |
| Linear SVC | 96.15% | 86.85% | 89.82% | 96.35% | 87.41% | 90.73% | 98.32% | 93.12% | 95.24% |
| Stanford | 93.62% | 78.74% | 86.82% | 96.46% | 87.8% | 91.11% | 96.57% | 87.95% | 91.4% |
Fig. 6 Comparative performance from each tool with combined datasets
Sample Nifty50 stock data with all four linear SVC sentiment scores
| Date | Open | High | Low | Close | ADS-1_SVC | ADS-2_SVC | ADS-3_SVC | ADS-4_SVC | Class |
|---|---|---|---|---|---|---|---|---|---|
| 01-07-2020 | 10,323.79 | 10,447.04 | 10,299.59 | 10,430.04 | 0.75 | 0.192 | 1 | 1 | 1 |
| 02-07-2020 | 10,493.04 | 10,598.20 | 10,485.54 | 10,551.70 | 1 | 0.049 | 0.068 | 0.068 | 1 |
| 03-07-2020 | 10,614.95 | 10,631.29 | 10,562.65 | 10,607.34 | 0 | 0.069 | 0.078 | 0.078 | 0 |
| 06-07-2020 | 10,723.84 | 10,811.40 | 10,695.09 | 10,763.65 | 0.667 | 0.139 | 0.139 | 0.139 | 1 |
| 07-07-2020 | 10,802.84 | 10,813.79 | 10,689.70 | 10,799.65 | 0.25 | 0.106 | -0.667 | 0.154 | 0 |
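The Class column marks daily market movement. One plausible labeling rule is "close rose versus the previous close"; a sketch with the sample closes above. Note this rule is an assumption for illustration: the paper's exact labeling may differ, and the sample Class values do not all follow this simple rule.

```python
# Sketch: a plausible binary movement label (1 = close rose vs the previous
# day, 0 = otherwise). This labeling rule is assumed, not taken verbatim
# from the paper.
closes = [10430.04, 10551.70, 10607.34, 10763.65, 10799.65]

labels = [1 if curr > prev else 0 for prev, curr in zip(closes, closes[1:])]
```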
Stock market prediction results with linear SVC’s four sentiment scores
| Dataset | MAE | MSE | Accuracy |
|---|---|---|---|
| Stock Data Only | 0.097 | 0.0545 | 95.22% |
| Stock Data + ADS-1_SVC | 0.0809 | 0.0115 | 97.29% |
| Stock Data + ADS-1_SVC + ADS-2_SVC | 0.0902 | 0.0329 | 96.15% |
| Stock Data + ADS-1_SVC + ADS-2_SVC + ADS-3_SVC | 0.089 | 0.0271 | 96.35% |
| Stock Data + ADS-1_SVC + ADS-2_SVC + ADS-3_SVC + ADS-4_SVC | 0.0789 | 0.0103 | 98.32% |
Fig. 7 Stock market movement prediction without any sentiment score
Fig. 8 Stock market movement prediction with one sentiment score (news headlines) from linear SVC
Fig. 9 Stock market movement prediction with two sentiment scores (news headlines and Twitter) from linear SVC
Fig. 10 Stock market movement prediction with three sentiment scores (news headlines, Twitter and news articles) from linear SVC
Fig. 11 Stock market movement prediction with four sentiment scores (news headlines, Twitter, news articles, and Facebook comments) from linear SVC
Comparison with two existing works
| Works done | Online data sources for sentiment analysis | Sentiment analysis tools | Stock price prediction method | Stock market data | Accuracy | |
|---|---|---|---|---|---|---|
| (Dutta, Pooja, Jain, Panda, and Nagwani, 2021) | News articles from newspapers such as the "Economic Times" | VADER | LSTM | S&P 500 from Yahoo Finance | 77.45% | |
| (Wang et al.) | Stock-related news headlines from online media sources such as "The New York Times" | VADER | Machine learning algorithms | Dow Jones Industrial Average (DJIA) from Yahoo Finance | 72.98% | |
| Proposed work | Stock-related article headlines from the "Economic Times", tweets from Twitter, financial news from the "Economic Times" and Facebook comments | VADER, logistic regression, Loughran–McDonald, Henry, TextBlob, Linear SVC and Stanford | LSTM | Nifty50 (NSE) from Yahoo Finance | Linear SVC | 98.32% |
| | | | | | Logistic regression | 97.67% |
| | | | | | VADER | 96.85% |
| | | | | | Loughran–McDonald | 94.78% |
| | | | | | Henry | 96.36% |
| | | | | | TextBlob | 96.48% |
| | | | | | Stanford | 96.57% |
Combined classification and regression results with all four sentiment scores
| Experimental results for combination data: stock data with ADS-1, ADS-2, ADS-3 and ADS-4 | | | | | | |
|---|---|---|---|---|---|---|
| Tools for calculating sentiment scores | Classification result | | | Regression result | | |
| | Accuracy | Recall | F1 score | R-squared | MSE | MAE |
| VADER | 96.85% | 88.76% | 91.69% | 0.3375 | 0.0193 | 0.0813 |
| Logistic regression | 97.67% | 91.13% | 93.72% | 0.3798 | 0.0123 | 0.0809 |
| Loughran–McDonald | 94.78% | 81.98% | 89.02% | 0.2118 | 0.0243 | 0.0899 |
| Henry | 96.36% | 87.53% | 90.94% | 0.2749 | 0.0252 | 0.0877 |
| TextBlob | 96.48% | 88.15% | 91% | 0.2472 | 0.0259 | 0.0825 |
| Linear SVC | 98.32% | 93.12% | 95.24% | 0.435 | 0.0103 | 0.0789 |
| Stanford | 96.57% | 87.95% | 91.4% | 0.2739 | 0.0259 | 0.0852 |