Metod Jazbec¹, Barna Pàsztor¹, Felix Faltings¹, Nino Antulov-Fantulin²,³, Petter N. Kolm³.
Abstract
We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a non-profit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of US companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the US stock market. Furthermore, we analyse and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings support the conclusion that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
Keywords: complex systems; financial markets; machine learning; sentiment analysis; transfer entropy
Year: 2021 PMID: 34350010 PMCID: PMC8316821 DOI: 10.1098/rsos.202321
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. The pipeline deployed to process and transform the Common Crawl News dataset into the dataset used by the sentiment model. Each box represents one stage of the pipeline where data transformation and filtering steps are applied. The numbers next to the arrows show how many articles are passed on from one stage to the next. The percentages in brackets after each filtering step show the proportion of articles removed in that specific step.
Figure 2. Process chart of the sentiment model. The two assumptions underlying the sentiment model are depicted in the middle. The data used in fitting the model is shown at the top. We apply the predicted sentiment scores (bottom-left corner) to analyse transfer entropy and simulate several simple trading strategies.
Figure 3. Summary of the news dataset used in this article. (a) The most frequently mentioned companies as measured by the number of distinct articles. (b) The most frequent news sources as measured by the number of distinct articles associated with each source. (c) The median number of articles published per company and month. The companies are divided into top and bottom halves by the total number of articles published about them. The shaded regions represent the 25th and 75th percentiles of each half.
Figure 4. (a) Companies and corresponding significant Shannon transfer entropy (and effective transfer entropy) from hourly sentiment score differences to hourly price returns. The unit of transfer entropy is bits (logarithm with base 2), corresponding to the reduction of the average optimal code length needed to encode stock returns with lagged sentiment. Transfer entropy was calculated for the period from January 2018 to February 2020 using time series of hourly returns from 9.30 to 15.30 Eastern Time and corresponding lagged average sentiment scores. The statistical significance (p-value < 0.01) of transfer entropy was estimated with 300 bootstrap samples and 100 shuffles to obtain the effective transfer entropy. (b) Box and whisker plots of estimated distributions of the p-values for selected company tickers. The box and whisker plots show Q1, median, Q3, minimum, maximum and estimated outliers.
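The estimator behind figure 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: order-1 histories, three quantile bins and a plug-in (histogram) probability estimate are my assumptions, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=3):
    """Order-1 Shannon transfer entropy (in bits) from x to y.

    TE(X -> Y) = sum p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ],
    estimated by discretizing both series into quantile bins and counting.
    """
    def disc(a):
        edges = np.quantile(a, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(a, edges)

    xd, yd = disc(np.asarray(x)), disc(np.asarray(y))
    # rows are the triples (y_{t+1}, y_t, x_t)
    triples = np.stack([yd[1:], yd[:-1], xd[:-1]], axis=1)
    n = len(triples)
    c_y1y0x0 = Counter(map(tuple, triples))
    c_y0x0 = Counter(map(tuple, triples[:, 1:]))   # (y_t, x_t)
    c_y1y0 = Counter(map(tuple, triples[:, :2]))   # (y_{t+1}, y_t)
    c_y0 = Counter(triples[:, 1])                  # y_t
    te = 0.0
    for (y1, y0, x0), k in c_y1y0x0.items():
        p_cond_full = k / c_y0x0[(y0, x0)]         # p(y1 | y0, x0)
        p_cond_hist = c_y1y0[(y1, y0)] / c_y0[y0]  # p(y1 | y0)
        te += (k / n) * np.log2(p_cond_full / p_cond_hist)
    return te

def effective_transfer_entropy(x, y, bins=3, shuffles=100, seed=0):
    """TE minus its mean over shuffled x, as a finite-sample bias correction."""
    rng = np.random.default_rng(seed)
    null = [transfer_entropy(rng.permutation(x), y, bins) for _ in range(shuffles)]
    return transfer_entropy(x, y, bins) - np.mean(null)
```

Shuffling x destroys any lagged coupling to y while preserving its marginal distribution, so the shuffled TE values estimate the bias of the plug-in estimator; a p-value can then be estimated as the fraction of shuffled values exceeding the observed one.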
Figure 5. Cumulative returns of trading strategies and benchmarks. ‘Day 1’ represents the cumulative returns of the Day 1 sentiment strategy based on the Common Crawl News dataset from January 2018 to February 2020. SPY is the SPDR S&P 500 trust. ‘Random’ denotes the average of the random strategies along with 1 s.d. confidence bands obtained from 500 simulations. ‘Day 0’ and ‘Day −1’ are the ‘look-ahead’ sentiment strategies relying on future information.
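The ‘Random’ benchmark with its 1 s.d. bands can be simulated along these general lines. A minimal sketch under assumptions: equal-weighted daily picks and a hypothetical `n_select` companies per day (the caption does not state how many are selected or how they are weighted).

```python
import numpy as np

def random_strategy_bands(returns, n_select, n_sims=500, seed=0):
    """Mean cumulative return and 1-s.d. band of random stock-picking strategies.

    returns : (T, N) array of daily returns for N stocks over T days.
    Each simulated strategy picks n_select stocks uniformly at random each
    day and holds them equal-weighted for that day.
    """
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    curves = np.empty((n_sims, T))
    for s in range(n_sims):
        daily = np.array([
            returns[t, rng.choice(N, size=n_select, replace=False)].mean()
            for t in range(T)
        ])
        curves[s] = np.cumprod(1.0 + daily) - 1.0   # cumulative return path
    return curves.mean(axis=0), curves.std(axis=0)
```

Averaging the curves and adding/subtracting one standard deviation gives the shaded band shown for ‘Random’ in figure 5.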
Performance statistics of the Day 1 sentiment trading strategy and benchmarks from January 2018 to February 2020. The sentiment trading strategy is based on news articles from the Common Crawl News dataset. SPY is the SPDR S&P 500 trust. ‘Random’ denotes the baseline strategy in which we randomly select companies to invest in each day. ‘Day 0’ and ‘Day −1’ are ‘look-ahead’ sentiment strategies, reported for comparison. Statistics are computed using daily returns (n = 542). MDD is the maximum daily drawdown, defined as the maximum observed decline from a historical peak of the price until a new peak is attained. P-values < 0.001 are denoted ***, p-values < 0.01 are denoted **, and p-values < 0.02 are denoted *. The p-values for the Sharpe ratios were bootstrapped from 500 random backtests. We obtain α (the intercept) and R² by regressing the daily returns of the portfolios on the daily returns of the SPY. The performance metrics of the random portfolios were bootstrapped from 500 random backtests. For daily turnover, we use turnover_t = Σ_i |w_{i,t+1} − w_{i,t}(1 + r_{i,t+1})|, where w_{i,t} denotes the portfolio weight of stock i at time t and r_{i,t+1} its daily return at time t + 1.
| | Day −1 | Day 0 | Day 1 | SPY | random |
|---|---|---|---|---|---|
| annualized average return | 48.95% | 45.42% | 21.02% | 7.25% | |
| annualized volatility | 11.65% | 12.43% | 12.85% | 15.06% | |
| annualized Sharpe ratio | 4.20*** | 3.66*** | 1.64** | 0.48 | |
| MDD | 8.82% | 8.34% | 10.31% | 21.04% | |
| annualized α | 47.88%*** | 45.36%*** | 20.69%* | 0 | |
| R² | 0.004 | 0.001 | 0.004 | 1 | |
| daily turnover | 82.12% | 82.27% | 82.43% | 0% | |
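The table's statistics follow standard definitions, which can be sketched as below. This is not the authors' code: the 252-trading-day annualization and the turnover convention Σ_i |w_{i,t+1} − w_{i,t}(1 + r_{i,t+1})| are my assumptions about details left implicit in the caption.

```python
import numpy as np

def performance_stats(daily_returns, periods_per_year=252):
    """Annualized average return, volatility, Sharpe ratio and maximum
    drawdown from a series of daily portfolio returns."""
    r = np.asarray(daily_returns)
    ann_ret = periods_per_year * r.mean()
    ann_vol = np.sqrt(periods_per_year) * r.std(ddof=1)
    wealth = np.cumprod(1.0 + r)                 # value of $1 invested
    peak = np.maximum.accumulate(wealth)         # running historical peak
    return {
        "ann_return": ann_ret,
        "ann_vol": ann_vol,
        "sharpe": ann_ret / ann_vol,
        "mdd": np.max(1.0 - wealth / peak),      # worst decline from a peak
    }

def daily_turnover(w_t, w_next, r_next):
    """One common definition: sum_i |w_{i,t+1} - w_{i,t} * (1 + r_{i,t+1})|."""
    w_t, w_next, r_next = map(np.asarray, (w_t, w_next, r_next))
    return np.abs(w_next - w_t * (1.0 + r_next)).sum()
```

For example, a portfolio that drifts from weights (0.5, 0.5) to (0.6, 0.4) with zero returns has a daily turnover of 0.2 under this convention.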