Ilaria Bordino, Stefano Battiston, Guido Caldarelli, Matthieu Cristelli, Antti Ukkonen, Ingmar Weber.
Abstract
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which also enables us to investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
Year: 2012 PMID: 22829871 PMCID: PMC3400625 DOI: 10.1371/journal.pone.0040014
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Graphical illustration of the analysis presented in this paper.
The study of queries is gaining more and more attention as an important tool for understanding social and financial systems. Users perform web searches in order to collect news or browse e-newspaper sites. In particular, local or global events such as natural disasters can generate local or global waves of searches through the web. As a result, the logs of these search-engine queries are an unprecedented source of anonymized information about human activities. In this paper we provide a detailed analysis of a particular application of these ideas: the anticipation of market activity from user queries. This picture graphically summarizes our procedure. In particular, we investigate what the relationship between web searches and market movements is, and whether web searches anticipate market activity. While we can expect that large fluctuations in markets produce a spreading of news, rumors or government actions and therefore induce web searches (solid green arrow), we would like to check whether web searches affect or even anticipate financial activity (broken violet arrow). In detail, we investigate whether today’s query volumes about financial stocks somehow anticipate tomorrow’s financial indicators such as trading volumes, daily returns, volatility, etc., and we find a significant anticipation for trading volumes.
The 100 traded companies included in the NASDAQ-100 index with their respective tickers.
| Activision Blizzard (ATVI) | Adobe Systems Incorporated (ADBE) | Akamai Technologies, Inc (AKAM) |
| Altera Corporation (ALTR) | Amazon.com, Inc. (AMZN) | Amgen Inc. (AMGN) |
| Apollo Group, Inc. (APOL) | Apple Inc. (AAPL) | Applied Materials, Inc. (AMAT) |
| Autodesk, Inc. (ADSK) | Automatic Data Processing, Inc. (ADP) | Baidu.com, Inc. (BIDU) |
| Bed Bath & Beyond Inc. (BBBY) | Biogen Idec, Inc (BIIB) | BMC Software, Inc. (BMC) |
| Broadcom Corporation (BRCM) | C. H. Robinson Worldwide, Inc. (CHRW) | CA, Inc. (CA) |
| Celgene Corporation (CELG) | Cephalon, Inc. (CEPH) | Cerner Corporation (CERN) |
| Check Point Software Technologies Ltd. (CHKP) | Cisco Systems, Inc. (CSCO) | Citrix Systems, Inc. (CTXS) |
| Cognizant Tech. Solutions Corp. (CTSH) | Comcast Corporation (CMCSA) | Costco Wholesale Corporation (COST) |
| Ctrip.com International, Ltd. (CTRP) | Dell Inc. (DELL) | Dentsply International Inc. (XRAY) |
| DirecTV (DTV) | Dollar Tree, Inc. (DLTR) | eBay Inc. (EBAY) |
| Electronic Arts Inc. (ERTS) | Expedia, Inc. (EXPE) | Expeditors Int. of Washington, Inc. (EXPD) |
| Express Scripts, Inc. (ESRX) | F5 Networks, Inc. (FFIV) | Fastenal Company (FAST) |
| First Solar, Inc. (FSLR) | Fiserv, Inc. (FISV) | Flextronics International Ltd. (FLEX) |
| FLIR Systems, Inc. (FLIR) | Garmin Ltd. (GRMN) | Genzyme Corporation (GENZ) |
| Gilead Sciences, Inc. (GILD) | Google Inc. (GOOG) | Henry Schein, Inc. (HSIC) |
| Illumina, Inc. (ILMN) | Infosys Technologies (INFY) | Intel Corporation (INTC) |
| Intuit, Inc. (INTU) | Intuitive Surgical Inc. (ISRG) | Joy Global Inc. (JOYG) |
| KLA Tencor Corporation (KLAC) | Lam Research Corporation (LRCX) | Liberty Media Corp., Int. Series A (LINTA) |
| Life Technologies Corporation (LIFE) | Linear Technology Corporation (LLTC) | Marvell Technology Group, Ltd. (MRVL) |
| Mattel, Inc. (MAT) | Maxim Integrated Products (MXIM) | Microchip Technology Incorporated (MCHP) |
| Micron Technology, Inc. (MU) | Microsoft Corporation (MSFT) | Millicom International Cellular S.A. (MICC) |
| Mylan, Inc. (MYL) | NetApp, Inc. (NTAP) | Netflix, Inc. (NFLX) |
| News Corporation, Ltd. (NWSA) | NII Holdings, Inc. (NIHD) | NVIDIA Corporation (NVDA) |
| O'Reilly Automotive, Inc. (ORLY) | Oracle Corporation (ORCL) | PACCAR Inc. (PCAR) |
| Paychex, Inc. (PAYX) | Priceline.com, Incorporated (PCLN) | Qiagen N.V. (QGEN) |
| QUALCOMM Incorporated (QCOM) | Research in Motion Limited (RIMM) | Ross Stores Inc. (ROST) |
| SanDisk Corporation (SNDK) | Seagate Technology Holdings (STX) | Sears Holdings Corporation (SHLD) |
| Sigma-Aldrich Corporation (SIAL) | Staples Inc. (SPLS) | Starbucks Corporation (SBUX) |
| Stericycle, Inc (SRCL) | Symantec Corporation (SYMC) | Teva Pharmaceutical Industries Ltd. (TEVA) |
| Urban Outfitters, Inc. (URBN) | VeriSign, Inc. (VRSN) | Vertex Pharmaceuticals (VRTX) |
| Virgin Media, Inc. (VMED) | Vodafone Group, plc. (VOD) | Warner Chilcott, Ltd. (WCRX) |
| Whole Foods Market, Inc. (WFMI) | Wynn Resorts Ltd. (WYNN) | Xilinx, Inc. (XLNX) |
| Yahoo! Inc. (YHOO) |
Figure 2. Query log volumes and trading volumes: cross correlation analysis (ticker: “NVDA”).
(top) Time evolution of normalized query-log volumes for the ticker “NVDA” compared with the trading volume of the “NVIDIA Corporation”. The data for both query logs (blue) and trading volume (red) are aggregated on a daily basis. (bottom) The plot of the sample cross correlation function as defined in Eq. (1) vs absolute values of the time lag (positive values of the time lag correspond to solid lines, while negative values correspond to broken lines). The correlation coefficients at positive time lags are always larger than the corresponding ones at negative time lags. This suggests that today’s query volumes anticipate and affect the trading activity of the following days (typically one or two days at most).
Figure 3. Query log volumes and trading volumes: cross correlation analysis (ticker: “RIMM”).
(top) Time evolution of normalized query-log volumes for the ticker “RIMM” compared with the trading volume of the “Research In Motion Limited”. The data for both query logs (blue) and trading volume (red) are aggregated on a daily basis. (bottom) The plot of the sample cross correlation function as defined in Eq. (1) vs absolute values of the time lag (positive values of the time lag correspond to solid lines, while negative values correspond to broken lines). As in the case of the ticker “NVDA” (“NVIDIA Corporation”) in Fig. 2, the correlation coefficients at positive time lags are always larger than the corresponding ones at negative time lags. This suggests that today’s query volumes anticipate and affect the trading activity of the following days (typically one or two days at most).
Figure 4. Query volumes and trading volumes.
We plot the query-search volume and trading volume time series for four stocks (AAPL, AMZN, NFLX and ADBE) to show that the patterns observed in Figs. 2 and 3 are common to most of the stocks in the set considered (NASDAQ-100).
Average cross-correlation functions for the clean NASDAQ-100 stocks (query: Ticker, volumes: searches).
| Time lag (days) | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | 5 |
| CCF | 0.0176 | 0.0604 | 0.0657 | 0.0993 | 0.1816 | 0.3641 | 0.2700 | 0.1145 | 0.0834 | 0.0540 | 0.0312 |
By clean stocks we mean that we remove those stocks whose tickers give rise to spurious queries, for instance tickers that coincide with a common word such as LIFE, or the stock EBAY. In Supporting Information S1 we report the cross correlation functions of the 87 stocks on which the average is performed.
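The lagged correlations reported in these tables can be illustrated with a minimal sketch. Eq. (1) is not reproduced on this page, so the code assumes the ordinary Pearson coefficient computed on the overlapping part of the two daily series; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cross_correlation(queries, trades, lag):
    """Correlate queries[t] with trades[t + lag].

    A positive lag pairs today's query volume with a later trading volume,
    a negative lag pairs it with an earlier one.
    """
    if lag >= 0:
        q, v = queries[:len(queries) - lag], trades[lag:]
    else:
        q, v = queries[-lag:], trades[:len(trades) + lag]
    return pearson(q, v)

# Toy data (hypothetical): trading volume echoes query volume one day later.
queries = [1, 5, 2, 8, 3, 9, 2, 7, 1, 6]
trades = [4] + queries[:-1]

ccf = {lag: cross_correlation(queries, trades, lag) for lag in range(-2, 3)}
for lag, value in sorted(ccf.items()):
    print(f"lag {lag:+d}: {value:.3f}")
```

On this constructed example the coefficient peaks at lag +1, which is the kind of asymmetry between positive and negative lags that the tables summarize.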
Average cross-correlation time series for NASDAQ-100 clean stocks (query: Ticker, volumes: users).
| Time lag (days) | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | 5 |
| CCF | 0.0078 | 0.0344 | 0.0501 | 0.0736 | 0.1482 | 0.3194 | 0.2349 | 0.0876 | 0.0623 | 0.0345 | 0.0151 |
The results from the queries of Yahoo! users or from all searches (Table 2) are almost identical.
Average cross-correlation time series for NASDAQ-100 stocks (query: Ticker, volumes: searches).
| Time lag (days) | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | 5 |
| CCF | 0.0067 | 0.0487 | 0.0507 | 0.0806 | 0.1510 | 0.3150 | 0.2367 | 0.0940 | 0.0675 | 0.0433 | 0.0197 |
Average cross-correlation time series for NASDAQ-100 stocks (query: Company name, volumes: searches).
| Time lag (days) | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | 5 |
| CCF | 0.0159 | 0.0629 | 0.0508 | 0.0455 | 0.0639 | 0.1196 | 0.1083 | 0.0561 | 0.0509 | 0.0299 | 0.0169 |
Correlations are lower than in the case in which we consider the queries deriving from the tickers (Table 4).
Values of cross-correlation functions for some selected stocks.
| Ticker | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 | 4 | 5 |
| ADBE | 0.08 | 0.12 | 0.14 | 0.19 | 0.47 | 0.83 | 0.51 | 0.19 | 0.09 | 0.10 | 0.11 |
| CEPH | 0.16 | 0.26 | 0.22 | 0.14 | 0.32 | 0.80 | 0.44 | 0.24 | 0.12 | 0.13 | 0.15 |
| APOL | 0.02 | 0.06 | 0.10 | 0.21 | 0.43 | 0.79 | 0.55 | 0.22 | 0.12 | 0.07 | 0.03 |
| NVDA | 0.23 | 0.36 | 0.38 | 0.46 | 0.56 | 0.79 | 0.68 | 0.47 | 0.42 | 0.38 | 0.29 |
| CSCO | 0.04 | 0.07 | 0.13 | 0.36 | 0.53 | 0.74 | 0.63 | 0.34 | 0.26 | 0.17 | 0.12 |
| AKAM | −0.04 | −0.06 | 0.03 | 0.07 | 0.22 | 0.72 | 0.49 | 0.20 | 0.11 | 0.02 | −0.01 |
| NFLX | 0.10 | 0.16 | 0.16 | 0.24 | 0.47 | 0.68 | 0.54 | 0.25 | 0.19 | 0.16 | 0.13 |
| ISRG | 0.07 | 0.13 | 0.18 | 0.21 | 0.38 | 0.67 | 0.64 | 0.29 | 0.20 | 0.11 | 0.05 |
| RIMM | 0.03 | 0.12 | 0.11 | 0.14 | 0.31 | 0.66 | 0.58 | 0.24 | 0.20 | 0.11 | 0.05 |
| FFIV | 0.06 | 0.06 | 0.13 | 0.21 | 0.35 | 0.65 | 0.56 | 0.33 | 0.21 | 0.14 | 0.13 |
The values of the cross-correlation function at positive time lags are always higher than the values at negative time lags. From this evidence it appears that query volumes anticipate trading volumes by one or two days. See Supporting Information S1 for the complete results for the 87 clean stocks.
Cross-correlation coefficient between query and trading volumes after removing largest events.
| Ticker | All events | Top 5 removed | Top 10 removed |
| ADBE | 0.83 | 0.51 | 0.32 |
| CEPH | 0.80 | 0.32 | 0.24 |
| APOL | 0.79 | 0.55 | 0.46 |
| NVDA | 0.79 | 0.70 | 0.64 |
| CSCO | 0.74 | 0.56 | 0.46 |
| AKAM | 0.72 | 0.51 | 0.39 |
| NFLX | 0.68 | 0.62 | 0.62 |
| ISRG | 0.67 | 0.57 | 0.55 |
| RIMM | 0.66 | 0.59 | 0.52 |
| FFIV | 0.65 | 0.55 | 0.50 |
We compute the cross-correlation coefficient between query and trading volumes after removing the days characterized by the highest trading volumes: the top five and the top ten events are removed, respectively. We note that a significant correlation is still observed for most of the stocks considered. This important test supports the robustness of our findings. See Supporting Information S1 for the complete results for the 87 clean stocks.
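The robustness test described above can be sketched as follows. This is only an illustration under stated assumptions: the same-day Pearson coefficient stands in for the paper's cross-correlation coefficient, and the helper `drop_top_volume_days` and the toy series are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_top_volume_days(queries, trades, k):
    """Remove the k days with the highest trading volume from both series."""
    ranked = sorted(range(len(trades)), key=lambda i: trades[i], reverse=True)
    top = set(ranked[:k])
    keep = [i for i in range(len(trades)) if i not in top]
    return [queries[i] for i in keep], [trades[i] for i in keep]

# Toy data (hypothetical): two large events dominate both series.
queries = [2, 3, 2, 40, 3, 4, 2, 35, 3, 5, 4, 2]
trades = [10, 12, 11, 90, 13, 15, 10, 80, 12, 16, 14, 11]

full = pearson(queries, trades)
q2, t2 = drop_top_volume_days(queries, trades, 2)
trimmed = pearson(q2, t2)
print(f"all days: {full:.2f}, top 2 removed: {trimmed:.2f}")
```

If the correlation survives once the largest events are dropped, it is not driven by a handful of exceptional days alone, which is the point of the test.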
Figure 5. Comparison of the cross-correlation function between query volumes and trading volumes and between query volumes and volatility.
Trading volume and volatility are correlated and, given that volatility is also autocorrelated, the correlation between present query volume and future trading volume could simply originate from this autocorrelated term. However, we show that the cross-correlation between query and volatility (broken line) is significantly smaller than the one between query and trading volume (solid line). Moreover, in the volatility case the positive-lag branch is equal to or even smaller than the negative-lag one. If the effect were due to the autocorrelation component of the volatility, we would expect a similar behavior for both cross-correlation functions. These facts support a non-autocorrelated origin of the correlation between present query volume and future trading volume. As a proxy for the volatility we use the absolute value of daily price returns.
Figure 6. Typical users’ behavior.
Average (left) monthly and (right) yearly distribution of the number of distinct tickers searched by any Yahoo! user.
Figure 7. Behavior of the users who search for AAPL.
Distribution of the number of days that users searched for AAPL within one month (left) and over the whole year (right).
Figure 8. Behavior of the users who search for AMZN.
Distribution of the number of days that users searched for AMZN within one month (left) and over the whole year (right).
Figure 9. Behavior of the users who search for NFLX.
Distribution of the number of days that users searched for NFLX within one month (left) and over the whole year (right).
Figure 10. Evolution of the percentage of one-time searchers.
The fraction of one-time searchers appears to be very stable in time, and we do not observe a correlation between this kind of user and anomalous trading volumes or price movements.
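A fraction of this kind could be computed from a query log along the following lines. This is a sketch under assumptions: a one-time searcher is taken to be a user who searched for the ticker on exactly one day within the period considered, and the log layout of `(user_id, day)` pairs is hypothetical.

```python
from collections import defaultdict

def one_time_fraction(log):
    """Share of users who searched on exactly one distinct day.

    `log` is an iterable of (user_id, day) pairs for a single ticker.
    """
    days_per_user = defaultdict(set)
    for user, day in log:
        days_per_user[user].add(day)
    if not days_per_user:
        return 0.0
    once = sum(1 for days in days_per_user.values() if len(days) == 1)
    return once / len(days_per_user)

# Toy log (hypothetical): u1 searches twice on the same day, u2 on two days.
log = [("u1", 1), ("u2", 1), ("u2", 2), ("u3", 3), ("u1", 1)]
frac = one_time_fraction(log)
print(f"one-time searchers: {frac:.2%}")
```

Evaluating the function on successive time windows would give the kind of time evolution plotted in the figure.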
Figure 11. Query-search for AAPL stock in the various days of the week.
Query volumes of NASDAQ-100 tickers are negligible during non-working days, so we consider only the contribution to query volumes deriving from working days.
Average cross-correlation functions between search-engine volumes and signed price returns for the clean NASDAQ-100 stocks (query: Ticker).
| Volume | Price returns | Avg correlation |
| searches | | 0.2650 |
| searches | | −0.2360 |
| searches | | 0.2728 |
| users | | 0.2722 |
| users | | −0.1975 |
| users | | 0.2446 |
Granger causality test.
| Dataset | Lag (days) | Direction | | Avg reduction in RSS |
| Q (100 tickers) | 1 | Q | | |
| Q (100 tickers) | 1 | T | | |
| U (100 tickers) | 1 | U | | |
| U (100 tickers) | 1 | T | | |
| Q (100 tickers) | 2 | Q | | |
| Q (100 tickers) | 2 | T | | |
| U (100 tickers) | 2 | U | | |
| U (100 tickers) | 2 | T | | |
| Q (87 tickers) | 1 | Q | | |
| Q (87 tickers) | 1 | T | | |
| U (87 tickers) | 1 | U | | |
| U (87 tickers) | 1 | T | | |
| Q (87 tickers) | 2 | Q | | |
| Q (87 tickers) | 2 | T | | |
| U (87 tickers) | 2 | U | | |
| U (87 tickers) | 2 | T | | |
Adding information about yesterday’s query volume reduces the average prediction error (in an autoregressive model) for today’s trade volume, and for half of the companies the reduction is statistically significant.
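The comparison behind a Granger causality test of this kind can be sketched as follows: fit an autoregressive model of trading volume with and without yesterday's query volume and compare the residual sums of squares (RSS). The code is a minimal illustration at lag 1 using plain least squares; the exact model specification used in the paper is not given on this page, and all names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols_rss(rows, y):
    """RSS of the least-squares fit of y on the regressors in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((t - f) ** 2 for t, f in zip(y, fitted))

def granger_lag1(queries, trades):
    """Relative RSS reduction and F statistic from adding queries[t-1]."""
    y = trades[1:]
    steps = range(len(trades) - 1)
    restricted = [[1.0, trades[t]] for t in steps]          # T[t-1] only
    full = [[1.0, trades[t], queries[t]] for t in steps]    # plus Q[t-1]
    rss_r, rss_u = ols_rss(restricted, y), ols_rss(full, y)
    f_stat = (rss_r - rss_u) / (rss_u / (len(y) - 3))       # one restriction
    return (rss_r - rss_u) / rss_r, f_stat

# Toy data (hypothetical): trades follow yesterday's queries plus small noise.
queries = [1, 5, 2, 8, 3, 9, 2, 7, 1, 6, 4, 3]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.2, -0.1, 0.3]
trades = [12.0] + [10 + 2 * q + e for q, e in zip(queries, noise)]

reduction, f_stat = granger_lag1(queries, trades)
print(f"RSS reduction: {reduction:.1%}, F = {f_stat:.1f}")
```

Swapping the roles of the two series gives the test in the opposite direction, and the significance of the F statistic would be read from the F distribution with 1 and n − 3 degrees of freedom.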
Age distribution of users.
| Age Range | Fraction of Users |
Average age distribution for a random sample collecting half of the data.
Age distribution for NASDAQ-100 sample.
| Age Range | Fraction of Users |
We observe some minor differences between the age distribution of common users and that of the users whose queries belong to the NASDAQ-100 sample.