Xiaoguang Huo, Feng Fu.
Abstract
Sequential portfolio selection has attracted increasing interest in the machine learning and quantitative finance communities in recent years. As a mathematical framework for reinforcement learning policies, the stochastic multi-armed bandit problem addresses the primary difficulty in sequential decision-making under uncertainty, namely the exploration versus exploitation dilemma, and therefore provides a natural connection to portfolio selection. In this paper, we incorporate risk awareness into the classic multi-armed bandit setting and introduce an algorithm to construct a portfolio. By filtering assets based on the topological structure of the financial market and combining the optimal multi-armed bandit policy with the minimization of a coherent risk measure, we achieve a balance between risk and return.
Keywords: conditional value-at-risk; graph theory; multi-armed bandit; online learning; portfolio selection; risk-awareness
Year: 2017 PMID: 29291122 PMCID: PMC5717697 DOI: 10.1098/rsos.171377
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. Portfolio selection based on the minimum spanning tree (MST). (a) The complete graph and (b) the corresponding MST constructed from the 30 selected S&P 500 stocks during the period September 2008 to October 2008. (c) The performance of the portfolio of 10 vertices randomly selected from the 14 leaves shown in (b). (d) Comparison of the eigenvalue spectrum of the covariance matrix of the 30 selected S&P 500 stocks in (a) with that of the 10 stocks randomly chosen from the peripheral nodes of the MST in (c).
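The filtering step described in the Figure 1 caption (build an MST over the assets, then keep only its leaves) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distance d = sqrt(2(1 - ρ)) is a commonly used correlation-based metric and is assumed here because the record does not state which one was used, and the 4-asset correlation matrix and all function names are hypothetical.

```python
import math

def correlation_to_distance(corr):
    """Map correlations to distances with d = sqrt(2 * (1 - rho)) (assumed metric)."""
    n = len(corr)
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j]))) for j in range(n)]
            for i in range(n)]

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns a list of (i, j) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def leaves(edges, n):
    """Vertices of degree one, i.e. the peripheral nodes of the tree."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return [v for v in range(n) if degree[v] == 1]

# Hypothetical toy data: asset 0 is strongly correlated with all others.
corr = [
    [1.0, 0.9, 0.8, 0.7],
    [0.9, 1.0, 0.1, 0.1],
    [0.8, 0.1, 1.0, 0.1],
    [0.7, 0.1, 0.1, 1.0],
]
edges = minimum_spanning_tree(correlation_to_distance(corr))
leaf_nodes = leaves(edges, len(corr))
```

In this toy case the central asset 0 becomes the hub of the tree and the remaining assets are leaves; a portfolio in the spirit of panel (c) would then be drawn at random from `leaf_nodes`.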
Figure 2. The combined sequential portfolio selection algorithm can achieve a balance between risk and return. (a,d) Simulated stock paths based on geometric Brownian motion (GBM). (b,e) The performance of two portfolio selection algorithms, UCB1 versus ϵ-greedy. (c,f) The cumulative wealth obtained with our sequential portfolio selection algorithm, which combines the single-asset multi-armed bandit portfolio given by (2.2) and the risk-aware portfolio given by (2.4), compared with four other benchmark portfolio selection algorithms. To quantify and compare the role of volatility in the performance of portfolio selection algorithms, we present the simulation results for low volatility in (a)–(c) and for high volatility in (d)–(f). Parameters: the same drift vector α=(0.04,0.035,0.08,0.02,0.03) is used for simulating the stock paths in (a) and (d). For each trial, the volatility terms σ are drawn uniformly at random from the interval [0.02,0.025] in (a) and from the interval [0.03,0.035] in (d). λ=0.9.
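The simulation setup in the Figure 2 caption can be illustrated with a short sketch that generates GBM paths with the stated drifts and low-volatility range and then runs the standard UCB1 index policy over them. Several choices here are assumptions that the record does not specify: the unit time step, the initial price of 1, the horizon of 500 steps, the use of per-step log returns as bandit rewards, and all function names. UCB1's regret guarantees assume rewards in [0, 1], which log returns do not satisfy, so this shows the mechanics only and does not reproduce the reported results.

```python
import math
import random

def simulate_gbm(drifts, vols, n_steps, dt=1.0, s0=1.0, rng=random):
    """Return one GBM price path per asset, each of length n_steps + 1."""
    paths = []
    for alpha, sigma in zip(drifts, vols):
        path = [s0]
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            # Exact discretisation of dS = alpha * S dt + sigma * S dW.
            growth = math.exp((alpha - 0.5 * sigma ** 2) * dt
                              + sigma * math.sqrt(dt) * z)
            path.append(path[-1] * growth)
        paths.append(path)
    return paths

def ucb1_choices(paths):
    """Run UCB1 over the assets, with per-step log returns as rewards (assumed)."""
    k = len(paths)
    n_steps = len(paths[0]) - 1
    counts = [0] * k
    means = [0.0] * k
    chosen = []
    for t in range(1, n_steps + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = math.log(paths[arm][t] / paths[arm][t - 1])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        chosen.append(arm)
    return chosen, counts

rng = random.Random(0)                               # fixed seed for repeatability
drifts = [0.04, 0.035, 0.08, 0.02, 0.03]             # drift vector from the caption
vols = [rng.uniform(0.02, 0.025) for _ in drifts]    # low-volatility case, panel (a)
paths = simulate_gbm(drifts, vols, n_steps=500, rng=rng)
chosen, counts = ucb1_choices(paths)
```

Swapping the interval to [0.03, 0.035] gives the high-volatility case of panel (d). An ϵ-greedy baseline would replace the index rule with a random arm with probability ϵ and the arm with the highest running mean otherwise.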