
Split Bregman iteration for multi-period mean variance portfolio optimization.

Stefania Corsaro¹, Valentina De Simone², Zelda Marino¹.

Abstract

This paper investigates the problem of defining an optimal long-term investment strategy, where the investor can exit the investment before maturity without severe loss. Our setting is a multi-period one, where the aim is to make a plan for allocating all of the wealth among n assets within a time horizon of m periods. In addition, the investor can rebalance the portfolio at the beginning of each period. We develop a model in the Markowitz context, based on a fused lasso approach. According to it, both the wealth and its variation across periods are penalized using the ℓ1 norm, so as to produce sparse portfolios with a limited number of transactions. The model leads to a nonsmooth constrained optimization problem, where the inequality constraints are aimed at guaranteeing at least a minimum level of expected wealth at each date. We solve it by using the split Bregman method, which has proved to be efficient in the solution of this type of problem. Due to the additive structure of the objective function, the alternating split Bregman method yields, at each iteration, easier subproblems, which either admit closed-form solutions or can be solved very quickly. Numerical results on datasets generated using real-world price values show the effectiveness of the proposed model.


Keywords: Fused lasso; Nonsmooth optimization; Portfolio selection; Split Bregman

Year: 2020 PMID: 33041390 PMCID: PMC7535806 DOI: 10.1016/j.amc.2020.125715

Source DB: PubMed Journal: Appl Math Comput ISSN: 0096-3003 Impact factor: 4.091

Introduction

We consider the regularized multi-period mean-variance optimization problem and its solution by means of the split Bregman method. In recent years there has been growing interest in the solution of fused lasso models [1], that is, linearly constrained minimization problems of the form
$$\min_{w}\; f(w) + \tau_1\|w\|_1 + \tau_2\|Lw\|_1 \quad \text{subject to linear constraints on } w, \qquad (1)$$
where $f:\Re^N \to \Re$ is a closed convex function, at least twice continuously differentiable, $\tau_1, \tau_2 > 0$, and L is the difference operator. The ℓ1 penalty in (1) promotes sparse solutions, and the term $\|Lw\|_1$ is included to produce smooth solutions. Problems described by fused lasso models arise, e.g., in image processing [2], classification [3] and finance [4]. The nonsmoothness of the ℓ1-type regularization terms precludes the use of standard descent methods for smooth objective functions. Problems of this kind can be solved either by smoothing the ℓ1 terms, e.g., [5], [6], and applying optimization solvers for differentiable problems such as gradient methods [7], [8], [9], or by directly using solvers for nondifferentiable problems, such as Bregman, proximal and ADMM methods [2], [10], [11], [12]. Due to the additive structure in (1), splitting methods have become popular, because they yield algorithms that at each iteration solve subproblems that are easier to handle [13], [14]. These subproblems often either admit closed-form solutions or can be solved very quickly with specialized methods. In this context, methods based on split Bregman iteration have proved to be efficient in different fields [2], [15], [16], [17]. At each iteration of the Bregman method an ℓ1-regularized unconstrained optimization subproblem is solved. Auxiliary variables allow one to separate the two ℓ1 regularization terms, which makes the use of splitting methods straightforward. In the field of portfolio optimization, both in the static and in the dynamic case, Bregman iteration has been used to efficiently solve ℓ1-regularized models in Markowitz's framework.
The use of ℓ1 penalty terms in portfolio modelling has become popular for several reasons. Transaction costs are well described by the ℓ1 norm of the portfolio, especially for moderate-size trades. The properties of this norm allow one to obtain sparse solutions, that is, small portfolios. Small portfolios are often considered preferable to large ones: they are accessible to small investors thanks to reduced holding costs, and the estimation errors for variances and covariances seem to be reduced in this case [18], [19]. Finally, it has been observed that ℓ1 regularization has the effect of penalizing short positions [20], [21], which are forbidden in several markets. The structure of Bregman iteration has been exploited, in both the static and the dynamic case, to develop procedures that adaptively fix the value of the regularization parameter; this value is chosen so as to provide solutions with certain financial properties, while preserving fidelity to the data [21], [22]. In [4] a model based on the fused lasso approach is presented in a multi-period setting. This approach seems very promising, since the fusion term is a penalty on portfolio turnover, which produces strategies with low transaction costs by preserving the pattern of active positions over time. The fused lasso model is also considered in this paper. The novelty is the focus on strategies in which the investor can exit before the end without incurring severe losses; this is particularly significant to protect the investment from financial market crises. To this end, the model includes controls on wealth at intermediate periods. According to the multi-period approach, the so-called rebalancing dates divide the investment period into subperiods; at these dates, the investment strategy is revised according to evolving information. We fix a minimum level of expected wealth at intermediate dates.
From the mathematical point of view, this leads to a nonsmooth optimization problem with equality and inequality constraints. We reformulate it as a problem with equality constraints only and apply the alternating split Bregman method to solve it. The resulting algorithm requires, at each iteration, the solution of unconstrained subproblems that are easily solved exactly. To test our model, we introduce some performance measures. In particular, we assess performance with respect to two benchmarks built on real market data; this task is usually referred to as the Enhanced Index Tracking problem [23], [24], where one aims at selecting a portfolio that outperforms a reference index. In Section 2 we describe the mathematical model. In Section 3 we present the procedure for the numerical solution of the model, based on alternating split Bregman iteration. In Section 4 we show the results of our tests.

Mathematical model

In this section we extend the fused lasso model presented in [4] with the aim of protecting the investor in case of early exit. The model refers to a medium- or long-term investment from which the investor could exit before maturity; to this end, the expected minimum wealth is fixed at each rebalancing date. Let m be the number of rebalancing dates, n the number of assets and $(w_j)_i$ the portion of the investor's total wealth invested in security i at time j; the vector
$$w = (w_1^T, w_2^T, \dots, w_m^T)^T \in \Re^N, \qquad w_j = ((w_j)_1, \dots, (w_j)_n)^T, \qquad N = mn,$$
defines the portfolio. At each rebalancing date $t_j$ the wealth is thus given by $1_n^T w_j$, where $1_n$ is the column vector with n elements, all equal to one. We assume that $t_1$ is the starting date. We denote with $r_j \in \Re^n$ the expected return vector and with $C_j \in \Re^{n \times n}$ the covariance matrix of the subperiod starting at $t_j$, which we assume positive definite. The model is stated as follows:
$$\begin{aligned}
\min_{w}\;& \sum_{j=1}^{m} w_j^T C_j w_j + \tau_1 \|w\|_1 + \tau_2 \sum_{j=1}^{m-1} \|w_{j+1} - w_j\|_1 \\
\text{s.t.}\;& 1_n^T w_1 = \xi_{init}, \\
& 1_n^T w_j = (1_n + r_{j-1})^T w_{j-1}, \qquad j = 2, \dots, m, \\
& 1_n^T w_{j+1} \ge \xi_j, \qquad j = 1, \dots, m-1, \\
& (1_n + r_m)^T w_m \ge \xi_m,
\end{aligned} \qquad (3)$$
where $\xi_{init}$ is the initial wealth and $\xi = (\xi_1, \dots, \xi_m)^T$ is the vector of expected minimum wealth. The quadratic term in the objective function represents the portfolio risk, obtained by summing the single-period variances. The subsequent terms are the regularization terms introduced by the fused lasso approach. The ℓ1 norm is well known to promote sparsity. We apply it to w, weighted by τ1 > 0, in order to obtain small portfolios; this improves control over the investment and reduces holding costs. Moreover, we have observed that it penalizes negative components, thus acting as a penalty on shorting [21], [22]. This is useful when short positions are not allowed, since positive weights can be obtained by properly tuning τ1. The fusion term, that is, the ℓ1 norm applied to the time variation Δw (with blocks $w_{j+1} - w_j$), has a smoothing effect on the solution, preserving patterns of nonzero values across time. Its financial interpretation is a penalty on portfolio turnover, so it reduces transaction costs [4]. This term is weighted by τ2 > 0. The values τ1, τ2 are referred to as the regularization parameters.
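As a concrete reading of the objective in (3), the sketch below evaluates its three terms (risk, ℓ1 sparsity penalty, fusion or turnover penalty) for a given plan $w_1, \dots, w_m$. It is a minimal plain-Python illustration; the function names and the toy numbers are hypothetical and are not taken from the article.

```python
def l1(v):
    """ℓ1 norm of a vector stored as a list."""
    return sum(abs(x) for x in v)


def quad_form(C, x):
    """x^T C x for a dense matrix C stored as a list of rows."""
    n = len(x)
    return sum(x[i] * C[i][k] * x[k] for i in range(n) for k in range(n))


def objective_terms(W, Cs, tau1, tau2):
    """Terms of the objective in (3) for a plan W = [w_1, ..., w_m].

    risk     = sum_j w_j^T C_j w_j                (sum of single-period variances)
    sparsity = tau1 * sum_j ||w_j||_1             (small portfolios, penalizes shorts)
    turnover = tau2 * sum_j ||w_{j+1} - w_j||_1   (fusion term, transaction costs)
    """
    risk = sum(quad_form(C, w) for C, w in zip(Cs, W))
    sparsity = tau1 * sum(l1(w) for w in W)
    turnover = tau2 * sum(
        l1([a - b for a, b in zip(W[j + 1], W[j])]) for j in range(len(W) - 1)
    )
    return risk, sparsity, turnover


# Hypothetical toy plan: n = 2 assets, m = 2 periods
Cs = [[[0.04, 0.0], [0.0, 0.09]]] * 2
W = [[0.5, 0.5], [0.6, 0.5]]
print(objective_terms(W, Cs, tau1=1e-2, tau2=1e-3))
```

Increasing tau1 or tau2 shifts the balance between risk and the two penalties; the optimization itself is addressed in the next section.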
Both equality and inequality constraints are imposed on wealth. The first constraint establishes that the initial wealth equals the initial investment. Constraints 2 to m state the self-financing property, that is, money is neither added to nor withdrawn from the portfolio for j > 1; thus, at the end of each period the wealth is given by the revaluation of the previous one. The inequality constraints define a lower bound on the portfolio wealth at the end of each subperiod, that is, at all rebalancing dates. Finally, the last component is the minimum wealth expected from the overall investment; it is given by the revaluation of the last weight sequence produced by the investment strategy at the last rebalancing date. For simplicity of notation, we formulate problem (3) in compact form. We introduce the matrix $C \in \Re^{N \times N}$ as the m × m block diagonal matrix $C = \mathrm{diag}(C_1, \dots, C_m)$, whose blocks are the covariance matrices estimated at the beginning of each subperiod. The matrix C is symmetric positive definite (SPD) and sparse. Let $L \in \Re^{(N-n) \times N}$ be the discrete difference operator; it can be viewed as the upper bidiagonal block matrix, with blocks of dimension n × n, defined by
$$L = \begin{pmatrix} -I & I & & \\ & \ddots & \ddots & \\ & & -I & I \end{pmatrix},$$
where I is the identity matrix of dimension n, so that $\|Lw\|_1 = \sum_{j=1}^{m-1}\|w_{j+1} - w_j\|_1$. L is sparse as well. The m × m lower bidiagonal block equality constraint matrix $A \in \Re^{m \times N}$, with blocks of dimension 1 × n, is defined as follows:
$$A = \begin{pmatrix} 1_n^T & & & \\ -(1_n + r_1)^T & 1_n^T & & \\ & \ddots & \ddots & \\ & & -(1_n + r_{m-1})^T & 1_n^T \end{pmatrix}.$$
Finally, let $G \in \Re^{m \times N}$ be the inequality constraint matrix; it is an m × m upper bidiagonal block matrix with blocks of dimension 1 × n, whose nonzero blocks are $1_n^T$ on the block superdiagonal and $(1_n + r_m)^T$ in the last diagonal position:
$$G = \begin{pmatrix} 0 & 1_n^T & & \\ & \ddots & \ddots & \\ & & 0 & 1_n^T \\ & & & (1_n + r_m)^T \end{pmatrix}.$$
Then, problem (3) admits the following compact formulation:
$$\min_{w}\; w^T C w + \tau_1\|w\|_1 + \tau_2\|Lw\|_1 \quad \text{s.t.} \quad Aw = b, \quad Gw \ge \xi, \qquad (4)$$
where $b = (\xi_{init}, 0, \dots, 0)^T \in \Re^m$.
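To make the block structure of L, A and G concrete, the sketch below assembles them as dense Python lists for small n and m and checks a self-financing plan. It assumes that row j of G (j < m) carries $1_n^T$ in block column j + 1 and that row m carries $(1_n + r_m)^T$ in block column m, which is our reading of the inequality constraints of (3). Function names and data are hypothetical; a real implementation would use a sparse storage format.

```python
def difference_operator(n, m):
    """L in R^{(m-1)n x mn}: block row j has -I in block column j, I in block column j+1."""
    L = [[0.0] * (m * n) for _ in range((m - 1) * n)]
    for j in range(m - 1):
        for i in range(n):
            L[j * n + i][j * n + i] = -1.0
            L[j * n + i][(j + 1) * n + i] = 1.0
    return L


def constraint_matrices(R, xi_init):
    """Equality matrix A, right-hand side b and inequality matrix G (dense lists).

    R is the list of expected-return vectors r_1, ..., r_m (each of length n).
    """
    m, n = len(R), len(R[0])
    N = m * n
    A = [[0.0] * N for _ in range(m)]
    G = [[0.0] * N for _ in range(m)]
    for i in range(n):
        A[0][i] = 1.0                                   # budget: 1^T w_1 = xi_init
        G[m - 1][(m - 1) * n + i] = 1.0 + R[m - 1][i]   # final wealth (1 + r_m)^T w_m
        for j in range(1, m):
            A[j][(j - 1) * n + i] = -(1.0 + R[j - 1][i])  # self-financing
            A[j][j * n + i] = 1.0
            G[j - 1][j * n + i] = 1.0                   # wealth 1^T w_{j+1}
    b = [xi_init] + [0.0] * (m - 1)
    return A, b, G


def matvec(M, x):
    return [sum(a * v for a, v in zip(row, x)) for row in M]


# Hypothetical data: n = 2 assets, m = 2 periods
R = [[0.05, 0.10], [0.02, 0.04]]
A, b, G = constraint_matrices(R, xi_init=1.0)
w = [0.5, 0.5, 0.5, 0.575]   # w_2 reinvests the wealth 1.075 reached at date 2
print(matvec(A, w), b)        # equality residual is (numerically) zero
print(matvec(G, w))           # wealth values to be compared with xi
print(matvec(difference_operator(2, 2), w))  # w_2 - w_1
```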

Split Bregman iteration

We solve (4) using the split Bregman scheme, of which we first give a short description. It is based on the Bregman iteration method for convex optimization problems with equality constraints,
$$\min_{u}\; E(u) \quad \text{s.t.} \quad Ku = f. \qquad (5)$$
The main idea of Bregman iteration is to transform the problem into a sequence of easier ones, constructed by adding a "cost-to-move" term to the original objective function [14]. This term penalizes the distance between two iterates, measured by the Bregman distance at point u,
$$D_E^p(v, u) = E(v) - E(u) - \langle p, v - u \rangle, \qquad (6)$$
where $p \in \partial E(u)$ is a subgradient in the subdifferential of E at u. Note that if E is smooth, then (6) is the difference between E(v) and the first-order Taylor expansion of E at u, evaluated at v. Bregman iteration applied to problem (5) produces the following scheme:
$$\begin{aligned} u^{k+1} &= \arg\min_{u}\; D_E^{p^k}(u, u^k) + \frac{\lambda}{2}\|Ku - f\|_2^2, \\ p^{k+1} &= p^k - \lambda K^T (K u^{k+1} - f), \end{aligned} \qquad (7)$$
with λ > 0. The split Bregman method was introduced in [2], where the authors proposed introducing an auxiliary variable d before applying Bregman iteration to solve problem (5). The introduction of d replaces the original problem with an equivalent one in which the smooth and nonsmooth portions of the objective function are separated; a further constraint is then added, which forces d to equal the argument of the nonsmooth term. In order to apply the split Bregman method to problem (4), the first step is to reformulate it in terms of equality constraints only. We introduce the slack variable $s \in \Re^m$ to transform the inequality constraint in (4) into an equality one, and we use the indicator function to incorporate the nonnegativity constraint on the slack variable into the objective function:
$$\min_{w, s}\; w^T C w + \tau_1\|w\|_1 + \tau_2\|Lw\|_1 + \iota(s) \quad \text{s.t.} \quad Aw = b, \quad Gw - s = \xi, \qquad (8)$$
where
$$\iota(s) = \begin{cases} 0 & \text{if } s \ge 0, \\ +\infty & \text{otherwise.} \end{cases}$$
According to the split Bregman method, we introduce the auxiliary variables d and z such that d = w and z = Lw. Problem (8) is then reformulated as
$$\min_{w, s, d, z}\; w^T C w + \tau_1\|d\|_1 + \tau_2\|z\|_1 + \iota(s) \quad \text{s.t.} \quad Aw = b, \quad Gw - s = \xi, \quad d = w, \quad z = Lw. \qquad (9)$$
The alternating split Bregman method minimizes the function in (9) with respect to the variables w, s, d and z alternately.
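The Bregman distance (6) is easy to evaluate once a subgradient is available. The sketch below computes it for a generic E and checks two cases: for E(x) = ½‖x‖², with p = u, it reduces to ½‖v − u‖²; for E = ‖·‖₁ it is zero when v keeps the sign pattern of the subgradient p and positive otherwise. Function names are illustrative.

```python
def bregman_distance(E, p, u, v):
    """D_E^p(v, u) = E(v) - E(u) - <p, v - u>, where p is a subgradient of E at u."""
    return E(v) - E(u) - sum(pi * (vi - ui) for pi, vi, ui in zip(p, v, u))


def half_sq_norm(x):
    return 0.5 * sum(t * t for t in x)


def l1(x):
    return sum(abs(t) for t in x)


# Smooth case: E = 0.5||x||^2, gradient p = u, so D = 0.5||v - u||^2
print(bregman_distance(half_sq_norm, [1.0, 2.0], [1.0, 2.0], [3.0, 5.0]))  # 6.5

# Nonsmooth case: E = ||x||_1, p = (1, -1, 0) is a subgradient at u = (1, -2, 0)
u, p = [1.0, -2.0, 0.0], [1.0, -1.0, 0.0]
print(bregman_distance(l1, p, u, [3.0, -1.0, 0.0]))   # 0.0: same sign pattern
print(bregman_distance(l1, p, u, [-1.0, 1.0, 2.0]))   # 6.0
```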
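Note that this algorithm is equivalent to other well-known methods, such as Douglas-Rachford splitting and the alternating direction method of multipliers (ADMM) [14]. Due to the linearity of the constraints in (9), a simplified version of split Bregman iteration can be used, in which the Bregman vectors allow one to use the function E rather than its Bregman distance [25]. Denoting by $b_A^k$, $b_G^k$, $b_d^k$, $b_z^k$ the Bregman vectors associated with the four constraints of (9), this leads to the following simplified alternating minimization algorithm:
$$\begin{aligned}
w^{k+1} &= \arg\min_{w}\; Q(w, s^k, d^k, z^k), \\
s^{k+1} &= \arg\min_{s}\; \iota(s) + Q(w^{k+1}, s, d^k, z^k), \\
d^{k+1} &= \arg\min_{d}\; \tau_1\|d\|_1 + Q(w^{k+1}, s^{k+1}, d, z^k), \\
z^{k+1} &= \arg\min_{z}\; \tau_2\|z\|_1 + Q(w^{k+1}, s^{k+1}, d^{k+1}, z), \\
b_A^{k+1} &= b_A^k + b - Aw^{k+1}, \\
b_G^{k+1} &= b_G^k + \xi + s^{k+1} - Gw^{k+1}, \\
b_d^{k+1} &= b_d^k + w^{k+1} - d^{k+1}, \\
b_z^{k+1} &= b_z^k + Lw^{k+1} - z^{k+1},
\end{aligned} \qquad (10)$$
with the quadratic function Q defined as
$$Q(w, s, d, z) = w^T C w + \frac{\lambda}{2}\left(\|Aw - b - b_A^k\|_2^2 + \|Gw - s - \xi - b_G^k\|_2^2 + \|d - w - b_d^k\|_2^2 + \|z - Lw - b_z^k\|_2^2\right).$$
Closed-form solutions can be obtained for the minimization with respect to s, d and z. Minimization with respect to d and z can be carried out efficiently using the soft thresholding operator, defined componentwise as
$$\mathcal{S}_\gamma(x) = \mathrm{sign}(x)\max(|x| - \gamma, 0), \qquad (11)$$
where x is a real vector and γ > 0, so that $d^{k+1} = \mathcal{S}_{\tau_1/\lambda}(w^{k+1} + b_d^k)$ and $z^{k+1} = \mathcal{S}_{\tau_2/\lambda}(Lw^{k+1} + b_z^k)$; the proximal mapping of the indicator function of a given set is the orthogonal projection onto that set, so that $s^{k+1} = \max(Gw^{k+1} - \xi - b_G^k, 0)$ componentwise. Regarding the quadratic minimization with respect to w, at each step k the minimizer is obtained by solving the system
$$H w = g^k, \qquad (12)$$
with
$$H = 2C + \lambda\left(I + A^TA + G^TG + L^TL\right) \qquad (13)$$
and
$$g^k = \lambda\left(A^T(b + b_A^k) + G^T(s^k + \xi + b_G^k) + d^k - b_d^k + L^T(z^k - b_z^k)\right). \qquad (14)$$
We observe that the coefficient matrix defined in (13) does not depend on the iteration, and that it is SPD and sparse, as shown in the next proposition.
Proposition 1. The matrix H defined in (13) is an SPD, m × m block tridiagonal matrix with blocks of dimension n.
Proof. H is the sum of 2C and λI, which are SPD, and of other matrices that are positive semidefinite; then the first statement follows. As for the structure of the matrices in (13), A and G are bidiagonal block matrices with blocks of dimension 1 × n, and L is a bidiagonal block matrix with blocks of dimension n × n; for each of them, the product of its transpose with the matrix itself is a block tridiagonal matrix with blocks of dimension n × n. Since C and I are block diagonal, the sum has a block tridiagonal structure. □
In the next proposition we give bounds on the eigenvalues of H.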
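The closed-form s-, d- and z-steps only need the soft thresholding operator and the projection onto the nonnegative orthant. A minimal sketch, with illustrative function names and with the Bregman vectors passed explicitly; it follows the update formulas stated in the text and assumes w, Lw and Gw have already been computed.

```python
import math


def shrink(x, gamma):
    """Soft thresholding: sign(x_i) * max(|x_i| - gamma, 0), componentwise."""
    return [math.copysign(abs(t) - gamma, t) if abs(t) > gamma else 0.0 for t in x]


def project_nonneg(x):
    """Prox of the indicator of {s >= 0}: orthogonal projection, componentwise."""
    return [max(t, 0.0) for t in x]


def closed_form_updates(w, Lw, Gw, xi, bG, bd, bz, tau1, tau2, lam):
    """s-, d- and z-steps, given w^{k+1} and the products L w^{k+1}, G w^{k+1}."""
    s = project_nonneg([g - x - e for g, x, e in zip(Gw, xi, bG)])
    d = shrink([a + e for a, e in zip(w, bd)], tau1 / lam)
    z = shrink([a + e for a, e in zip(Lw, bz)], tau2 / lam)
    return s, d, z


print(shrink([3.0, -0.5, -2.0, 0.0], 1.0))  # [2.0, 0.0, -1.0, 0.0]
```

Small components are set exactly to zero, which is where the sparsity of d (and hence of the portfolio) and of the variations z comes from.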
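Proposition 2. Let λ(H) be the set of eigenvalues of the matrix H; then
$$\lambda(H) \subset \left[\, 2\lambda_{\min}(C) + \lambda,\; 2\lambda_{\max}(C) + \lambda\left(1 + \lambda_{\max}(A^TA) + \lambda_{\max}(G^TG) + \lambda_{\max}(L^TL)\right) \right].$$
Proof. We recall that if X and Y are real symmetric matrices, then X + Y has real eigenvalues, and the following inequalities hold:
$$\lambda_{\min}(X) + \lambda_{\min}(Y) \le \lambda_{\min}(X + Y), \qquad \lambda_{\max}(X + Y) \le \lambda_{\max}(X) + \lambda_{\max}(Y).$$
The matrix H is the sum of the real symmetric matrices 2C, λI, λAᵀA, λGᵀG and λLᵀL; applying these inequalities repeatedly gives
$$2\lambda_{\min}(C) + \lambda\left(1 + \lambda_{\min}(A^TA) + \lambda_{\min}(G^TG) + \lambda_{\min}(L^TL)\right) \le \lambda_{\min}(H),$$
together with the upper bound in the statement. The matrices A, G and L have rank smaller than N, so, for each of them, the product of its transpose with the matrix itself is rank deficient and its smallest eigenvalue is zero. This completes the proof. □
Proposition 1 suggests using a sparse Cholesky factorization to solve the systems (12). Sparse direct methods combine techniques from numerical linear algebra, graph theory, graph algorithms, permutations and other topics in discrete mathematics. They exploit the sparsity of a matrix to solve problems much faster, and using far less memory, than if all the entries of the matrix were stored and took part in explicit computations (see [26], [27] and references therein). We note that the factorization is computed only once, while at each step two triangular linear systems are solved. The resulting method is outlined in Algorithm 1.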
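Since H does not change across iterations, its factorization can be computed once and reused, each step costing only two triangular solves. The sketch below shows this pattern with a dense Cholesky factorization H = RᵀR in plain Python. It is not the sparse solver used in the article: a sparse or banded routine would additionally exploit the block tridiagonal structure (the Cholesky factor of a matrix with bandwidth 2n − 1 keeps the same bandwidth, so no fill-in occurs outside the band). Function names and the test matrix are illustrative.

```python
import math


def cholesky(H):
    """Dense Cholesky factor: upper triangular R with H = R^T R (H must be SPD)."""
    n = len(H)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = H[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        R[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            R[j][i] = (H[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def solve_with_factor(R, g):
    """Solve H x = g as two triangular systems: R^T y = g, then R x = y."""
    n = len(R)
    y = [0.0] * n
    for i in range(n):            # forward substitution (R^T is lower triangular)
        y[i] = (g[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution (R is upper triangular)
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


# Factor once ...
H = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
R = cholesky(H)
# ... then reuse the factor for a new right-hand side at every iteration
for g in ([2.0, -1.0, 5.0], [6.0, 8.0, 4.0]):
    print(solve_with_factor(R, g))
```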

Algorithm 1

Alternating Split Bregman for Portfolio Optimization.
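As an illustration of the overall loop, the following plain-Python sketch implements an alternating split Bregman iteration for (4) along the lines described in this section: H is factored once, w comes from two triangular solves, s is a projection, d and z come from soft thresholding, the Bregman vectors are updated with the constraint residuals, and the loop stops on constraint violation. It is our reconstruction, not the authors' Algorithm 1. Assumptions: H = 2C + λ(I + AᵀA + GᵀG + LᵀL), which corresponds to weighting each penalty by λ/2; G has 1_nᵀ on the block superdiagonal and (1_n + r_m)ᵀ in its last block; the parameter values lam, tol, max_iter, the dense storage and the toy data are illustrative. It requires m ≥ 2.

```python
import math

def matvec(M, x):
    return [sum(a * v for a, v in zip(row, x)) for row in M]

def rmatvec(M, y):  # M^T y
    out = [0.0] * len(M[0])
    for row, yi in zip(M, y):
        for k, a in enumerate(row):
            out[k] += a * yi
    return out

def gram(M):  # M^T M
    N = len(M[0])
    return [[sum(row[p] * row[q] for row in M) for q in range(N)] for p in range(N)]

def cholesky(H):
    """Upper triangular R with H = R^T R (H is SPD by Proposition 1)."""
    n = len(H)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(H[j][j] - sum(R[k][j] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            R[j][i] = (H[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R

def chol_solve(R, g):
    """Two triangular solves: R^T y = g, then R x = y."""
    n = len(R)
    y, x = [0.0] * n, [0.0] * n
    for i in range(n):
        y[i] = (g[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

def shrink(x, gamma):
    return [math.copysign(abs(t) - gamma, t) if abs(t) > gamma else 0.0 for t in x]

def split_bregman(Cs, R, xi, xi_init, tau1, tau2, lam=1.0, tol=1e-9, max_iter=20000):
    """Alternating split Bregman sketch for (4); requires m >= 2."""
    m, n = len(R), len(R[0])
    N = m * n
    L = [[0.0] * N for _ in range((m - 1) * n)]
    A = [[0.0] * N for _ in range(m)]
    G = [[0.0] * N for _ in range(m)]
    for i in range(n):
        A[0][i] = 1.0                                   # budget constraint
        G[m - 1][(m - 1) * n + i] = 1.0 + R[m - 1][i]   # final expected wealth
        for j in range(1, m):
            L[(j - 1) * n + i][(j - 1) * n + i] = -1.0  # w_j - w_{j-1}
            L[(j - 1) * n + i][j * n + i] = 1.0
            A[j][(j - 1) * n + i] = -(1.0 + R[j - 1][i])  # self-financing
            A[j][j * n + i] = 1.0
            G[j - 1][j * n + i] = 1.0                   # wealth at date j+1
    b = [xi_init] + [0.0] * (m - 1)
    # H = 2C + lam (I + A^T A + G^T G + L^T L), C = diag(C_1, ..., C_m): factor once
    AtA, GtG, LtL = gram(A), gram(G), gram(L)
    H = [[lam * ((p == q) + AtA[p][q] + GtG[p][q] + LtL[p][q]) for q in range(N)]
         for p in range(N)]
    for j, Cj in enumerate(Cs):
        for p in range(n):
            for q in range(n):
                H[j * n + p][j * n + q] += 2.0 * Cj[p][q]
    Rf = cholesky(H)
    s, d, z = [0.0] * m, [0.0] * N, [0.0] * len(L)
    bA, bG, bd, bz = [0.0] * m, [0.0] * m, [0.0] * N, [0.0] * len(L)
    for it in range(1, max_iter + 1):
        # w-step: right-hand side g^k, then two triangular solves
        tA = rmatvec(A, [p + q for p, q in zip(b, bA)])
        tG = rmatvec(G, [p + q + r for p, q, r in zip(s, xi, bG)])
        tL = rmatvec(L, [p - q for p, q in zip(z, bz)])
        g = [lam * (a + c + dd - e + t) for a, c, dd, e, t in zip(tA, tG, d, bd, tL)]
        w = chol_solve(Rf, g)
        Aw, Gw, Lw = matvec(A, w), matvec(G, w), matvec(L, w)
        # s-step: projection onto s >= 0; d- and z-steps: soft thresholding
        s = [max(p - q - r, 0.0) for p, q, r in zip(Gw, xi, bG)]
        d = shrink([p + q for p, q in zip(w, bd)], tau1 / lam)
        z = shrink([p + q for p, q in zip(Lw, bz)], tau2 / lam)
        # Bregman updates: add back the constraint residuals
        bA = [p + q - r for p, q, r in zip(bA, b, Aw)]
        bG = [p + q + r - t for p, q, r, t in zip(bG, xi, s, Gw)]
        bd = [p + q - r for p, q, r in zip(bd, w, d)]
        bz = [p + q - r for p, q, r in zip(bz, Lw, z)]
        # Stop when all constraints of (9) hold within tol
        res = ([p - q for p, q in zip(Aw, b)] + [min(p - q, 0.0) for p, q in zip(Gw, xi)]
               + [p - q for p, q in zip(w, d)] + [p - q for p, q in zip(Lw, z)])
        if max(abs(t) for t in res) < tol:
            break
    return w, it

# Hypothetical toy problem: n = 2 assets, m = 2 periods
Cs = [[[0.04, 0.01], [0.01, 0.09]]] * 2
w, iters = split_bregman(Cs, [[0.05, 0.10], [0.05, 0.10]], [1.0, 1.0], 1.0, 1e-2, 1e-3)
print(iters, [round(v, 4) for v in w])
```

Dense lists keep the sketch dependency-free; for realistic n and m one would store L, A, G and H in a sparse format and use a sparse Cholesky routine, as the text recommends.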

Numerical experiments

In this section we show the results of tests performed on real market data. Algorithm 1 is applied to several datasets generated from real-world price values provided in [28]. The datasets contain weekly return time series of assets belonging to several major stock markets across the world, cleaned of errors as much as possible. We simulate 10-year investment strategies, in which the investor revises decisions once a year. Table 1 summarises the main information on the datasets.

Table 1

Some characteristics of the datasets.

	Data set	Label	# of assets	Time interval
1	Dow Jones Industrial	DowJones	28	Feb1990-Apr2016
2	NASDAQ 100	NASDAQ100	82	Nov2004-Apr2016
3	FTSE 100	FTSE100	83	Jul2002-Apr2016
4	S&P 500	SP500	442	Nov2004-Apr2016
5	NASDAQ Composite	NASDAQComp	1203	Feb2003-Apr2016
6	Fama and French 49	FF49	49	Jul1969-Jul2015

The problem requires an estimate of the covariance matrix. It is well known that the sample covariance matrix is affected by estimation error; this is particularly severe when the number of stocks, which is the dimension of the matrix, is large relative to the number of historical returns, that is, the sample size. We thus apply the linear shrinkage estimator proposed in [29] to C. This method acts on the eigenvalues of C, reducing their dispersion by shrinking them towards their grand mean; as a consequence, it strongly improves the conditioning of the covariance matrix, as Table 2 shows. In this table we report the condition number of the sample covariance matrix and that of its shrinkage estimator.
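A minimal sketch of a linear shrinkage of this kind, assuming that [29] refers to the Ledoit-Wolf estimator that shrinks the sample covariance S toward μI, with μ = trace(S)/n the grand mean of the eigenvalues. It computes the shrinkage intensity δ from the data and returns δμI + (1 − δ)S, which maps each eigenvalue λ of S to δμ + (1 − δ)λ: the trace is preserved and the spread of the eigenvalues, hence the condition number, is reduced. Plain Python with illustrative names and data; in practice a library routine such as scikit-learn's `LedoitWolf` would normally be used.

```python
def ledoit_wolf(X):
    """Shrink the sample covariance of the rows of X toward mu * I.

    X is a list of T observations, each a list of n asset returns.
    Returns (Sigma, delta) with Sigma = delta * mu * I + (1 - delta) * S.
    """
    T, n = len(X), len(X[0])
    mean = [sum(row[i] for row in X) / T for i in range(n)]
    Xc = [[row[i] - mean[i] for i in range(n)] for row in X]
    S = [[sum(x[i] * x[k] for x in Xc) / T for k in range(n)] for i in range(n)]
    mu = sum(S[i][i] for i in range(n)) / n  # grand mean of the eigenvalues
    # Squared Frobenius norms are scaled by 1/n, as in Ledoit and Wolf (2004)
    d2 = sum((S[i][k] - (mu if i == k else 0.0)) ** 2
             for i in range(n) for k in range(n)) / n
    b2_bar = sum(
        sum((x[i] * x[k] - S[i][k]) ** 2 for i in range(n) for k in range(n)) / n
        for x in Xc
    ) / T ** 2
    b2 = min(b2_bar, d2)
    delta = b2 / d2 if d2 > 0.0 else 1.0  # shrinkage intensity in [0, 1]
    Sigma = [[delta * (mu if i == k else 0.0) + (1.0 - delta) * S[i][k]
              for k in range(n)] for i in range(n)]
    return Sigma, delta


X = [[0.01, 0.02], [-0.02, 0.01], [0.03, -0.01], [0.00, 0.00]]  # hypothetical returns
Sigma, delta = ledoit_wolf(X)
print(delta, Sigma)
```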

Table 2

Condition number of matrix C: effect of the shrinkage.

Data Set	no shrinkage	shrinkage
DowJones	2.1053E+04	2.1431E+03
NASDAQ100	7.0926E+20	1.0375E+04
FTSE100	2.0993E+21	2.4305E+04
SP500	2.4678E+23	3.6621E+05
NASDAQComp	1.2587E+23	7.6804E+05
FF49	3.5266E+07	6.3079E+04

We compare the optimal portfolios produced by the investment strategy with two benchmarks. The first one is the naive strategy, which extends the classical 1/n rule [30] to the multi-period case: the investor splits the money evenly among the available assets, at the beginning of the investment as well as at the rebalancing dates. Thus, at each rebalancing date we set as expected minimum wealth the expected value produced by the recursive application of the 1/n allocation strategy:
$$\xi_j = \xi_{j-1}\,\frac{1}{n}\,(1_n + r_j)^T 1_n, \qquad j = 1, \dots, m, \qquad \xi_0 = \xi_{init}.$$
Moreover, [28] also reports weekly return time series of the market index for the datasets in Table 1, so for these datasets the market index is used as a second benchmark. In this case the expected minimum wealth is that of the market index:
$$\xi_j = \xi_{j-1}\,(1 + r_j^I), \qquad j = 1, \dots, m, \qquad \xi_0 = \xi_{init},$$
where $r_j^I$ is the index return.
We assume that one unit of wealth is invested, so we fix $\xi_{init} = 1$. The stopping criterion is based on constraint violation: the algorithm stops when all constraints are satisfied within a prescribed tolerance. In Algorithm 1 we set a lower bound $\underline{\xi}$ on the expected minimum wealth, to guarantee the investment:
$$\xi_j \ge \underline{\xi}, \qquad j = 1, \dots, m. \qquad (18)$$
In our experiments we set $\underline{\xi} = \xi_{init}$, so as to preserve the initially invested amount.
We introduce six performance measures to evaluate the quality of the results:
M1 the risk reduction with respect to the naive strategy, $RR = \dfrac{(w^{naive})^T C\, w^{naive}}{(w^*)^T C\, w^*}$, where the numerator is the variance of the portfolio produced by the naive strategy and the denominator is the variance of the optimal portfolio $w^*$ produced by Algorithm 1;
M2 the percentage of nonzero elements in the solution (density), which gives an estimate of the holding cost;
M3 the transaction costs. To evaluate them, we count a transaction for asset i between consecutive rebalancing dates j and j + 1 whenever $|(w_{j+1})_i - (w_j)_i|$ exceeds a threshold: as in [4], in order to discard variations that are not significant from the financial point of view, we neglect differences below this threshold. The number of estimated transactions T of the optimal strategy is the total count; we then relate T to the maximum value it can assume, that is, the number of transactions of the portfolio with full turnover, which is the size N of the portfolio, and report T/N;
M4 the excess return (ER) with respect to the benchmark;
M5 the Sharpe Ratio (SR) [31], the ratio between the average expected return of the portfolio and its standard deviation; in the multi-period framework we compute it as $SR = \bar{R}/\sigma_R$, where $\bar{R}$ and $\sigma_R$ are the mean and the standard deviation of the expected portfolio returns over the m periods;
M6 the Information Ratio (IR) [32], the average excess return per unit of volatility in excess return, computed as $IR = \overline{ER}/\sigma_{ER}$, where $\overline{ER}$ and $\sigma_{ER}$ are the mean and the standard deviation of the per-period excess returns.
In Table 3 we report the condition number of H. Compared with Table 2, H is better conditioned than the shrunk covariance matrix for every dataset. This is due to the shift of the spectrum to the right, according to Proposition 2.
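The benchmark recursions and several of the performance measures can be sketched in a few lines of plain Python. Assumptions, also noted in the code: the lower bound on the expected minimum wealth is modeled as a floor equal to the initial wealth (a hypothetical reading of the text); the standard deviations in SR and IR are population standard deviations (`statistics.pstdev`), since the normalization is not stated; transactions are counted between consecutive rebalancing dates with a threshold eps > 0 and divided by N = mn. Function names and data are illustrative.

```python
import statistics


def naive_min_wealth(R, xi_init):
    """xi_j = xi_{j-1} * (1/n) * sum_i (1 + r_{j,i}), xi_0 = xi_init (1/n rule)."""
    xi, wealth = [], xi_init
    for r in R:
        wealth *= sum(1.0 + ri for ri in r) / len(r)
        xi.append(wealth)
    return xi


def index_min_wealth(r_index, xi_init):
    """xi_j = xi_{j-1} * (1 + r^I_j): wealth obtained by holding the market index."""
    xi, wealth = [], xi_init
    for ri in r_index:
        wealth *= 1.0 + ri
        xi.append(wealth)
    return xi


def with_floor(xi, floor):
    """Hypothetical form of the lower bound: never require less than `floor`."""
    return [max(x, floor) for x in xi]


def density(w):
    """M2: fraction of nonzero entries of the stacked portfolio w."""
    return sum(1 for x in w if x != 0.0) / len(w)


def transaction_ratio(W, eps):
    """M3: changes |w_{j+1,i} - w_{j,i}| >= eps (eps > 0), counted over N = m*n."""
    m, n = len(W), len(W[0])
    T = sum(1 for j in range(m - 1) for i in range(n)
            if abs(W[j + 1][i] - W[j][i]) >= eps)
    return T / (m * n)


def sharpe_ratio(returns):
    """M5: mean per-period return over its (population) standard deviation."""
    return statistics.mean(returns) / statistics.pstdev(returns)


def information_ratio(returns, bench):
    """M6: mean per-period excess return over its (population) standard deviation."""
    excess = [r - b for r, b in zip(returns, bench)]
    return statistics.mean(excess) / statistics.pstdev(excess)


print(naive_min_wealth([[0.10, -0.10], [0.20, 0.0]], 1.0))
print(with_floor(index_min_wealth([-0.2, 0.1], 1.0), floor=1.0))  # [1.0, 1.0]
```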

Table 3

Condition number of matrix H.

Data Set	H
DowJones	4.4016E+02
NASDAQ100	5.3936E+03
FTSE100	2.0430E+03
SP500	4.8893E+04
NASDAQComp	7.0823E+05
FF49	1.0054E+03

Regularization parameters τ1 and τ2 are chosen from a predefined set of values; their combined choice depends on the specific metric one focuses on. We start by analyzing the behaviour over time of the wealth produced by the optimal portfolio and compare it with the benchmarks. Figs. 1 and 2 show the values of wealth at the rebalancing dates for the FTSE 100 and Dow Jones datasets; similar results are observed for the other datasets. In both figures, the first row refers to FTSE 100, compared with the naive strategy, and the second row refers to DowJones, compared with the index. In Fig. 1, τ2 is kept fixed while τ1 increases from left to right. In agreement with the analysis in [4], for both datasets increasing τ1 reduces the number of short positions, from 102 to 0 for FTSE 100 and from 30 to 0 for Dow Jones, as well as the number of transactions, from 15% to 9% for FTSE 100 and from 23% to 16% for Dow Jones. This is due to the reduced size of the portfolio, a consequence of the sparsifying effect of the regularization on the portfolio weights. As for wealth, Fig. 1 shows that the excess return decreases as τ1 increases. Our model outperforms the benchmark, with an expected wealth higher than that of the benchmark in almost all cases, but the difference is larger for smaller values of τ1, while larger values of τ1 yield a valuable reduction in costs. All experiments exhibit a breakdown of the benchmark in the fourth year of the simulation period. This is probably because this year corresponds to 2008, so the lower return can be ascribed to the financial crisis. At this date the excess return of the optimal portfolio is therefore higher, as an effect of (18).

Fig. 1

Behaviour of the wealth over time. First row: FTSE 100 compared with the naive strategy. Second row: DowJones compared with the index return. In all rows, τ1 takes three different values (left, center, right).

Fig. 2

Behaviour of the wealth over time. First row: FTSE 100 compared with the naive strategy. Second row: DowJones compared with the index return. τ2 takes three different values (left, center, right).

In Fig. 2 the behaviour of wealth over time is shown for a fixed value of τ1 and increasing values of τ2. For both datasets we observe, as expected, a reduction in the number of transactions, from 22% to 5% for FTSE 100 and from 47% to 10% for Dow Jones. Limiting transactions results in a slight increase in risk, and thus a higher return is observed at intermediate periods. As already pointed out, the financial targets of the investment and the market conditions drive the choice of the regularization parameters. In the following tests τ1 is chosen as the smallest value that guarantees the minimum number of short positions. This is an important issue, since in some cases stock market regulators can impose bans on short selling, as happened in March 2020 in Italy and Spain, the two European countries hardest hit by Covid-19. Once τ1 is fixed, τ2 is chosen so as to obtain a good trade-off among the other performance metrics. In Table 4 we report, for the datasets compared with the market index, the excess return and the Information Ratio. For completeness, we also report the other metrics of the optimal portfolios, namely the number of short positions, the percentages of active positions and of transactions, and the Sharpe Ratio. In all tests the optimal portfolio outperforms the benchmark. We achieve an excess return of about 7% on average, varying between 2% and 14%. The IR ranges between 0.301 and 0.453; SP500 exhibits the highest IR, consistently with its high excess return.
On the other hand, FTSE100 provides the highest excess return, but its IR is lowered by the dispersion of its excess returns.

Table 4

Comparison with the index. Columns contain in order: the test case label, parameters τ1 and τ2, the number of short positions, the percentage of active positions, the percentage of transactions, the Sharpe Ratio, the excess return and the Information Ratio.

TEST	τ₁	τ₂	short	density	T	SR	ER	IR
DJ	10⁻²	10⁻³	0	37%	16%	0.859	2%	0.302
NASDAQ100	10⁻²	10⁻⁴	0	22%	16%	0.933	2%	0.301
FTSE100	10⁻²	10⁻³	0	16%	6%	0.460	14%	0.301
SP500	10⁻³	10⁻³	4	6%	2%	0.723	13%	0.453
NASDAQComp	10⁻²	10⁻⁴	0	5%	2%	0.581	6%	0.301

In Table 5 we show the comparison with the naive strategy. In this case we also report the risk reduction factor RR defined in M1. The reduction of transaction costs is about 90% on average. Note that the naive portfolio is a full-turnover one, so the value of T represents the percentage of transactions made by the optimal strategy with respect to the benchmark. We also observe a higher excess return, varying between 8% and 17%, with an average value of 12%. The IR ranges between 0.302 and 0.440; DJ exhibits the highest IR, while the excess return is highest for SP500 (17%) and FF49 (16%). We note that the SR values are greater than the corresponding values obtained when the benchmark is the index. Finally, our strategy produces optimal portfolios that outperform the benchmark in terms of final wealth with a lower risk, as shown by RR, which varies between 1.510 and 4.962.

Table 5

Comparison with the naive strategy. Columns contain in order: test case label, parameters τ1 and τ2, the number of short positions, the percentage of active positions, the percentage of transactions, the Sharpe Ratio, the excess return, the Information Ratio and the risk reduction factor.

TEST	τ₁	τ₂	density	T	SR	ER	IR	RR
DJ	10⁻²	10⁻²	46%	16%	1.032	8%	0.440	1.510
NASDAQ100	10⁻²	10⁻²	20%	7%	1.278	9%	0.407	1.932
FTSE100	10⁻²	10⁻³	23%	9%	0.609	10%	0.302	1.701
SP500	10⁻²	10⁻³	6%	2%	0.821	17%	0.302	3.184
NASDAQComp	10⁻²	10⁻²	4%	1%	0.740	13%	0.324	4.962
FF49	10⁻²	10⁻⁴	17%	14%	0.806	16%	0.302	2.273
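The averages quoted in the discussion of Tables 4 and 5 can be checked directly from the tabulated values. Since the tables report rounded percentages, the recomputed means are approximate; the dictionary names below are illustrative.

```python
# Excess return ER (%) and transactions T (%) copied from Tables 4 and 5
table4_er = {"DJ": 2, "NASDAQ100": 2, "FTSE100": 14, "SP500": 13, "NASDAQComp": 6}
table5 = {"DJ": (8, 16), "NASDAQ100": (9, 7), "FTSE100": (10, 9),
          "SP500": (17, 2), "NASDAQComp": (13, 1), "FF49": (16, 14)}

mean_er4 = sum(table4_er.values()) / len(table4_er)
mean_er5 = sum(er for er, _ in table5.values()) / len(table5)
mean_t5 = sum(t for _, t in table5.values()) / len(table5)

print(f"Table 4, mean ER: {mean_er4:.1f}%")                      # 7.4%
print(f"Table 5, mean ER: {mean_er5:.1f}%")                      # 12.2%
print(f"Table 5, mean T: {mean_t5:.1f}% of full turnover")       # 8.2%
print(f"Implied transaction reduction: {100 - mean_t5:.1f}%")    # 91.8%
```

The last figure is consistent with the reduction of about 90% reported for the comparison with the naive strategy.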

Conclusion

In this work we use the split Bregman method for the problem of defining an optimal long-term investment strategy, where the investor can exit the investment before maturity without severe loss. We propose a model in a multi-period Markowitz framework, which extends the fused lasso model proposed in [4]. The inequality constraints on the expected minimum wealth at each rebalancing date are introduced to guarantee the investment throughout the period, especially during unforeseen events such as market crises. The alternating split Bregman method produces an algorithm whose subproblems are solved by fast methods. Numerical comparisons with different benchmarks on real data show its effectiveness.


1. Renato Bruni, Francesco Cesarone, Andrea Scozzari, Fabio Tardella. Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models. Data Brief, 2016-06-28.
