
A portfolio selection model based on the knapsack problem under uncertainty.

Fereshteh Vaezi¹, Seyed Jafar Sadjadi¹, Ahmad Makui¹.

Abstract

One of the primary concerns in investment planning is to determine the number of shares to buy of an asset with a relatively high net value per share, such as Berkshire Hathaway, on the stock market. Traditional asset allocation methods such as the Markowitz model give the solution as a percentage, and this ratio may suggest allocating half of a share, which is impractical. Thus, it is necessary to propose a method that determines the number of shares for each asset. This paper presents a knapsack-based portfolio selection model in which the expected returns, prices, and budget are characterized by interval values. The study determines the priority and importance of each share in the proposed model by extracting interval weights from an interval comparison matrix. The resulting model is converted into a parametric linear programming model in which the decision maker is able to determine the optimism threshold. Finally, a discrete firefly algorithm is designed to find near-optimal solutions in large dimensions. The proposed study is implemented for some data from the US stock exchange.

Year: 2019 PMID： 31042709 PMCID： PMC6493714 DOI： 10.1371/journal.pone.0213652

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Markowitz [1,2] introduced a mean-variance model to select the optimal asset portfolio. The main task of a portfolio selection model is to allocate capital among different assets such that the risk and return of the portfolio are optimized simultaneously. The Markowitz model is based on assumptions that rarely hold in reality. Hence, many researchers have made widespread efforts to provide and improve methods for analyzing stocks in financial markets (see [3]). The standard mean-variance Markowitz model does not consider real financial market constraints, including transaction costs [4,5], cardinality constraints [6,7], and multi-period Cardinality Constraint Mean-Variance (CCMV) settings [8-10]. Another problem in real financial markets is that decision-making processes often face complexity and uncertainty. Therefore, many approaches have been developed to account for uncertain conditions in portfolio selection models, including a robust approach [11], scenario-based approaches [12,13], and fuzzy methods [14,15]. Fuzzy sets were introduced by Zadeh [16], and interval-valued fuzzy sets were presented by Turksen [17]. During the past few years, a number of studies have examined interval fuzzy uncertainty for the mean-variance Markowitz model [18-20]. The use of interval approaches converts the model into a nonlinear model; therefore, Liu [21] proposed two linear programming approaches to obtain the upper and lower bounds of the interval returns in the Markowitz model. In addition, a multi-period mean-variance model with interval returns, risk, and turnover rates of risky assets was proposed by Liu et al. [22,23]. For cases in which all or part of the quantitative and accurate data is not available, some qualitative portfolio models under a fuzzy environment were proposed by Zhou and Xu [24].
On the other hand, the Markowitz model uses the variance as its risk measure. Variance is a symmetric risk measure and has been criticized by many researchers. Thus, other studies have used different risk measures in the portfolio optimization model and then examined the model under interval uncertainty. Examples include the mean-Value-at-Risk (VaR) portfolio optimization model [25], the Mean-Variance-Skewness (MVS) portfolio selection model [26], and the mean-semi-absolute deviation portfolio selection model [27-29]. Moreover, the decision variable in the mean-variance Markowitz model is continuous. Of particular interest is the finite divisibility of (stock) assets, i.e. the obligation to buy or sell only integer quantities of asset lots whose number is pre-established. In some cases, we need to properly allocate the number of shares to different assets. To this end, Li et al. [30] introduced a hybrid algorithm based on a convergent Lagrangian and a contour-domain cut method for solving a nonlinear integer portfolio optimization problem with concave transaction costs and a cardinality constraint. Corazza and Favaretto [31] presented a hybrid algorithm based on the relaxation method and two rounding approaches for a quadratic mixed-integer programming problem with one riskless asset and n risky assets. Li and Tsai [32] presented a hybrid algorithm based on dual Lagrangian relaxation and transformation approaches to examine a portfolio selection model with a nonlinear objective function, transaction costs, and integer variables. Bonami and Lejeune [33] proposed a nonlinear branch-and-bound algorithm based on stochastic branching and integrated dynamic branching for portfolio optimization with integer variables.
Anagnostopoulos and Mamanis [34] compared different multi-objective evolutionary algorithms on a nonlinear mixed-integer three-objective problem with class and quantity limitations. Castro et al. [35] proposed a mathematical algorithm based on various test sets to solve a portfolio selection model with a nonlinear constraint and integer variables. It is therefore very important to provide portfolio selection models that can properly allocate the number of shares to different assets. In this research, a portfolio selection model based on the knapsack problem is presented that includes the risk preferences of investors, a cardinality constraint, floor and ceiling constraints, and a budget constraint. Since information is rarely accurate and definitive, all or part of it is handled using fuzzy sets or interval values. In this paper, interval-valued fuzzy sets are used to examine a portfolio optimization problem under uncertain conditions. Also, a discrete firefly algorithm is designed to solve the model. A numerical example from the US stock exchange illustrates the application of the proposed model and demonstrates the effectiveness of the designed algorithm. The structure of this paper is as follows. In Section 2, some preliminary notions such as interval numbers, the interval comparison matrix, and interval weights are defined. In Section 3, the knapsack-based portfolio optimization model considering the risk preferences of investors under interval uncertainty is presented. In Section 4, the Discrete Firefly Algorithm (DFA) is presented for the proposed optimization model. A numerical example and sensitivity analysis are examined in Section 5. Finally, a brief summary of the paper, some conclusions, and directions for future studies are given in Section 6.

Preliminaries

In this section, a review of some necessary concepts is presented:

Interval numbers

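An interval number is written as $A = [\underline{a}, \overline{a}] = \{a \in \mathbb{R} : \underline{a} \le a \le \overline{a}\}$ and is equivalently represented as $A = \langle m(A), w(A) \rangle$, where:

$$m(A) = \frac{\underline{a} + \overline{a}}{2}, \qquad w(A) = \frac{\overline{a} - \underline{a}}{2} \tag{1}$$

Alefeld and Herzberger [36,37] presented the mathematical operators ($\langle OP \rangle \in \{\pm, \otimes\}$) between interval numbers $A = [\underline{a}, \overline{a}]$ and $B = [\underline{b}, \overline{b}]$ as follows:

$$A + B = [\underline{a} + \underline{b},\ \overline{a} + \overline{b}], \qquad A - B = [\underline{a} - \overline{b},\ \overline{a} - \underline{b}], \qquad A \otimes B = [\min S,\ \max S],\quad S = \{\underline{a}\,\underline{b},\ \underline{a}\,\overline{b},\ \overline{a}\,\underline{b},\ \overline{a}\,\overline{b}\} \tag{2}$$

In addition, the multiplication operator ($\otimes$) between a number $\lambda$ and an interval number $A$ is defined as follows:

$$\lambda \otimes A = \begin{cases} [\lambda \underline{a},\ \lambda \overline{a}], & \lambda \ge 0 \\ [\lambda \overline{a},\ \lambda \underline{a}], & \lambda < 0 \end{cases} \tag{3}$$

Also, the division operator between interval numbers $A$ and $B$ is as follows:

$$A / B = A \otimes \left[\frac{1}{\overline{b}},\ \frac{1}{\underline{b}}\right], \qquad 0 \notin B \tag{4}$$

More information on interval numbers can be found in [38].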
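The interval operations described above can be sketched in a few lines of Python. This is a minimal illustration of the standard interval arithmetic rules, assuming closed intervals with a lower and an upper bound; the class name `Interval` and its attribute and method names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with lo <= hi (illustrative helper)."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    @property
    def mid(self):
        # Center m(A) of the interval.
        return (self.lo + self.hi) / 2

    @property
    def half_width(self):
        # Half-width w(A) of the interval.
        return (self.hi - self.lo) / 2

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Interval product: min and max over the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def scale(self, lam):
        # Scalar multiple; the bounds swap when lam is negative.
        a, b = lam * self.lo, lam * self.hi
        return Interval(min(a, b), max(a, b))

    def __truediv__(self, other):
        # Division is only defined when the divisor excludes zero.
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1 / other.hi, 1 / other.lo)


a = Interval(1, 3)
b = Interval(2, 4)
print(a + b, a - b, a * b, a / b, a.scale(-2))
```

Keeping the four-product rule in `__mul__` makes the sketch valid for intervals of any sign, although the prices, returns, and weights used later in the paper are all positive.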

Interval comparison matrix and interval weights

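Wang and Elhag [39] introduced a method for extracting interval weights from an interval comparison matrix. In this method, an interval comparison matrix is considered as follows:

$$A = \begin{bmatrix} 1 & [l_{12}, u_{12}] & \cdots & [l_{1n}, u_{1n}] \\ [l_{21}, u_{21}] & 1 & \cdots & [l_{2n}, u_{2n}] \\ \vdots & \vdots & \ddots & \vdots \\ [l_{n1}, u_{n1}] & [l_{n2}, u_{n2}] & \cdots & 1 \end{bmatrix}$$

where $l_{ij} \ge 0$, $l_{ij} \le u_{ij}$, $l_{ij} = 1/u_{ji}$, and $u_{ij} = 1/l_{ji}$ for $i, j = 1, \ldots, n$; $i \ne j$. $A$ is divided into two crisp matrices $A_L = (l_{ij})_{n \times n}$ and $A_U = (u_{ij})_{n \times n}$, both with unit diagonal, where $A_L \le A \le A_U$. The vector $W = ([w_1^L, w_1^U], \ldots, [w_n^L, w_n^U])^T$ is the normalized interval weight vector, which is close to $A$ in the sense that $[l_{ij}, u_{ij}] \approx [w_i^L, w_i^U] / [w_j^L, w_j^U]$ for $i, j = 1, \ldots, n$; $i \ne j$. Vector $W$ is normalized [40] as follows:

$$\sum_{j=1}^{n} w_j^U - \max_{i} (w_i^U - w_i^L) \ge 1, \qquad \sum_{j=1}^{n} w_j^L + \max_{i} (w_i^U - w_i^L) \le 1$$

which can be rewritten as follows:

$$w_i^L + \sum_{j \ne i} w_j^U \ge 1, \qquad w_i^U + \sum_{j \ne i} w_j^L \le 1, \qquad i = 1, \ldots, n$$

Given that the interval comparison matrix $A$ is equivalent to the weight vector $W$, each off-diagonal entry of $A$ is written as $[w_i^L, w_i^U] / [w_j^L, w_j^U]$. According to (Eq 4):

$$\frac{[w_i^L, w_i^U]}{[w_j^L, w_j^U]} = \left[\frac{w_i^L}{w_j^U},\ \frac{w_i^U}{w_j^L}\right]$$

Therefore, the interval comparison matrix $A$ is rewritten with off-diagonal entries $[w_i^L / w_j^U,\ w_i^U / w_j^L]$ and unit diagonal, and it is divided into two crisp matrices as follows:

$$A_L = \left(\frac{w_i^L}{w_j^U}\right)_{i \ne j}, \qquad A_U = \left(\frac{w_i^U}{w_j^L}\right)_{i \ne j}, \qquad (A_L)_{ii} = (A_U)_{ii} = 1$$

It is easy to prove that $A_L W_U = W_U + (n - 1) W_L$ and $A_U W_L = W_L + (n - 1) W_U$, where $W_L = (w_1^L, \ldots, w_n^L)^T$ and $W_U = (w_1^U, \ldots, w_n^U)^T$. Because the judgment of the decision maker may not be exact, the deviations are defined as the following vectors:

$$E = (A_L - I) W_U - (n - 1) W_L, \qquad \Gamma = (A_U - I) W_L - (n - 1) W_U$$

where $E = (\varepsilon_1, \ldots, \varepsilon_n)^T$, $\Gamma = (\gamma_1, \ldots, \gamma_n)^T$ and $I$ is a unit matrix of order $n$. Finally, the optimization model for extracting the interval weights from an interval comparison matrix minimizes the total deviation subject to the normalization conditions:

$$\min\ J = \sum_{i=1}^{n} \left(|\varepsilon_i| + |\gamma_i|\right) \quad \text{s.t.} \quad w_i^L + \sum_{j \ne i} w_j^U \ge 1,\quad w_i^U + \sum_{j \ne i} w_j^L \le 1,\quad w_i^U \ge w_i^L \ge 0,\quad i = 1, \ldots, n$$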
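The identity behind the deviation vectors can be checked numerically. The sketch below assumes lower and upper weight vectors `w_lo` and `w_hi` (hypothetical names and values, not taken from the paper), builds the perfectly consistent lower and upper comparison matrices from them, and confirms that both deviation vectors vanish. It is an illustration of the relationship only, not the weight-extraction procedure of [39].

```python
def comparison_bounds(w_lo, w_hi):
    """Lower/upper crisp matrices implied by interval weights (unit diagonal)."""
    n = len(w_lo)
    a_lo = [[1.0 if i == j else w_lo[i] / w_hi[j] for j in range(n)]
            for i in range(n)]
    a_hi = [[1.0 if i == j else w_hi[i] / w_lo[j] for j in range(n)]
            for i in range(n)]
    return a_lo, a_hi


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def deviations(a_lo, a_hi, w_lo, w_hi):
    """Deviation vectors: zero when the matrix is fully consistent with the weights."""
    n = len(w_lo)
    left = matvec(a_lo, w_hi)
    right = matvec(a_hi, w_lo)
    eps = [left[i] - w_hi[i] - (n - 1) * w_lo[i] for i in range(n)]
    gam = [right[i] - w_lo[i] - (n - 1) * w_hi[i] for i in range(n)]
    return eps, gam


def is_normalized(w_lo, w_hi):
    """Normalization conditions for an interval weight vector."""
    total_lo, total_hi = sum(w_lo), sum(w_hi)
    return all(
        w_lo[i] + (total_hi - w_hi[i]) >= 1 and w_hi[i] + (total_lo - w_lo[i]) <= 1
        for i in range(len(w_lo))
    )


# Hypothetical interval weights for three items.
w_lo = [0.20, 0.30, 0.35]
w_hi = [0.30, 0.40, 0.45]
a_lo, a_hi = comparison_bounds(w_lo, w_hi)
eps, gam = deviations(a_lo, a_hi, w_lo, w_hi)
print(is_normalized(w_lo, w_hi), eps, gam)
```

With real expert judgments the matrices are not perfectly consistent, so the deviations are nonzero and the optimization model searches for the weights that make them as small as possible.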

Proposed optimization model

Assumptions and presentation

In order to conveniently describe the model, the following notation is used: $n$, the total number of stocks; $k$, the desired number of stocks to be chosen in the portfolio; $B$, the total available budget; $P_i$, the price of stock $i$; $R_i$, the return of stock $i$; $u_i$, the upper bound on the number of shares of stock $i$; $l_i$, the lower bound on the number of shares of stock $i$; $x_i$, the integer variable that represents the number of shares of stock $i$; $y_i$, the binary variable indicating whether stock $i$ is included in the portfolio ($y_i = 1$ if stock $i$ is included, and $y_i = 0$ otherwise); $i \in \{1, 2, \ldots, n\}$.

Model formulation

The basic model (M1) is based on the knapsack problem and is as follows:

$$\max \sum_{i=1}^{n} R_i x_i \tag{16}$$

subject to

$$\sum_{i=1}^{n} P_i x_i \le B \tag{17}$$

$$l_i y_i \le x_i \le u_i y_i, \quad i = 1, \ldots, n \tag{18}$$

$$\sum_{i=1}^{n} y_i = k \tag{19}$$

$$x_i \in \mathbb{Z}_{\ge 0}, \quad y_i \in \{0, 1\}, \quad i = 1, \ldots, n \tag{20}$$

The objective function (Eq (16)) maximizes the sum of the expected returns of the stock portfolio, subject to the sum of the prices weighted by the number of shares being less than or equal to the budget (Eq (17)). (Eq 18) states that the number of shares of each stock must lie between its lower bound ($l_i y_i$) and its upper bound ($u_i y_i$) if the asset is chosen, i.e. $y_i = 1$. (Eq 19) states that the portfolio should contain a certain number of stocks ($k$). Financial managers at the largest financial houses normally consider only a selected number of shares, e.g. the shares of 10–12 firms, which means we need to consider cardinality constraints. Most financial managers are interested in investing in a particular company only if they invest at least a minimum amount. The main reason is that they have to study all related press news about the companies on a daily basis. A big investment team is needed to follow all news and reports when literally hundreds of firms are held in a portfolio, but when the number of firms is limited, a relatively small group can manage the fund. On the other hand, it is easy to construct an example in which the Markowitz model either concentrates on a single investment or spreads the investment over more than 10–12 firms. In many cases, the results of the Markowitz model are impractical, since one has to invest a small amount of money in one share while a big portion of the investment is devoted to another (see [41]). Therefore, (Eq 19), the cardinality constraint, is included in the proposed portfolio optimization problem. Note that the structure of the decision variable in the knapsack-based portfolio optimization model (Eq (20)) differs from the decision variable in the mean-variance Markowitz model. In the mean-variance Markowitz model, the decision variable (the weight of each share, $w_i$) is a continuous variable.
However, in the knapsack-based portfolio optimization model, the decision variable (the number of shares of each stock, $x_i$) is an integer and/or binary variable [42,43]. In some cases, rounding the output of the Markowitz model may yield an infeasible solution or a very poor approximation of the optimal integer solution (see [31,32,35]). This gap disrupts the optimization process, so the portfolio selection model based on the knapsack problem is proposed to properly allocate the number of shares to different assets. The knapsack-based portfolio optimization model therefore has an advantage over the Markowitz model. For instance, suppose there are three assets in a basket with optimal weights of 0.2, 0.5 and 0.3, and we plan to invest one million dollars in these three shares, whose market prices are $345,000, $1,000 and $1,400, respectively. Consequently, the numbers of shares are 0.579, 500 and 214.285, respectively. As we can observe, 214.285 is not an integer and 0.579 is less than one share. Therefore, according to the Markowitz model, we would need to purchase less than one share, which is obviously impractical. Using a knapsack-type optimization technique, we may be able to find the optimal asset allocation for special cases with relatively large stock prices. In the following, in view of the probable risks, uncertain conditions, and lack of accurate information in financial markets, fuzzy and interval programming techniques are used. In this case, interval values are used instead of exact values, such that the lowest and highest expected values of a parameter are the lower and upper bounds of the interval, respectively. The values of these bounds can be estimated from the information of past years and by consulting experts.
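The three-asset example above can be reproduced with a short calculation. The sketch below uses the budget, weights, and prices from the example; the function names are illustrative, and rounding down is only one possible repair, shown here as an assumption to make the problem with fractional allocations concrete.

```python
import math

budget = 1_000_000
weights = [0.2, 0.5, 0.3]            # continuous portfolio weights from the example
prices = [345_000, 1_000, 1_400]     # market price per share, in dollars


def fractional_shares(budget, weights, prices):
    """Number of shares implied by continuous weights (may be fractional)."""
    return [budget * w / p for w, p in zip(weights, prices)]


def round_down(shares):
    """Naive repair: buy only whole shares."""
    return [math.floor(s) for s in shares]


shares = fractional_shares(budget, weights, prices)
whole = round_down(shares)
spent = sum(n * p for n, p in zip(whole, prices))

print([round(s, 3) for s in shares])  # about 0.58, 500 and 214.29 shares
print(whole, spent, budget - spent)
```

Rounding down drops the expensive asset entirely and leaves roughly a fifth of the budget unspent, which is the kind of gap that motivates modeling the number of shares as an integer variable.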
On the other hand, risk and uncertainty are not specific to the objective function coefficients; the technical coefficients and the right-hand sides of the constraints may also be uncertain. Therefore, the expected return of each stock $[\underline{R}_i, \overline{R}_i]$, the price of each stock $[\underline{P}_i, \overline{P}_i]$, and the budget $[\underline{B}, \overline{B}]$ are defined as interval numbers, and the model (M2) is rewritten as follows:

$$\max \sum_{i=1}^{n} [\underline{R}_i, \overline{R}_i]\, x_i$$

subject to

$$\sum_{i=1}^{n} [\underline{P}_i, \overline{P}_i]\, x_i \le [\underline{B}, \overline{B}], \qquad l_i y_i \le x_i \le u_i y_i, \qquad \sum_{i=1}^{n} y_i = k, \qquad x_i \in \mathbb{Z}_{\ge 0},\ y_i \in \{0, 1\}$$

As can be seen, interval values of the coefficients are used instead of exact values. For example, $[\underline{R}_i, \overline{R}_i]$ in the objective function means that the objective coefficient of the $i$-th decision variable may fluctuate in the range from $\underline{R}_i$ to $\overline{R}_i$. In the real world, decision-making processes are often complex and uncertain, and the judgments of decision-makers are rarely accurate and definitive. For this reason, all or part of the judgments are expressed as interval values or fuzzy numbers. Interval judgments can capture uncertainty in judgments without requiring probability distribution functions in the weight extraction models of the pairwise comparison matrix, and they give results that are closer to decision-making under uncertainty. One of the major problems in extracting weights from an interval pairwise comparison matrix is the inconsistency of matrices that contain subjective judgments. Therefore, a model used to derive weights from the interval comparison matrix should be robust to such minor inconsistencies. Two types of model responses are defined in weight extraction models: point estimates [44] and interval estimates [39]. Point estimates make decision-making processes easy but yield a single definite answer and do not reflect the uncertainty in the responses. Interval estimates show the uncertainty in decision-making processes, and the length of the estimated intervals can serve as a measure of that uncertainty.
On the other hand, a stable model should be robust to the inconsistencies in the decision maker's judgments, which means that it can provide responses close to those extracted from a consistent pairwise comparison matrix. Therefore, in order to determine the importance and priority of each stock in the stock market, the interval weights extracted from an interval comparison matrix are added to model (M2), which gives model (M3). Given these weights, the budget is also scaled proportionally. According to (Eq 2), the product of two interval values with nonnegative bounds, as is the case for the weights, returns, and prices used here, is:

$$[\underline{a}, \overline{a}] \otimes [\underline{b}, \overline{b}] = [\underline{a}\,\underline{b},\ \overline{a}\,\overline{b}]$$

Using this operator and the notation of Eqs (24–27), model (M3) is rewritten as model (M4). Model (M4) is then converted into a crisp form using the technique presented by Sengupta [45], which yields a solvable parametric linear programming model (M5), where α (α ∈ [0,1]) is the data optimism threshold determined by the decision-maker. If the decision-maker is optimistic about the data, α is greater than 0.5 (50%); otherwise it is less than 0.5. In fact, α is a controlling factor for data uncertainty determined by the decision-maker. Empirical studies show that it is usually better to choose α = 0.5 (see [46,47]). Finally, the features and advantages of the proposed optimization model are as follows. The knapsack-based portfolio selection model can properly allocate the number of shares to different assets. In the proposed optimization model, the interval uncertainty of the parameters is considered in the objective function and the constraints simultaneously; in other words, the expected returns, prices, and budget are characterized by interval values. The proposed optimization model considers the risk preferences of investors; in other words, the importance and priority of each share, defined as interval values, are taken into account.
In the proposed optimization model, the decision-maker is able to determine the optimism threshold. The proposed optimization model is particularly suitable when the price of a particular share is relatively large. Finally, the proposed optimization model based on the knapsack problem maximizes the returns while considering the risk preferences of investors, the budget constraint, the cardinality constraint, and the floor and ceiling constraints under interval uncertainty.
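To make the structure of the basic (crisp) knapsack-based model concrete, the following sketch solves a tiny instance by exhaustive search: it tries every set of exactly k stocks and every admissible share count, keeps only purchases within the budget, and returns the one with the highest total return. The function name and all data are hypothetical, and brute force is used purely for illustration; it is not the solution method of the paper and does not scale beyond toy sizes.

```python
from itertools import combinations, product


def solve_basic_model(returns, prices, lower, upper, budget, k):
    """Exhaustive search over the crisp knapsack-based portfolio model."""
    n = len(returns)
    best_value, best_x = None, None
    for chosen in combinations(range(n), k):          # exactly k stocks selected
        ranges = [range(lower[i], upper[i] + 1) for i in chosen]
        for counts in product(*ranges):               # share counts within bounds
            cost = sum(prices[i] * c for i, c in zip(chosen, counts))
            if cost > budget:                         # budget constraint
                continue
            value = sum(returns[i] * c for i, c in zip(chosen, counts))
            if best_value is None or value > best_value:
                best_value = value
                best_x = [0] * n                      # unselected stocks get 0 shares
                for i, c in zip(chosen, counts):
                    best_x[i] = c
    return best_value, best_x


# Hypothetical four-stock instance.
returns = [3, 5, 2, 4]
prices = [10, 20, 5, 15]
lower = [1, 1, 1, 1]
upper = [3, 2, 4, 3]

value, x = solve_basic_model(returns, prices, lower, upper, budget=65, k=2)
print(value, x)
```

The search space grows combinatorially with the number of stocks and the width of the share bounds, which is why an exact solver is used for small instances and a meta-heuristic for large ones.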

Discrete firefly algorithm for portfolio selection

The knapsack problem is NP-complete [48]. Therefore, in order to solve the proposed knapsack-based portfolio optimization model with arbitrary inputs in large dimensions, we need an approximate algorithm that gives a near-optimal solution, and a meta-heuristic method is used for this purpose. Many studies have compared the effectiveness of the firefly algorithm with other algorithms for different types of knapsack problem (see [49-51]). They concluded that the firefly algorithm and its variants are very powerful for solving different types of knapsack problems in both static and dynamic environments. Yang [52] introduced the firefly-inspired algorithm in 2008. Then, in 2010, Sayadi et al. [53] presented the Discrete Firefly Algorithm (DFA). Some related research on portfolio optimization with the firefly algorithm can be found in [54,55]. In general, there are three idealized rules for the development of firefly-inspired algorithms: 1) All fireflies are considered unisex. This means that a firefly will be attracted to another firefly regardless of its sex. 2) The attractiveness of a firefly is proportional to its brightness. If the distance between two fireflies increases, the brightness decreases and consequently the attractiveness decreases. If no firefly is brighter than a given firefly, that firefly moves randomly. 3) The brightness of a firefly is determined or affected by the objective function. Therefore, the attractiveness function (β) is defined as follows:

$$\beta(r) = \beta_0 e^{-\gamma r^2}$$

where $r$ is the distance, $\beta_0$ is the attractiveness at $r = 0$ and $\gamma$ is a fixed light absorption coefficient. The distance between any two fireflies $i$ and $j$ at locations $x_i$ and $x_j$ is defined as follows:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{d=1}^{D} (x_{i,d} - x_{j,d})^2}$$

where $D$ is the dimension of the problem and $x_{i,d}$ is the $d$-th component of the $i$-th firefly ($x_i$).
The movement of a firefly $i$ to another more attractive (brighter) firefly $j$ is defined as follows:

$$x_i^{k+1} = x_i^{k} + \beta_0 e^{-\gamma r_{ij}^2} (x_j^{k} - x_i^{k}) + \varphi\, \xi_i^{k}$$

where $\varphi$ is the random parameter, $\xi_i^{k}$ is a vector of random numbers with Gaussian or uniform distributions in [0,1] and $k$ is the iteration number. When a firefly $i$ moves toward firefly $j$, the position of firefly $i$ changes from a binary number to a real number. Therefore, the following sigmoid function is used to obtain a binary position after the displacement:

$$S(x_{i,d}) = \frac{1}{1 + e^{-x_{i,d}}} \tag{31}$$

where $S(x_{i,d})$ is the probability that bit $x_{i,d}$ is set to 1. The operating steps of this algorithm are as follows:

Algorithm 1. Pseudocode of DFA.

Suppose that f(X) is the objective function of X = (x_1, x_2, …, x_D)^T.
Assign values for β0, φ, γ and MaxGeneration.
Generate initial population of fireflies x_i for i = 1, 2, …, n.
Determine the light intensity I_i at x_i using f(x_i).
while (t < MaxGeneration) do
    for i = 1 : n (all n fireflies) do
        for j = 1 : n (all n fireflies) do
            if (I_i < I_j) then
                Move firefly i towards j
            End if
            Vary attractiveness with distance r using exp(−γr²)
            Discretize the position of the i-th firefly (Eq 31)
            Evaluate new solution (position of the i-th firefly) and update light intensity I_i
        End for
    End for
    Rank the fireflies and find the current global best
End while
Show result and visualization.
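The steps of Algorithm 1 can be sketched in Python for a purely binary decision vector. This is a simplified illustration under several assumptions, not the authors' implementation: positions are bit vectors only (the integer share counts of the full model are not handled), the random term is taken as φ times a uniform number in [0,1], a firefly that sees no brighter neighbor simply stays in place, and the toy fitness function and its data are hypothetical. The default parameter values follow those reported later in the numerical example.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued position to a bit probability."""
    return 1.0 / (1.0 + math.exp(-z))


def discrete_firefly(fitness, dim, n_fireflies=30, max_generation=500,
                     beta0=1.0, phi=0.2, gamma=1.0, seed=0):
    """Maximize `fitness` over binary vectors of length `dim` (sketch)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_fireflies)]
    light = [fitness(x) for x in pop]          # brightness = objective value
    best_val = max(light)
    best = list(pop[light.index(best_val)])

    for _ in range(max_generation):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[i] >= light[j]:
                    continue                   # move only towards brighter fireflies
                # Squared Euclidean distance between the two positions.
                r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                beta = beta0 * math.exp(-gamma * r2)
                # Real-valued move: attraction towards j plus a random step.
                moved = [a + beta * (b - a) + phi * rng.random()
                         for a, b in zip(pop[i], pop[j])]
                # Discretize: each bit becomes 1 with probability sigmoid(position).
                pop[i] = [1 if rng.random() < sigmoid(z) else 0 for z in moved]
                light[i] = fitness(pop[i])
                if light[i] > best_val:
                    best_val, best = light[i], list(pop[i])
    return best, best_val


# Hypothetical toy 0/1 knapsack used only to exercise the sketch.
VALUES = [6, 5, 8, 9, 6, 7, 3]
WEIGHTS = [2, 3, 6, 7, 5, 9, 4]
CAPACITY = 9


def toy_fitness(bits):
    weight = sum(w for w, b in zip(WEIGHTS, bits) if b)
    value = sum(v for v, b in zip(VALUES, bits) if b)
    return value if weight <= CAPACITY else 0  # infeasible selections score 0


best, best_val = discrete_firefly(toy_fitness, dim=len(VALUES),
                                  n_fireflies=10, max_generation=30)
print(best, best_val)
```

Scoring infeasible selections as zero is one simple way to handle the capacity constraint; a repair step or a graded penalty would be alternative design choices.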

Numerical example

In this section, a numerical example is presented to illustrate the proposed optimization model. The case study includes the shares of the Dow Jones Industrial Average (DJIA) listed on the New York Stock Exchange (NYSE) and covers weekly financial time series data over a period of five years from 18/10/2013 to 18/10/2018 (Appendix A). The minimum and maximum number of each share are generated randomly, and the interval returns are calculated from the following formulas:

The upper bound of the return = ((Minimum close price during the period − Minimum value of investment during the period + Minimum dividend during the period) / Minimum value of investment during the period) × 100.

The lower bound of the return = ((Maximum value of investment during the period − Maximum close price during the period + Maximum dividend during the period) / Maximum close price during the period) × 100.

Table 1 shows the information on the minimum and maximum number, prices and returns of each share.

Table 1

Research data.

Stock	Symbol	Company name	Lower price P_i	Upper price P¯i	Upper return R¯i (%)	Lower return R_i (%)	Min. shares l_i	Max. shares u_i
S₁	BA	The Boeing Company	102.099	394.279	6.457	2.463	20	48
S₂	GE	General Electric Company	11.210	33.000	1.771	1.094	166	180
S₃	MMM	3M Company	120.709	259.769	2.250	0.966	15	60
S₄	PG	The Procter & Gamble Company	65.019	94.669	5.848	2.062	34	100
S₅	KO	The Coca-Cola Company	36.560	48.619	2.420	0.989	125	167
S₆	AAPL	Apple Inc.	70.507	233.470	3.606	2.886	50	60
S₇	AXP	American Express Company	50.270	111.769	4.975	1.135	33	42
S₈	UTX	United Technologies Corporation	83.389	144.149	3.305	1.948	75	80
S₉	CVX	Chevron Corporation	69.580	135.100	9.477	1.961	89	101
S₁₀	JNJ	Johnson & Johnson	81.790	148.320	8.296	1.262	35	45
S₁₁	NKE	NIKE, Inc.	34.924	86.040	2.416	0.946	111	167
S₁₂	UNH	UnitedHealth Group Incorporated	66.720	272.809	1.745	1.675	12	15
S₁₃	MSFT	Microsoft Corporation	33.570	116.18	4.776	1.949	45	63
S₁₄	IBM	International Business Machines Corporation	116.900	199.210	4.205	2.863	21	40
S₁₅	TRV	The Travelers Companies, Inc.	79.889	150.550	1.904	1.272	25	88
S₁₆	MRK	Merck & Co., Inc.	44.619	72.889	2.323	2.156	71	125
S₁₇	XOM	Exxon Mobil Corporation	66.550	104.720	8.608	1.646	100	120
S₁₈	WMT	Wal-Mart Stores, Inc.	56.299	109.980	1.947	1.045	47	50
S₁₉	GS	The Goldman Sachs Group, Inc.	138.199	275.309	2.968	1.972	55	90
S₂₀	CAT	Caterpillar Inc.	56.360	173.240	6.863	2.165	23	45
S₂₁	V	Visa Inc.	48.564	151.559	1.450	1.299	33	40
S₂₂	CSCO	Cisco Systems, Inc.	20.250	49.470	4.969	2.363	142	176
S₂₃	HD	The Home Depot, Inc.	75.480	215.429	2.531	1.916	33	66
S₂₄	JPM	JPMorgan Chase & Co.	50.070	119.330	5.368	1.770	52	72
S₂₅	PFE	Pfizer Inc.	27.510	45.810	2.761	2.012	142	167
S₂₆	MCD	McDonald’s Corporation	87.500	178.699	3.203	0.756	25	55
S₂₇	VZ	Verizon Communications Inc.	38.060	56.950	12.395	2.352	99	167
S₂₈	INTC	Intel Corporation	23.400	57.599	2.789	1.434	142	167
S₂₉	DIS	The Walt Disney Company	65.980	122.080	2.725	2.631	41	53
S₃₀	DWDP	DowDuPont Inc.	35.110	77.080	10.680	0.676	55	120

The first step in the proposed approach is to obtain an interval comparison matrix. For this purpose, a detailed questionnaire was prepared, which can be found in Appendix B. The questionnaire was sent to three experts via email. In this study, an expert is someone who has information about and operational experience on the New York Stock Exchange. The interval comparison matrix of the DJIA can be found in Appendix C. The second step in the proposed approach is to extract the interval weights from the interval comparison matrix. For this purpose, the interval weights are obtained using Wang's approach [39] and are reported in Table 2.

Table 2

Extracting interval weights from the interval comparison matrix.

Weight of shares
WS1=[0.074,0.095]	WS2=[0.005,0.006]	WS3=[0.055,0.065]
WS4=[0.020,0.024]	WS5=[0.006,0.008]	WS6=[0.042,0.054]
WS7=[0.024,0.030]	WS8=[0.028,0.032]	WS9=[0.028,0.032]
WS10=[0.032,0.038]	WS11=[0.006,0.009]	WS12=[0.068,0.079]
WS13=[0.016,0.022]	WS14=[0.037,0.047]	WS15=[0.032,0.039]
WS16=[0.008,0.014]	WS17=[0.015,0.020]	WS18=[0.020,0.021]
WS19=[0.062,0.075]	WS20=[0.033,0.042]	WS21=[0.025,0.030]
WS22=[0.006,0.007]	WS23=[0.054,0.059]	WS24=[0.014,0.019]
WS25=[0.006,0.007]	WS26=[0.052,0.059]	WS27=[0.006,0.010]
WS28=[0.006,0.007]	WS29=[0.019,0.024]	WS30=[0.013,0.018]

In the following, the optimization model with the interval weights, the interval prices and the interval expected returns is solved in the GAMS software. Table 3 shows the results of the exact solution of the proposed optimization model for different values of the optimism threshold (α ∈ {0, 0.1, 0.3, 0.5, 0.7, 1}) in small dimensions.

Table 3

Exact solution.

GAMS solution
	α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
K = 6	E(R) = 49.234% X₄ = 36; X₉ = 97; X₁₆ = 72; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.	E(R) = 50.679% X₄ = 46; X₉ = 100; X₁₆ = 72; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.	E(R) = 53.378% X₄ = 76; X₉ = 100; X₁₆ = 71; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.	E(R) = 55.782% X₄ = 100; X₉ = 100; X₁₆ = 80; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.	E(R) = 57.744% X₄ = 75; X₉ = 100; X₁₀ = 35; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.	E(R) = 60.813% X₄ = 99; X₉ = 100; X₁₀ = 40; X₁₇ = 100; X₂₇ = 100; X₃₀ = 100
K = 7	E(R) = 46.974% X₄ = 34; X₉ = 89; X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₇ = 99;X₃₀ = 64.	E(R) = 49.254% X₄ = 34; X₉ = 89; X₁₃ = 45; X₁₆ = 71; X₁₇ = 100; X₂₇ = 100;X₃₀ = 86.	E(R) = 52.455% X₄ = 36; X₉ = 98;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₇ = 100;X₃₀ = 100.	E(R) = 54.981% X₄ = 58; X₉ = 100; X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₇ = 100;X₃₀ = 100.	E(R) = 57.038% X₄ = 80; X₉ = 100;X₁₆ = 71; X₁₇ = 100;X₂₄ = 52; X₂₇ = 100;.X₃₀ = 100.	E(R) = 59.925% X₄ = 79; X₉ = 100;X₁₀ = 35; X₁₆ = 73;X₁₇ = 100; X₂₇ = 100; X₃₀ = 100.
K = 8	E(R) = 45.625% X₄ = 86; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₄ = 52;X₂₇ = 100; X₃₀ = 100.	E(R) = 46.860% X₄ = 98; X₇ = 33; X₁₃ = 46; X₁₆ = 73; X₁₇ = 100; X₂₄ = 52;X₂₇ = 100; X₃₀ = 100.	E(R) = 49.484% X₄ = 34; X₇ = 33;X₉ = 89; X₁₃ = 45;X₁₆ = 71; X₁₇ = 100;X₂₇ = 99; X₃₀ = 60.	E(R) = 53.303% X₄ = 34; X₇ = 33;X₉ = 89; X₁₃ = 45;X₁₆ = 71; X₁₇ = 100;X₂₇ = 99; X₃₀ = 98.	E(R) = 55.749% X₄ = 34; X₉ = 99;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₄ = 52;X₂₇ = 100;. X₃₀ = 100	E(R) = 58.682% X₄ = 68; X₇ = 33;X₉ = 99; X₁₆ = 71;X₁₇ = 100; X₂₄ = 52;X₂₇ = 100; X₃₀ = 100.
K = 9	E(R) = 43.762% X₄ = 34; X₇ = 33; X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₀ = 23;X₂₄ = 52; X₂₇ = 100;.X₃₀ = 87.	E(R) = 45.431% X₄ = 38; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₀ = 20;X₂₄ = 52; X₂₇ = 100;X₃₀ = 100.	E(R) = 47.906% X₄ = 42; X₇ = 33;X₁₀ = 35; X₁₃ = 45;X₁₆ = 72; X₁₇ = 100;X₂₄ = 52; X₂₇ = 100;X₃₀ = 100.	E(R) = 50.061% X₄ = 66; X₇ = 33;X₁₀ = 35; X₁₃ = 45;X₁₆ = 71; X₁₇ = 100;X₂₄ = 52; X₂₇ = 100;X₃₀ = 100.	E(R) = 52.480% X₄ = 34; X₇ = 33; X₉ = 89; X₁₃ = 45; X₁₆ = 71; X₁₇ = 100; X₂₄ = 52; X₂₇ = 99;X₃₀ = 57.	E(R) = 57.077% X₄ = 36; X₇ = 33;X₉ = 89; X₁₃ = 45;X₁₆ = 72; X₁₇ = 100;X₂₄ = 52; X₂₇ = 100;X₃₀ = 100.
K = 10	E(R) = 36.058% X₄ = 34; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₈ = 47; X₂₀ = 23;X₂₄ = 52; X₂₇ = 99;X₂₉ = 41; X₃₀ = 71.	E(R) = 40.326% X₄ = 34; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₁₈ = 47;X₂₄ = 52; X₂₇ = 100;X₂₉ = 41; X₃₀ = 56.	E(R) = 44.145% X₄ = 34; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₁₈ = 47;X₂₄ = 52; X₂₇ = 100;X₂₉ = 41; X₃₀ = 94.	E(R) = 47.032% X₄ = 34; X₇ = 33;X₁₃ = 45; X₁₆ = 71;X₁₇ = 100; X₂₀ = 23;X₂₄ = 52; X₂₇ = 100;X₂₉ = 41; X₃₀ = 96.	E(R) = 49.702% X₄ = 36; X₇ = 33; X₁₀ = 35; X₁₃ = 45; X₁₆ = 71; X₁₇ = 100; X₂₄ = 52; X₂₇ = 100; X₂₉ = 41; X₃₀ = 100.	E(R) = 52.426% X₄ = 66; X₇ = 33;X₁₀ = 35; X₁₃ = 45; X₁₆ = 71; X₁₇ = 100; X₂₄ = 52; X₂₇ = 100; X₂₉ = 41; X₃₀ = 100.

As shown in Table 3, the solution of model (M5) depends on α: different values of α yield different solutions. When α is equal to zero, the expected return of the portfolio is at its minimum. When the level of data satisfaction (α) increases, the expected return of the portfolio also increases. When α is equal to one, the expected return of the portfolio is at its maximum. For instance, the expected return of the portfolio with cardinality k = 6 lies in [0.49234, 0.60813] over the different values of α (α ∈ [0,1]). The proposed portfolio selection model based on the knapsack problem is a mixed-integer optimization problem. Therefore, exact methods, such as those implemented in the GAMS software package, do not yield appropriate answers for large problems. In order to use the proposed model on a large scale, it is necessary to use meta-heuristic algorithms, and the Discrete Firefly Algorithm (DFA) is used for this purpose. To select appropriate parameters, the Taguchi method is applied to five parameters of the DFA (number of iterations, number of fireflies, randomization parameter, absorption coefficient, attractiveness). The selected parameters of the algorithm are attractiveness (β0 = 1), randomization parameter (φ = 0.2), absorption coefficient (γ = 1), number of iterations (MaxGeneration = 500), and number of fireflies (Population = 30). The results of the discrete firefly algorithm in small dimensions are reported in Table 4. Then, the results of the exact method are compared with the results of the DFA and are reported in Table 5:

Table 4

DFA solution.

	Init. assign. no.	DFA solution
K = 6		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
	1	50.971%	52.527%	53.048%	55.618%	55.289%	57.867%
	2	50.939%	50.895%	52.507%	54.336%	55.936%	57.923%
	3	49.451%	50.331%	52.352%	55.624%	56.744%	57.788%
	4	49.828%	52.194%	52.804%	55.419%	56.680%	58.604%
	5	50.987%	51.459%	53.029%	54.610%	55.638%	58.441%
Mean		50.435%	51.481%	52.748%	55.121%	56.057%	58.125%
SE Mean		0.003	0.004	0.001	0.002	0.002	0.001
S.D.		0.007	0.009	0.003	0.006	0.006	0.003
K = 7		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
	1	53.469%	54.087%	54.431%	56.192%	58.116%	57.534%
	2	52.830%	54.220%	55.039%	57.179%	58.503%	59.046%
	3	53.923%	53.005%	54.424%	57.781%	56.723%	58.635%
	4	53.580%	53.969%	53.859%	56.617%	57.617%	59.180%
	5	53.282%	53.656%	55.539%	54.825%	57.236%	59.041%
Mean		53.417%	53.787%	54.658%	56.519%	57.639%	58.687%
SE Mean		0.001	0.002	0.002	0.005	0.003	0.003
S.D.		0.004	0.004	0.006	0.011	0.007	0.006
K = 8		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
	1	53.160%	53.692%	54.438%	57.332%	59.896%	61.281%
	2	51.837%	56.064%	57.462%	57.531%	59.483%	62.910%
	3	52.511%	52.538%	55.797%	58.439%	59.061%	62.089%
	4	53.437%	53.928%	57.652%	58.694%	59.821%	60.880%
	5	53.823%	54.777%	54.765%	58.648%	58.930%	60.379%
Mean		52.954%	54.200%	56.023%	58.129%	59.438%	61.508%
SE Mean		0.003	0.005	0.006	0.002	0.001	0.004
S.D.		0.007	0.013	0.014	0.006	0.004	0.010
K = 9		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
	1	53.268%	56.171%	56.016%	57.217%	60.927%	61.928%
	2	54.554%	56.412%	58.045%	59.243%	59.971%	61.676%
	3	52.942%	54.454%	58.006%	57.593%	58.478%	62.929%
	4	52.866%	55.073%	57.444%	58.143%	59.191%	62.821%
	5	52.616%	54.091%	56.257%	58.585%	60.941%	62.020%
Mean		53.249%	55.240%	57.154%	58.156%	59.902%	62.275%
SE Mean		0.003	0.004	0.004	0.003	0.004	0.002
S.D.		0.007	0.010	0.009	0.008	0.010	0.005
K = 10		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
	1	54.988%	55.175%	56.752%	59.166%	60.544%	63.510%
	2	52.911%	52.341%	56.883%	58.985%	61.030%	62.007%
	3	53.477%	53.014%	57.139%	58.885%	59.531%	63.767%
	4	53.942%	54.696%	57.481%	59.369%	60.470%	63.151%
	5	51.117%	55.491%	55.189%	58.697%	61.679%	63.087%
Mean		53.287%	54.143%	56.689%	59.020%	60.651%	63.104%
SE Mean		0.006	0.006	0.003	0.001	0.003	0.003
S.D.		0.014	0.013	0.008	0.002	0.007	0.006
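
The summary rows of the table above (Mean, SE Mean, S.D.) can be checked against the five individual runs. The following is a minimal sketch, assuming that S.D. is the sample standard deviation of the five runs expressed as a fraction rather than a percentage, and that SE Mean is S.D. divided by the square root of the number of runs. The variable names are illustrative and do not come from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# Five DFA runs for K = 6, alpha = 0, copied from the table (in percent).
runs_percent = [50.971, 50.939, 49.451, 49.828, 50.987]

# Convert to fractions so S.D. and SE Mean are on the scale the table uses.
runs = [r / 100 for r in runs_percent]

avg = mean(runs)
sd = stdev(runs)            # sample standard deviation (n - 1 denominator)
se = sd / sqrt(len(runs))   # standard error of the mean

print(f"Mean    = {avg:.3%}")
print(f"S.D.    = {sd:.3f}")
print(f"SE Mean = {se:.3f}")
```

Under these assumptions the sketch reproduces the tabulated K = 6, α = 0 entries (50.435%, 0.007, and 0.003); the same check can be repeated for any other column by swapping in its five runs.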

Table 5

Comparison of the exact solution with DFA solution.

		Mean solution of DFA	GAMS solution	S.D.	SE Mean	P-Value*
K = 6	α = 0	50.435%	49.234%	0.008	0.006	0.008
	α = 0.1	51.481%	50.679%	0.005	0.004	0.005
	α = 0.3	52.748%	53.378%	0.004	0.003	0.004
	α = 0.5	55.121%	55.782%	0.004	0.003	0.004
	α = 0.7	56.057%	57.744%	0.011	0.008	0.010
	α = 1	58.125%	60.813%	0.019	0.013	0.015
K = 7	α = 0	53.417%	46.974%	0.045	0.032	0.042
	α = 0.1	53.787%	49.254%	0.032	0.022	0.029
	α = 0.3	54.658%	52.455%	0.015	0.011	0.013
	α = 0.5	56.519%	54.811%	0.012	0.008	0.010
	α = 0.7	57.639%	57.038%	0.004	0.003	0.003
	α = 1	58.687%	59.925%	0.008	0.006	0.007
K = 8	α = 0	52.954%	45.652%	0.051	0.036	0.048
	α = 0.1	54.200%	46.860%	0.051	0.036	0.047
	α = 0.3	56.023%	49.489%	0.046	0.032	0.040
	α = 0.5	58.129%	53.303%	0.034	0.024	0.028
	α = 0.7	59.438%	55.749%	0.026	0.018	0.021
	α = 1	61.508%	58.682%	0.020	0.014	0.015
K = 9	α = 0	53.249%	43.762%	0.067	0.047	0.063
	α = 0.1	55.240%	45.431%	0.069	0.049	0.063
	α = 0.3	57.154%	47.906%	0.065	0.046	0.057
	α = 0.5	58.156%	50.061%	0.057	0.040	0.048
	α = 0.7	59.902%	52.480%	0.052	0.037	0.043
	α = 1	62.275%	57.077%	0.036	0.026	0.028
K = 10	α = 0	53.287%	36.058%	0.121	0.086	0.124
	α = 0.1	54.143%	40.326%	0.097	0.069	0.094
	α = 0.3	56.689%	44.145%	0.088	0.062	0.080
	α = 0.5	59.020%	47.032%	0.084	0.059	0.073
	α = 0.7	60.651%	49.702%	0.077	0.054	0.064
	α = 1	63.104%	52.426%	0.075	0.053	0.060

* denotes rejection of the hypothesis at the 0.01 level.
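
The S.D. and SE Mean columns of Table 5 appear consistent with treating the mean DFA solution and the GAMS solution as a two-value sample. This reading is inferred from the numbers and is not stated in the text, so the sketch below should be taken as an assumption; the function name `pair_spread` is hypothetical.

```python
from math import sqrt

def pair_spread(dfa_mean, gams):
    """Return (gap, sd, se) for a two-value sample, all as fractions."""
    gap = dfa_mean - gams
    sd = abs(gap) / sqrt(2)   # sample S.D. of two values: |a - b| / sqrt(2)
    se = sd / sqrt(2)         # SE of the mean with n = 2, i.e. |a - b| / 2
    return gap, sd, se

# (K, alpha) -> (mean DFA solution, GAMS solution), copied from Table 5.
rows = {
    (6, 0.0): (0.50435, 0.49234),
    (8, 0.5): (0.58129, 0.53303),
    (10, 1.0): (0.63104, 0.52426),
}

for (k, alpha), (dfa, gams) in rows.items():
    gap, sd, se = pair_spread(dfa, gams)
    print(f"K={k}, alpha={alpha}: gap={gap:+.5f}, sd={sd:.4f}, se={se:.4f}")
```

For these three rows the computed values agree with the tabulated S.D. and SE Mean to within the last printed digit. The P-Value column is not reproduced here because the text does not name the test that was used.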

As shown in Table 5, the standard deviations (S.D.) and standard errors of the mean (SE Mean) between the GAMS solutions and the DFA solutions lie in [0.004, 0.121] and [0.003, 0.086], respectively. These values are small, which indicates that the DFA solutions are close to the exact solutions in small dimensions. Also, as shown in S1 Fig, the results of the two approaches move closer together as the optimism threshold α increases, so the DFA results are more reliable for α ≥ 0.5. Taken together, these results support the validity of the discrete firefly algorithm for the proposed portfolio selection model, and the algorithm can therefore also be used to solve the model in larger dimensions. The sensitivity analysis and the set of answers in small and large dimensions are reported in Table 6:

Table 6

Set of answers in small and large dimensions.

	D	DFA Solution						GAMS Solution
		α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1	α = 0	α = 0.1	α = 0.3	α = 0.5	α = 0.7	α = 1
Low-D	K = 6	50.435%	51.481%	52.748%	55.121%	56.057%	58.125%	49.234%	50.679%	53.378%	55.782%	57.744%	60.813%
	K = 7	53.417%	53.787%	54.658%	56.519%	57.639%	58.687%	46.974%	49.254%	52.455%	54.811%	57.038%	59.925%
	K = 8	52.954%	54.200%	56.023%	58.129%	59.438%	61.508%	45.652%	46.860%	49.489%	53.303%	55.749%	58.682%
	K = 9	53.249%	55.240%	57.154%	58.156%	59.902%	62.275%	43.762%	45.431%	47.906%	50.061%	52.480%	57.077%
	K = 10	53.287%	54.143%	56.689%	59.020%	60.651%	63.104%	36.058%	40.326%	44.145%	47.032%	49.702%	52.426%
High-D	K = 11	52.664%	55.066%	56.087%	59.616%	62.251%	63.694%	-	-	-	-	-	-
	K = 12	50.432%	52.244%	55.373%	58.155%	60.899%	63.898%	-	-	-	-	-	-
	K = 13	48.351%	49.467%	52.614%	55.060%	57.667%	61.688%	-	-	-	-	-	-
	K = 14	47.935%	48.951%	49.584%	52.767%	55.157%	57.829%	-	-	-	-	-	-
	K = 15	44.075%	45.247%	47.652%	49.948%	52.576%	55.531%	-	-	-	-	-	-

According to Table 6, the proposed discrete firefly algorithm can solve the knapsack-based portfolio selection model with arbitrary inputs (different cardinalities) in large dimensions, for which no GAMS solution is reported. Therefore, the proposed discrete firefly algorithm is used for large-scale instances with a relatively large portfolio.

Conclusions

In this paper, we have presented a portfolio selection model based on the knapsack problem under interval uncertainty, in which the expected returns, prices, and budget are characterized by interval values. We have also considered the importance and priority of each share in the portfolio optimization model by extracting interval weights from an interval comparison matrix. We have then converted the proposed model into a crisp, solvable parametric linear programming model using the technique presented by Sengupta [45], in which the decision maker is able to determine the optimism threshold. The proposed optimization model maximizes the return on investment while simultaneously taking into account the risk preferences of investors and the budget, cardinality, floor, and ceiling constraints under interval uncertainty of the parameters in the objective functions and constraints. The proposed model is able to determine the optimal portfolio of assets. The main contribution of this work is a portfolio selection model based on the knapsack problem that eliminates the mentioned gaps in the Markowitz theorem by allocating a whole number of shares to each asset. In addition, a knapsack-type optimization technique may make it possible to find the optimum asset allocation for special cases with relatively large stock prices. A discrete firefly algorithm has also been designed to handle the complexity of the problem. A numerical example from the US stock exchange has been given to illustrate the application of the proposed model and to demonstrate the effectiveness of the designed algorithm for solving it. The numerical results show that the proposed optimization model and the discrete firefly algorithm may yield promising results.
Finally, including other constraints in the proposed model, such as class constraints and chance constraints, and risk measures such as variance and Value-at-Risk (VaR), would be an interesting extension and is left as future work.

Figs A to E. Comparison of the exact solution with DFA solution. (TIFF)

Research data.