
Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models.

Renato Bruni, Francesco Cesarone, Andrea Scozzari, Fabio Tardella.

Abstract

A large number of portfolio selection models have appeared in the literature since the pioneering work of Markowitz. However, even when computational and empirical results are described, they are often hard to replicate and compare because the datasets used in the experiments are unavailable. We provide here several datasets for portfolio selection generated using real-world price values from several major stock markets. The datasets contain weekly return values, adjusted for dividends and stock splits, which have been cleaned of errors as far as possible. The datasets are available in different formats and can be used as benchmarks for testing the performance of portfolio selection models and for comparing the efficiency of the algorithms used to solve them. For these datasets, we also provide the portfolios obtained by several selection strategies based on Stochastic Dominance models (see "On Exact and Approximate Stochastic Dominance Strategies for Portfolio Selection" (Bruni et al. [2])). We believe that testing portfolio models on publicly available datasets greatly simplifies the comparison of different portfolio selection strategies.


Year:  2016        PMID: 27508232      PMCID: PMC4959918          DOI: 10.1016/j.dib.2016.06.031

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

The datasets provided here can be used as benchmarks by researchers who wish to implement and compare portfolio selection models on publicly available data. If different researchers use the same publicly available data, comparing their approaches becomes easier and fairer. The data are filtered to remove possible errors in the original source, which allows researchers to perform more accurate and realistic simulations and evaluations. For our datasets we also provide the solutions of several portfolio selection models. Such solutions can be used by other researchers to compare the efficiency of their algorithms and the quality of their solutions. The availability of data and solutions can also stimulate contacts among researchers working in this area for future collaborations and projects.

Data

We provide weekly return time series for assets and indexes belonging to several major stock markets across the world. Weekly returns are computed from price values obtained from Thomson Reuters Datastream (http://financial.thomsonreuters.com/) and from daily returns obtained from the Fama & French Data Library (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). The data are filtered to detect and correct missing or inaccurate values. The data provided can be used as input to several types of portfolio selection models, in order to compare them in terms of both computational efficiency and performance (for references on portfolio selection approaches see, e.g., [3]). For these datasets, we also include as benchmarks the portfolios obtained with several selection strategies based on both exact and approximate Stochastic Dominance models (described in [2]).

Experimental design, materials and methods

Asset allocation aims at selecting a portfolio of the assets available in an investment universe according to specific choice criteria under uncertainty. More precisely, we must decide how much of the given capital to invest in each asset. The portfolio is denoted by $x = (x_1, \dots, x_N)$, where $x_i$ is the fraction of the given capital invested in asset $i$. Let $p_{i,t}$ denote the price of asset $i$ at time $t$, observed for $m+1$ time periods, i.e., $t = 0, 1, \dots, m$. The linear return of asset $i$ at time $t$ is

$$r_{i,t} = \frac{p_{i,t} - p_{i,t-1}}{p_{i,t-1}},$$

where $t \in T = \{1, \dots, m\}$. Denoting by $I_t$ the value of the benchmark (e.g., the Market Index) at time $t$, the benchmark linear returns are

$$r^{I}_{t} = \frac{I_t - I_{t-1}}{I_{t-1}},$$

where $t \in T$. The portfolio linear return at time $t$ is

$$R_t(x) = \sum_{i=1}^{N} r_{i,t}\, x_i .$$

All the datasets listed in Table 1 contain the linear returns of each of the N assets in the market, together with the linear returns of the benchmark index, computed as described above.
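The return definitions above translate directly into code. The following Python sketch (standard library only; the names `linear_returns`, `returns_matrix` and `portfolio_returns` are illustrative and not part of the released data) computes $r_{i,t}$ from a price history, arranges the returns of N assets as a |T| × N matrix like the one stored in the datasets, and evaluates $R_t(x)$ for a weight vector x. The benchmark returns $r^I_t$ would be obtained by applying `linear_returns` to the index levels $I_t$.

```python
from typing import List, Sequence


def linear_returns(prices: Sequence[float]) -> List[float]:
    """Linear returns r_t = (p_t - p_{t-1}) / p_{t-1} for t = 1..m,
    given the m+1 prices p_0, ..., p_m of one asset (or index levels I_t)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices[:-1], prices[1:])]


def returns_matrix(price_series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build an m x N returns matrix from N price series of length m+1.
    Row t-1 holds r_{1,t}, ..., r_{N,t}, i.e. one row per period, one column per asset."""
    per_asset = [linear_returns(p) for p in price_series]
    return [list(row) for row in zip(*per_asset)]


def portfolio_returns(returns: Sequence[Sequence[float]],
                      x: Sequence[float]) -> List[float]:
    """Portfolio linear returns R_t(x) = sum_i r_{i,t} * x_i for every period t."""
    return [sum(r * w for r, w in zip(row, x)) for row in returns]
```

For instance, prices 100, 110, 99 give the returns 0.1 and -0.1, and an equal split between that asset and one priced 50, 50, 55 gives portfolio returns of 0.05 and 0.0.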
Table 1

Weekly returns datasets provided.

#  Dataset name    # of assets (N)  |T|   Time interval        Country  Description                   # of rebalancings (nreb)
1  DowJones        28               1363  Feb 1990-Apr 2016    USA      Dow Jones Industrial Average  110
2  NASDAQ100       82               596   Nov 2004-Apr 2016    USA      NASDAQ 100                    46
3  FTSE100         83               717   Jul 2002-Apr 2016    UK       FTSE 100                      56
4  SP500           442              595   Nov 2004-Apr 2016    USA      S&P 500                       46
5  NASDAQComp      1203             685   Feb 2003-Apr 2016    USA      NASDAQ Composite              53
6  FF49Industries  49               2325  Jul 1969-Jul 2015    USA      Fama and French 49 Industry   190
Datasets 1–5 consist of weekly linear returns computed from daily price data, adjusted for dividends and stock splits, obtained from Thomson Reuters Datastream. For these datasets the benchmark is the corresponding market index. Stocks with fewer than ten years of observations were discarded, which gives a reasonable trade-off between the number of assets (N) and the number of observations (|T|). Furthermore, when necessary, asset prices are filtered to detect and correct inaccurate data. Data cleaning is indeed an important issue for data of this kind (see, e.g., [4] for references on this widespread problem).

Dataset 6 is derived from the Fama and French 49 Industry portfolios, available from the Fama & French Data Library, which contains daily returns from July 1926 to July 2015. Since many data are missing, especially before July 1969, we chose the subsample of periods in which the daily returns of all 49 industries are available, namely from July 1969 to July 2015. Furthermore, to make the frequency uniform across all datasets, we extract weekly returns by compounding daily returns in consecutive groups of five: denoting by $r^{d}_{i,j}$ the daily linear return of industry portfolio $i$ on day $j$, the weekly return of week $t$ is

$$r_{i,t} = \prod_{j=5(t-1)+1}^{5t} \left(1 + r^{d}_{i,j}\right) - 1 .$$

Since no market index is publicly available for the Fama and French 49 Industry portfolios, in this case we use the Equally-Weighted portfolio (i.e., $x_i = 1/N$ for all $i$) as the benchmark index.

In addition to the returns datasets, we also make available the composition (weights) and the out-of-sample returns of the portfolios obtained, for all datasets and for several in-sample periods, with the models listed in Table 2 and fully described in the companion paper [2].
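A minimal Python sketch of the two Dataset 6 preprocessing steps described above: compounding daily linear returns in consecutive groups of five into weekly returns, and computing the Equally-Weighted benchmark as the cross-sectional mean of each row of a |T| × N returns matrix. The function names are hypothetical, and dropping a trailing group of fewer than five days is an assumption, since the text does not say how an incomplete final group is handled.

```python
from typing import List, Sequence


def daily_to_weekly(daily: Sequence[float], days_per_week: int = 5) -> List[float]:
    """Compound consecutive groups of daily linear returns: r^w = prod(1 + r^d) - 1.
    Assumption: a trailing group shorter than days_per_week is dropped."""
    weekly = []
    for start in range(0, len(daily) - days_per_week + 1, days_per_week):
        growth = 1.0
        for r in daily[start:start + days_per_week]:
            growth *= 1.0 + r  # multiply gross daily returns within the group
        weekly.append(growth - 1.0)
    return weekly


def equally_weighted_benchmark(returns: Sequence[Sequence[float]]) -> List[float]:
    """Return of the Equally-Weighted portfolio (x_i = 1/N) in each period,
    i.e. the mean of each row of a |T| x N returns matrix."""
    return [sum(row) / len(row) for row in returns]
```

Compounding (rather than summing) is the consistent way to aggregate linear returns, because a week's gross return is the product of its daily gross returns.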
Table 2

Portfolio Selection models applied to the datasets.

Model name  Description
CZeSD       Cumulative Zero-order epsilon Stochastic Dominance (see [1], [2])
RMZ_SSD     Roman-Mitra-Zviarovich Second-Order Stochastic Dominance (see [9])
LR_ASSD     Lizyayev-Ruszczynski approximate Second-Order Stochastic Dominance (see [5])
L_SSD       Luedtke Second-Order Stochastic Dominance (see [6])
KP_SSD      Post-Kopa Second-Order Stochastic Dominance (see [8])
MeanVar     Markowitz Mean-Variance (see [7])
For each dataset and each model, we compute the solutions using a rolling in-sample window of 52 weekly return observations. We initially set the in-sample window on the first 52 periods, select the portfolio by solving the model, and evaluate the performance of the selected portfolio on the following 12 (out-of-sample) periods. Next, we shift the in-sample window forward by 12 periods, so that it includes the 12 out-of-sample periods just evaluated and excludes the first 12 periods of the previous in-sample window. We then rebalance the portfolio by solving the model again, and repeat until the end of the dataset (see Fig. 1).
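The rolling scheme can be sketched as follows in Python (`rolling_windows` and `n_rebalancing` are hypothetical helpers). Each step yields an in-sample range of 52 periods followed by an out-of-sample range of up to 12 periods. Truncating the last out-of-sample block at the end of the data is an assumption, but it is consistent with Table 1: the number of windows equals ceil((|T| - 52)/12) and the out-of-sample periods total |T| - 52.

```python
import math
from typing import List, Tuple

IN_SAMPLE = 52   # in-sample window length in weeks (from the text)
OUT_SAMPLE = 12  # out-of-sample length and rebalancing step in weeks (from the text)


def rolling_windows(n_periods: int, in_len: int = IN_SAMPLE,
                    out_len: int = OUT_SAMPLE) -> List[Tuple[range, range]]:
    """(in-sample, out-of-sample) index ranges over periods 0..n_periods-1.
    The window slides forward by out_len; the last out-of-sample block is
    truncated at the end of the data (assumption)."""
    windows = []
    start = 0
    while start + in_len < n_periods:
        ins = range(start, start + in_len)
        oos = range(start + in_len, min(start + in_len + out_len, n_periods))
        windows.append((ins, oos))
        start += out_len  # drop the oldest out_len periods, add the newest
    return windows


def n_rebalancing(n_periods: int, in_len: int = IN_SAMPLE,
                  out_len: int = OUT_SAMPLE) -> int:
    """Closed form for the number of windows: ceil((|T| - in_len) / out_len)."""
    return math.ceil((n_periods - in_len) / out_len)
```

For example, `len(rolling_windows(1363))` gives 110, the value of nreb reported for DowJones in Table 1.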
Fig. 1

Scheme of the rolling time window used in the analysis.

Following the notation of Table 1, the data provided with this article are organized as in Fig. 2 and labeled as follows:
Fig. 2

Structure of the database.

- Dataset.mat: MATLAB workspace containing the |T| × N returns matrix (Assets_Returns) and the |T| × 1 vector of index returns (Index_Returns) for the Dataset.
- Dataset.xlsx: Excel file containing the |T| × N returns matrix in the sheet Assets_Returns and the |T| × 1 vector of index returns in the sheet Index_Returns for the Dataset.
- OptPortfolios_Model_Dataset.txt: N × nreb matrix of the portfolio weights obtained by the Model for the Dataset.
- OutofSamplePortReturns_Model_Dataset_List.txt: (|T| - 52) × 1 vector of the out-of-sample portfolio returns obtained by the Model for the Dataset.
- OutofSamplePortReturns_Model_Dataset_Matr.txt: same as above, but in MATLAB matrix format.
- OutofSampleReturns_Index_Dataset.txt: (|T| - 52) × 1 vector of the out-of-sample benchmark index returns for the Dataset.
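As an illustration of how the text files might be consumed, the Python sketch below parses a whitespace-separated numeric file into a list of rows, sums each column of an N × nreb weight matrix, and turns a vector of out-of-sample returns into a cumulative wealth path. It assumes the .txt files hold plain whitespace-delimited numbers, one row per line; this layout is not specified above and may not hold for the _Matr.txt variant. All function names are hypothetical. The .mat workspaces can be read in Python with `scipy.io.loadmat`, which returns a dictionary keyed by variable name (e.g. `Assets_Returns`).

```python
from typing import List, Sequence


def parse_matrix(text: str) -> List[List[float]]:
    """Parse whitespace-separated numbers, one matrix row per non-empty line
    (assumed file layout)."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def load_matrix(path: str) -> List[List[float]]:
    """Read and parse one of the .txt files (assumed whitespace-delimited)."""
    with open(path, encoding="utf-8") as fh:
        return parse_matrix(fh.read())


def column_sums(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Sum of each column; for an N x nreb weight matrix, the total weight
    allocated at each rebalancing date."""
    return [sum(col) for col in zip(*matrix)]


def cumulative_wealth(returns: Sequence[float], initial: float = 1.0) -> List[float]:
    """Wealth path obtained by compounding a sequence of linear returns."""
    wealth, path = initial, []
    for r in returns:
        wealth *= 1.0 + r
        path.append(wealth)
    return path
```

A possible usage, with a hypothetical file name built from the naming pattern above: `returns = [row[0] for row in load_matrix("OutofSamplePortReturns_CZeSD_DowJones_List.txt")]`, after which `cumulative_wealth(returns)[-1]` is the final out-of-sample wealth per unit invested. If the portfolios are fully invested, every entry of `column_sums` on a weight matrix should be close to 1; this is worth verifying on the actual files.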
Specifications Table

Subject area: Economics and Finance
More specific subject area: Portfolio selection, portfolio optimization, asset allocation
Type of data: Tables, text files, Excel files, MATLAB files, figures
How data was acquired: Thomson Reuters Datastream, Fama & French Data Library
Data format: Processed, filtered, analyzed
Experimental factors: When necessary, asset prices are filtered to detect and correct missing or inaccurate data
Experimental features: All datasets provided consist of weekly asset returns readily usable in portfolio selection models
Data source location: N/A
Data accessibility: Data is within this article