
The portfolio selection problem consists in selecting a set of assets, and the share invested in each asset, that provides the investor with a minimum required return while minimizing the risk. One of the main contributions to this problem is the seminal work by Markowitz [Markowitz, 1952], who first introduced the so-called mean-variance model, which takes the variance of the portfolio returns as the measure of the investor's risk. According to Markowitz, the portfolio selection problem can be formulated as an optimization problem over real-valued variables with a quadratic objective function and linear constraints.

On this page we collect benchmark instances for the Portfolio Selection problem, built from real-world stock market data. The instances were built using the Yahoo Finance website as the data source for historical stock values, which were then processed by PHP scripts.

We first give the precise problem statement, then describe the file format of the problem instances, and finally provide the links to download the data files.

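Following [Markowitz, 1952], we are given a set of `n` assets, `A` = {`a`_{1}, ..., `a`_{n}}. Each asset `a`_{i} has an associated expected return `r`_{i}, and each pair of assets (`a`_{i}, `a`_{j}) has a covariance `σ`_{ij}.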

A portfolio is a vector of real values `X` = {`x`_{1}, ..., `x`_{n}}, where `x`_{i} is the share invested in asset `a`_{i}. The problem is then:

$$
\min F_R(X) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j
$$

subject to

$$
\sum_{i=1}^{n} r_i x_i \ge R
$$

$$
\sum_{i=1}^{n} x_i = 1
$$

$$
x_i \ge 0 \quad \forall i = 1, \dots, n
$$
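As a minimal sketch of how the formulation above can be evaluated for a candidate portfolio, the following Python functions compute the objective (the portfolio variance) and check the three constraints. The function names and the numerical tolerance `tol` are assumptions made for illustration; they are not part of the benchmark.

```python
def portfolio_risk(sigma, x):
    """Objective: sum over i, j of sigma[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def portfolio_return(r, x):
    """Expected return of the portfolio: sum over i of r[i] * x[i]."""
    return sum(ri * xi for ri, xi in zip(r, x))


def is_feasible(r, x, R, tol=1e-9):
    """Check the return, budget, and non-negativity constraints.

    `tol` is an illustrative tolerance for floating-point comparisons.
    """
    return (
        portfolio_return(r, x) >= R - tol
        and abs(sum(x) - 1.0) <= tol
        and all(xi >= -tol for xi in x)
    )
```

For example, with `r = [0.01, 0.02]`, `sigma = [[0.04, 0.01], [0.01, 0.09]]` and equal shares `x = [0.5, 0.5]`, the portfolio has expected return 0.015 and variance 0.0375.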


Since the minimum required return `R` can be regarded as a parameter of the problem, solving the problem for each value of `R` in a finite, discretized domain yields the so-called *unconstrained (Pareto) efficient frontier* (UEF), which gives the minimum risk associated with each return.
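To illustrate the idea of sweeping `R` over a discretized domain, here is a toy sketch restricted to two assets, where the minimum-variance portfolio for each `R` has a closed form and no solver is needed. This is not how the frontiers on this page were computed; the function name, the restriction to two assets, and the evenly spaced grid of returns are assumptions made for illustration.

```python
def two_asset_frontier(r, sigma, samples=100):
    """Return a list of (R, minimum variance) pairs for a two-asset toy case.

    Assumes r[0] > r[1] and no short selling. With x the share of asset 0
    and 1 - x the share of asset 1, the return constraint becomes a lower
    bound on x, and the variance is a convex quadratic in x.
    """
    r_hi, r_lo = r
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    denom = s11 + s22 - 2.0 * s12
    if not (r_hi > r_lo) or denom <= 0.0 or samples < 2:
        raise ValueError("need r[0] > r[1], a non-degenerate pair, samples >= 2")

    x_star = (s22 - s12) / denom          # unconstrained variance minimizer
    step = (r_hi - r_lo) / (samples - 1)  # evenly spaced returns (assumption)
    frontier = []
    for k in range(samples):
        R = r_hi - k * step
        # Smallest share of asset 0 that still reaches return R, kept in [0, 1].
        x_min = min(1.0, max(0.0, (R - r_lo) / (r_hi - r_lo)))
        # Minimizer of a convex quadratic over [x_min, 1]: clamp x_star.
        x = min(1.0, max(x_min, x_star))
        risk = s11 * x * x + 2.0 * s12 * x * (1.0 - x) + s22 * (1.0 - x) ** 2
        frontier.append((R, risk))
    return frontier
```

The points are listed from the highest return to the lowest, and the minimum risk never increases as the required return is relaxed.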

The problem formulation requires all asset shares to be non-negative. Even though this requirement is a common assumption in theoretical approaches, it is not enforced in real markets, where short positions (i.e., assets with negative shares, corresponding to speculation on falling prices) are closely intertwined with long positions (i.e., assets with positive shares).

In order to allow short selling, we modify the model by removing the inequality `x`_{i} ≥ 0 and adding an additional risk-free asset `n` + 1 that serves as collateral, subject to two further constraints.


The first constraint requires a collateral risk-free investment, in order to secure the investor's position in case the price of the sold asset rises instead of falling. The investment in the risk-free asset must be no less than a proportion γ of the overall sum of the short positions.

The second constraint is imposed by law (US Regulation T) to limit the amount invested in short positions.

To determine the UEF in the case of short sales, we set the parameter γ to 0, so that no collateral is strictly required (the most extreme situation). We fixed the return of the collateral asset to the return of the T-bond at the beginning of the period, with the same maturity as the stock-price data taken into account.

Notice that, with these modifications and the introduction of the additional asset `n` + 1, the constraints of the original formulation now read:

$$
\sum_{i=1}^{n+1} r_i x_i \ge R
$$

$$
\sum_{i=1}^{n+1} x_i = 1
$$

$$
x_{n+1} \ge 0
$$

Notice also that the risk-free asset can be employed in the UEF of the original problem formulation as well. In this case the share of the risk-free asset corresponds to the decision of keeping part of the wealth uninvested.
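The following sketch checks a candidate portfolio against the modified constraints together with the collateral requirement as it is described in words above (the risk-free share must be at least γ times the total of the short positions). The function name and the tolerance are assumptions; the regulatory limit on short positions is deliberately left out, since its exact form is not given here.

```python
def short_selling_feasible(r, x, R, gamma=0.0, tol=1e-9):
    """Check the short-selling constraints for n risky assets plus collateral.

    `r` and `x` have n + 1 entries; the last entry is the risk-free
    (collateral) asset. Shares of the first n assets may be negative.
    The regulatory limit on short positions is NOT checked here.
    """
    # Overall size of the short positions among the n risky assets.
    short_total = sum(-xi for xi in x[:-1] if xi < 0)
    expected_return = sum(ri * xi for ri, xi in zip(r, x))
    return (
        expected_return >= R - tol
        and abs(sum(x) - 1.0) <= tol
        and x[-1] >= -tol
        and x[-1] >= gamma * short_total - tol
    )
```

With the default `gamma=0.0`, the setting used for the UEF on this page, the collateral condition reduces to the non-negativity of the risk-free share.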

The data needed to fully describe an instance of the problem are:

- `n`: the overall number of assets;
- `r`_{i}: the expected return of each asset, including the expected return `r`_{n + 1} of the risk-free (collateral) asset;
- `σ`_{ij}: the covariance matrix.

    # Instance created from Yahoo Finance on 02-08-2007
    # Country: Italy
    # Index: MIBTEL
    # Index_file: ./indexes/it_mibtel.index
    # Period: 2001-2006
    Cash_return: 0.002112
    Number_of_assets: 167
    Asset: 1 0.0088812826376279 ACE.MI
    Asset: 2 -0.0147325872017824 ACO.MI
    Asset: 3 0.0065871978321356 ACS.MI
    Asset: 4 0.0109624773142050 AE.MI
    Asset: 5 0.0086946666486191 AFI.MI
    ...
    Covariance: 1 1 0.0075622080207523
    Covariance: 1 2 0.0051116935557743
    Covariance: 1 3 0.0038950532965118
    Covariance: 1 4 0.0025220967764908
    Covariance: 1 5 0.0032403279431379
    ...
    Covariance: 166 167 0.0018085655370221
    Covariance: 167 167 0.0034447930439915

The format is as follows:

- The first lines, starting with the character `#`, are comments and can be ignored. They are present only to give a human-readable description of the instance (e.g., the index employed and the time period used to build the instance).
- The line starting with the string `Cash_return:` contains the value of the T-bond/T-bill return with the same maturity as the investment horizon.
- The line starting with the string `Number_of_assets:` contains the value of `n`, i.e., the overall number of assets (167 in the present case).
- There are exactly `n` asset lines starting with the string `Asset:`. Each asset line is composed of the asset index `i`, the asset return `r`_{i}, and the Yahoo Finance symbol of that asset (which can be ignored).
- There are `n`(`n` + 1) / 2 covariance lines starting with the string `Covariance:`, one for each pair of indexes `i` ≤ `j`; as the example shows, the diagonal entries `σ`_{ii} are included. Each covariance line is composed of the two asset indexes `i` and `j` and the covariance value `σ`_{ij}. Since the matrix `σ` is symmetric, the line also gives the value of `σ`_{ji}.
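The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official reader: the function name `parse_instance` and the layout of the returned dictionary are assumptions made for illustration. Asset indexes are kept 1-based, as in the files, and each covariance value is stored for both (`i`, `j`) and (`j`, `i`).

```python
def parse_instance(text):
    """Parse the text of an instance file into a dictionary (illustrative layout)."""
    instance = {"cash_return": None, "n": None,
                "returns": {}, "symbols": {}, "covariance": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, rest = line.partition(":")
        fields = rest.split()
        if key == "Cash_return":
            instance["cash_return"] = float(fields[0])
        elif key == "Number_of_assets":
            instance["n"] = int(fields[0])
        elif key == "Asset":
            i = int(fields[0])
            instance["returns"][i] = float(fields[1])
            # The Yahoo Finance symbol is optional information.
            instance["symbols"][i] = fields[2] if len(fields) > 2 else ""
        elif key == "Covariance":
            i, j = int(fields[0]), int(fields[1])
            value = float(fields[2])
            # The matrix is symmetric: store both orientations.
            instance["covariance"][(i, j)] = value
            instance["covariance"][(j, i)] = value
    return instance
```

A typical use is `parse_instance(open(path).read())`, where `path` points to one of the `.sd` files; lines with an unrecognized prefix are silently ignored.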

The file format loosely resembles that of the five OR-Library instances by Beasley (available from the author's website), which were the only publicly available instances for the problem at the time this repository was created. The main difference between our format and the OR-Library one is that Beasley's instances provide the variance of each asset and the correlation matrix, from which the covariance values must be computed indirectly. We decided to provide the covariance matrix directly instead.
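For reference, the indirect computation mentioned above follows from the standard identity that a covariance is the correlation multiplied by the two standard deviations. The sketch below assumes, as stated in the text, that the per-asset variances and the correlation matrix are available; the function name is hypothetical.

```python
import math


def covariance_from_correlation(variances, correlation):
    """Build the covariance matrix from variances and a correlation matrix.

    Uses sigma_ij = rho_ij * sqrt(var_i) * sqrt(var_j).
    """
    std = [math.sqrt(v) for v in variances]
    n = len(std)
    return [[correlation[i][j] * std[i] * std[j] for j in range(n)]
            for i in range(n)]
```

If the input data report standard deviations rather than variances, the square root step should be dropped.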

To allow results to be compared against a lower bound, we also plan to provide a discretization of the efficient frontier for the unconstrained problem (UEF). In this case the file format is simply a sequence of *expected return* and *variance* values, one point per line, as in the following example:

    # Efficient frontier computed for file it_mibtel-2001-2006-m.sd
    # Number of samples: 100
    # Maximum return: 0.03175851315349978 (for risk 0.02542564323870561)
    # Minimum return: 0.002118880183120193 (for risk 8.682121560859373e-10)
    # Return Risk Number_of_assets Running_time Status
    0.03175851315349978 0.02542564323870561 1 0.1622598171234131 Optimal
    0.03145912292147575 0.01837203123138026 3 0.2409031391143799 Optimal
    0.03115973268945171 0.01330801327560206 3 0.2350549697875977 Optimal
    0.03086034245742768 0.009768678805943964 3 0.217911958694458 Optimal
    0.03056095222540364 0.00775237509067065 4 0.2399439811706543 Optimal

The lines starting with the character `#` are comments and can be ignored. Each point of the frontier is reported on a single line. The values are to be interpreted as follows: return, variance (risk), number of assets with a share greater than 10…, running time, and status.

The following problem instances have been built using the Yahoo Finance website as the data source for historical stock values. Each file corresponds to the assets that were part of the given stock index on August 1st, 2007. The data collected were monthly prices in the reference period, and stocks with missing values were removed. Since the composition of the indexes may change over the years, we include only the stock data for instances with more than 30 stable assets. The unconstrained efficient frontiers for the basic problem and for the short-selling one have been computed at 100 points using IBM ILOG CPLEX 12.2.

The instances listed below have the same content as the original OR-Library files, converted to the new format and augmented with the return of the risk-free collateral asset. The unconstrained efficient frontiers for the two formulations have been recomputed, as for the previous instances, in order to include the risk-free asset.

Instance File | Country | Index | Period | n | Efficient Frontier File (basic formulation) | Efficient Frontier File (short-selling formulation) |
---|---|---|---|---|---|---|
port1.sd | Hong Kong | Hang Seng | 1992-1997 | 31 | port1.efn | port1.ssefn |
port2.sd | Germany | DAX 100 | 1992-1997 | 85 | port2.efn | port2.ssefn |
port3.sd | UK | FTSE 100 | 1992-1997 | 89 | port3.efn | port3.ssefn |
port4.sd | USA | S&P 100 | 1992-1997 | 98 | port4.efn | port4.ssefn |
port5.sd | Japan | Nikkei 225 | 1992-1997 | 225 | port5.efn | port5.ssefn |
