How difficult is it to design a stock fund based on backtests?

By David H Bailey, on August 11th, 2017

Introduction

Over US$2 trillion is held in exchange-traded equity funds in the U.S. alone, with hundreds of new funds added each year. Strategies range from simple index-tracking funds to funds that follow sophisticated strategies (e.g., “smart beta”) designed, on the basis of backtests, to yield impressive results. According to a Vanguard report, there is concern that many of these funds are not really independent of the indexes they follow and, in any event, are not that different from the broad market.

So how hard is it to design a stock fund that will achieve a given performance profile? The present bloggers explored this question in a new technical paper.

Our approach

We set out to demonstrate that, given virtually ANY desired performance profile, one can devise a stock fund portfolio, constructed from S&P500 stocks, that achieves that profile in backtests. Do you want steady 8% per annum growth, month after month, year after year? You can have it, or 10%, or 12%, or 15% (all based on backtests over, say, a 15-year period). We even constructed portfolios that exhibit stair-step or sinusoidal growth, just to show that any profile can be used.
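A target profile is simply a time series of desired portfolio values. As a rough sketch, here is one way to generate monthly profiles of the three kinds mentioned above over a 15-year (180-month) window. The function name `make_profiles`, the 10% sinusoid amplitude, and the 5-year sine period are illustrative assumptions, not the profiles used in the technical paper.

```python
import numpy as np

def make_profiles(annual_rate=0.08, months=180):
    """Return hypothetical monthly target profiles, each starting at 1.0.

    steady:     smooth compound growth at annual_rate.
    stair_step: the same growth, credited in one jump at the start of each year.
    sinusoidal: steady growth modulated by a 5-year sine wave (10% amplitude).
    """
    t = np.arange(months + 1)                      # month index 0..months
    steady = (1 + annual_rate) ** (t / 12)
    stair_step = (1 + annual_rate) ** (t // 12)    # flat within each year
    sinusoidal = steady * (1 + 0.1 * np.sin(2 * np.pi * t / 60))
    return steady, stair_step, sinusoidal

steady, stair_step, sinusoidal = make_profiles()
print(steady[-1], stair_step[-1])  # both approximately 3.172 (= 1.08 ** 15)
```

Any other shape works the same way; the fitting procedure only ever sees the profile as a vector of target values.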

The basic approach used by our portfolio-construction program is as follows. Given a set of stocks and a desired performance profile, we use techniques of optimization theory to find the set of weights that minimizes the sum of squared deviations of the weighted portfolio time series from the target profile time series. This is a linear least-squares problem, so the resulting formulation is a matrix equation that can be solved with widely available linear algebra software. Mathematical details are given in the technical paper.
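A minimal sketch of this least-squares fit in Python with NumPy, assuming the portfolio value at each date is simply a weighted sum of the stock price series. The synthetic random-walk prices and the helper name `fit_weights` are hypothetical stand-ins for real S&P500 data and for our actual program. With more stocks than time points the system is underdetermined, and `np.linalg.lstsq` returns the minimum-norm weights that reproduce the target exactly, which is one way the perfect in-sample fits reported in the Results section can arise.

```python
import numpy as np

def fit_weights(prices: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Weights w minimizing the sum of squared deviations ||prices @ w - target||^2.

    prices: (T, N) array; column i is the price series of stock i.
    target: (T,) array; the desired portfolio value at each date.
    No sign constraint is imposed, so some weights may be negative (short positions).
    When N > T, many exact solutions exist and lstsq returns the smallest-norm one.
    """
    w, *_ = np.linalg.lstsq(prices, target, rcond=None)
    return w

# Hypothetical data: 180 months (15 years) of 300 synthetic random-walk stocks.
rng = np.random.default_rng(0)
T, N = 180, 300
prices = np.exp(np.cumsum(rng.normal(0.005, 0.05, size=(T, N)), axis=0))

# Target profile: steady 8% per annum growth, compounded monthly.
target = 1.08 ** (np.arange(1, T + 1) / 12)

w = fit_weights(prices, target)
fitted = prices @ w
print("max in-sample deviation:", np.max(np.abs(fitted - target)))
print("number of short (negative) weights:", int(np.sum(w < 0)))
```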

When this technique is applied to real stock data, at least some of the resulting weights are often negative, meaning that those stocks are shorted in the resulting portfolio. While shorting is certainly a legitimate trading strategy, it exposes the portfolio to potentially large losses, so we also constructed portfolios subject to the constraint that each weight must be greater than zero.
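The constrained variant can be sketched in the same setting. This version uses plain projected gradient descent, a simple substitute for a dedicated non-negative least-squares solver (SciPy's `scipy.optimize.nnls` is one such solver), and it enforces w >= 0 rather than strictly w > 0. The helper name, iteration count, and synthetic data are illustrative assumptions, not the method used in our paper.

```python
import numpy as np

def fit_nonnegative_weights(prices, target, iters=5000):
    """Approximately minimize 0.5 * ||prices @ w - target||^2 subject to w >= 0
    by projected gradient descent."""
    # Step 1/L, where L (largest singular value of prices, squared) is the
    # Lipschitz constant of the gradient; with this step the objective never increases.
    step = 1.0 / np.linalg.norm(prices, 2) ** 2
    w = np.zeros(prices.shape[1])
    for _ in range(iters):
        grad = prices.T @ (prices @ w - target)   # gradient of the objective
        w = np.maximum(w - step * grad, 0.0)      # step, then project onto w >= 0
    return w

# Hypothetical data: 180 months of 300 synthetic random-walk stocks.
rng = np.random.default_rng(0)
T, N = 180, 300
prices = np.exp(np.cumsum(rng.normal(0.005, 0.05, size=(T, N)), axis=0))
target = 1.08 ** (np.arange(1, T + 1) / 12)   # steady 8% per annum

w_pos = fit_nonnegative_weights(prices, target)
print("all weights non-negative:", bool(np.all(w_pos >= 0)))
print("RMS in-sample deviation:", np.sqrt(np.mean((prices @ w_pos - target) ** 2)))
```

Unlike the unconstrained fit, there is generally no exact solution here, so some in-sample deviation from the target is expected.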

Results

What we found was as follows. First, we definitely succeeded in achieving the target profile on in-sample (backtest) data: our standard portfolios (with positive and negative weights) matched the desired profile perfectly in-sample in every case.

So now for the $64,000 ($64 million?) question: How do these computer-constructed stock portfolios perform on new (out-of-sample) data?

Answer: decidedly mixed. In some cases, the fitted standard portfolios exceeded the target profile performance. But in most cases, the portfolios met a very different fate: complete ruin, that is, a catastrophic drop to zero, after which the portfolio is presumed to be liquidated. The portfolios designed under the constraint that all weights are positive avoided these catastrophic drops, but their performance typically bore little resemblance to the target profile, both in-sample and out-of-sample.

In the ten test cases we tried, seven of the standard-weight portfolios suffered catastrophic losses (see, for example, the graphs below); only three achieved positive out-of-sample Sharpe ratios. Among the corresponding all-positive-weight portfolios, there were no catastrophic drops to zero, but the out-of-sample Sharpe ratios were all negative, and the time series were poorly correlated with the target profiles.
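For readers who want to score their own backtests in a similar way, here is a minimal sketch of how an annualized Sharpe ratio and a ruin check might be computed from a monthly series of portfolio values. The zero default risk-free rate, the monthly sampling, and the helper names are assumptions for illustration, not necessarily the exact conventions used in our paper.

```python
import numpy as np

def annualized_sharpe(values, periods_per_year=12, risk_free=0.0):
    """Annualized Sharpe ratio of a series of portfolio values sampled
    periods_per_year times a year; risk_free is an annual rate (assumed 0 by default)."""
    values = np.asarray(values, dtype=float)
    returns = values[1:] / values[:-1] - 1.0          # simple per-period returns
    excess = returns - risk_free / periods_per_year    # subtract per-period risk-free rate
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def is_ruined(values):
    """True if the portfolio value ever drops to zero or below, at which point
    the portfolio is treated as liquidated."""
    return bool(np.any(np.asarray(values, dtype=float) <= 0))

rising = [100, 102, 101, 104, 106, 105, 108]
falling = [100, 98, 99, 95, 93]
print(annualized_sharpe(rising), annualized_sharpe(falling))
print(is_ruined(rising), is_ruined([100, 40, -5]))
```

A ruined series should be truncated at the first non-positive value before computing returns, since a ratio of prices is meaningless once the portfolio value crosses zero.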

Some examples of the standard-weight portfolio results are shown below. The orange curves are target profiles; blue is achieved performance; green is S&P500 for reference. The in-sample period is 1991 through 2005; the out-of-sample period is 2006 through 2015 (note that in each case, the blue curves coincide with the orange curves during the in-sample period). A full set of graphs and results is in our technical paper.

Conclusions

We have shown that it is relatively straightforward to produce a stock portfolio that achieves any desired performance profile, based on backtest (in-sample) data. However, the resulting portfolios tend to perform erratically on new (out-of-sample) data: they do not follow the target profile and, in many cases, end in complete ruin. Significantly less erratic results can be obtained by restricting the portfolio to positive weights, but the resulting portfolios typically depart significantly from the target profile on both the in-sample and out-of-sample data.

The erratic out-of-sample performance observed in our results is a classic symptom of backtest overfitting. In fact, overfitting is unavoidable in this or any scheme that amounts to searching over a large set of strategies or fund weightings and then implementing or reporting only the final, optimal one.
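The overfitting effect is easy to reproduce on synthetic data. The sketch below fits unconstrained least-squares weights on a 15-year in-sample window and then applies the same weights to a further 10 years, loosely mirroring the 1991 to 2005 and 2006 to 2015 split used in our experiments. The random-walk prices are a hypothetical stand-in for real S&P500 data, so only the qualitative gap between in-sample and out-of-sample error is meaningful here.

```python
import numpy as np

# Hypothetical data: 300 synthetic random-walk stocks, 25 years of monthly prices.
rng = np.random.default_rng(1)
T_in, T_out, N = 180, 120, 300  # 15 years in-sample, 10 years out-of-sample
prices = np.exp(np.cumsum(rng.normal(0.005, 0.05, size=(T_in + T_out, N)), axis=0))
target = 1.08 ** (np.arange(1, T_in + T_out + 1) / 12)  # steady 8% per annum throughout

# Fit weights on the in-sample window only (unconstrained least squares).
w, *_ = np.linalg.lstsq(prices[:T_in], target[:T_in], rcond=None)

def rms(x):
    """Root-mean-square of an array of deviations."""
    return float(np.sqrt(np.mean(x ** 2)))

portfolio = prices @ w  # same weights held across both windows
in_sample_err = rms(portfolio[:T_in] - target[:T_in])
out_of_sample_err = rms(portfolio[T_in:] - target[T_in:])
print(f"in-sample RMS error:     {in_sample_err:.2e}")
print(f"out-of-sample RMS error: {out_of_sample_err:.2e}")
```

The in-sample fit is essentially exact because the weights had enough freedom to absorb the noise in the training window; that same noise does not repeat out-of-sample, so the fit falls apart.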

The same difficulty afflicts many other attempts to construct an investment strategy based solely on daily, weekly, monthly or yearly historical market data, such as trying to discern patterns in stock market indexes by examining charts (as technical analysts often do) or designing a portfolio that tracks a particular risk profile, as many smart beta ETFs attempt to do. Any actionable information that might exist in such data has long since been mined by highly sophisticated computerized algorithms operated by large quantitative funds and other organizations, which use much more detailed data (minute-by-minute or even millisecond-by-millisecond records of many thousands of securities) and can afford the expertise and facilities needed to make such analyses profitable. Any lesser efforts, such as those described in our paper, are doomed to be statistically overfit and, if followed, may well have disastrous consequences.

Full technical details are available in our technical paper and in a forthcoming journal article.