Is “cherry picking” a factor in hedge fund performance?

Challenging times for hedge funds

Attention has recently been drawn to the fact that the advantage enjoyed by hedge funds over more conventional investment vehicles has been eroding. For example, the annualized “excess return” of the HFRI equity hedge fund index (adjusted for certain factors, 60-month rolling average) declined from approximately 15% in 2000 to less than 2% in 2010, and has actually been negative over the past two years. In particular, the average year-to-date hedge fund return (as of September 2014) is only 2%, compared with the 7.27% rise in the S&P500 index. Similarly, only 23% of large-cap core mutual funds have outperformed the S&P500 year-to-date (compared with an average of 37% of such funds over the past ten years).

According to a study by the Wilshire Trust Universe Comparison Service, average public-pension gains from hedge funds were 3.6% for the three years ending 31 March 2014, versus 10.6% from stocks and 5.7% from bond-based investments. The failure of hedge funds to consistently “beat” the market (in the sense of long-term performance exceeding that of a broad index such as the S&P500) has led some large pension funds to question their hedge fund investments. For example, on 15 September 2014 the California Public Employees' Retirement System (CalPERS) announced that it will liquidate its entire hedge fund investment, totaling more than $4 billion, over the next year.

SEC examines hedge fund practices

In the latest development, the U.S. Securities and Exchange Commission (SEC) announced on 22 September 2014 its initial findings that some hedge funds have deliberately overstated their performance and have failed to follow their own in-house protocols for evaluating investments. For example, the SEC found that some funds had engaged in “flip-flopping,” namely boosting the valuations of investments by changing the way they measure holdings several times per year. In other cases, funds simply chose, among several possible measures, the one with the highest value.

Andrew Bowden of the SEC also noted marketing and advertising issues, with some firms possibly misleading clients on their past performance by “cherry-picking historical results.”

Cherry picking and “selection bias” effects

David Hand’s new book The Improbability Principle lists “cherry-picking results” as one form of “selection bias” (a category that also includes the better-known “survivorship bias”), which can lead to highly inaccurate conclusions.

Selection bias is now thought to be at the root of recent difficulties in the pharmaceutical industry, where products that look promising in initial clinical tests and trials later disappoint in real-world use. The success rate for new drug development projects in Phase II trials has recently dropped from 28% to 18%. The principal reason for these disappointments is now thought to be that pharmaceutical firms, intentionally or not, typically publish only the results of successful trials (“cherry picking”), thus introducing a fundamental selection bias into the published record.
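As a rough illustration of how this bias arises (a toy simulation with made-up parameters, not actual trial data), the following Python sketch runs many small trials of a treatment with no true effect and then “publishes” only the statistically significant ones; the published record looks uniformly positive even though the drug does nothing.

```python
import numpy as np

# Toy illustration (not real trial data): a treatment with NO true effect is
# tested in many small trials, but only "successful" trials (one-sided
# p < 0.05) are reported. The published record then looks far better than
# reality.
rng = np.random.default_rng(42)

n_trials = 1000        # independent trials run
n_patients = 50        # patients per arm in each trial

published_effects = []
for _ in range(n_trials):
    treated = rng.normal(0.0, 1.0, n_patients)   # true effect is zero
    control = rng.normal(0.0, 1.0, n_patients)
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n_patients +
                 control.var(ddof=1) / n_patients)
    if diff / se > 1.645:                        # "significant" at 5%, one-sided
        published_effects.append(diff)           # only these get reported

print(f"Trials run:         {n_trials}")
print(f"Trials 'published': {len(published_effects)}")
print(f"Mean published effect: {np.mean(published_effects):.2f} (true effect: 0)")
```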

As an illustration of the selection bias effect, note that if someone rolls a set of ten six-sided dice, the probability of seeing all sixes (or any other particular pre-specified combination) is approximately 1.65 x 10^-8, or in other words, roughly one chance in 60 million. But if the ten dice are rolled together over and over again, the chance of getting all sixes at some point steadily increases, until, after a few hundred million rolls, an all-sixes outcome is almost guaranteed. If one had seen only the single all-sixes result of this experiment, one might conclude that the dice are “loaded” and that future rolls are likely to produce disproportionate numbers of sixes, but this is not the case. Rolling ten dice 60,000,000 times is perhaps not a practical real-world scenario, but using a computer to explore 60,000,000 variations of an investment strategy is a relatively minor task, something that can be done in a few seconds or minutes on a present-day system. In consequence, such computer “experiments” are vastly more likely to result in selection bias and statistical overfitting errors.
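The arithmetic behind this example is easy to check. The short Python sketch below computes the single-roll probability and the chance of seeing at least one all-sixes roll across a large number of attempts; it shows the probability climbing from essentially zero for a single roll to well above 60% after 60 million rolls and above 95% after a couple of hundred million.

```python
# Probability of rolling ten sixes with ten fair dice, and the chance of seeing
# at least one such roll somewhere in a large number of independent attempts.
p_all_sixes = (1.0 / 6.0) ** 10
print(f"P(ten sixes on one roll) = {p_all_sixes:.3e}")    # about 1.65e-08

for n_rolls in (1_000_000, 10_000_000, 60_000_000, 200_000_000):
    p_at_least_one = 1.0 - (1.0 - p_all_sixes) ** n_rolls
    print(f"P(at least one all-sixes roll in {n_rolls:>11,} attempts) "
          f"= {p_at_least_one:.3f}")
```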

As another example, suppose that a financial advisor sends out 10,240 (= 10 x 2^10) letters to prospective clients, with half predicting that some stock or other security will go up in market value and half predicting that it will go down. One month later, the advisor sends out a set of 5,120 letters, only to those who earlier received the correct prediction, again with half predicting that some security will go up and half predicting that it will go down. After ten repetitions of this process, the final ten recipients, were they not aware of the many letters sent to other clients, would doubtless be truly impressed by the advisor’s remarkable prescience. The string of ten correct predictions sent to these recipients is the equivalent of the string of ten consecutive sixes in the first example above, and to highlight such results while ignoring or hiding the others is a form of selection bias.
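The bookkeeping behind this scheme is trivial to verify. Here is a minimal Python sketch (purely hypothetical numbers, matching the example above) showing how the pool of recipients who have seen only correct calls shrinks by half each month, leaving exactly ten after ten mailings:

```python
# Hypothetical "prescient advisor" scheme from the example above: 10,240 letters
# go out, half predicting a rise and half a fall, and each month the advisor
# writes again only to those who received the correct call.
recipients = 10 * 2 ** 10          # 10,240 initial prospects

for month in range(1, 11):
    # Whichever way the security actually moves, exactly half of this month's
    # recipients were sent the correct prediction.
    recipients //= 2
    print(f"After month {month:2d}: {recipients:5d} recipients have seen "
          f"{month} consecutive correct predictions")
```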

Backtest overfitting

One instance of selection bias in the financial world is backtest overfitting (i.e., the statistical overfitting of historical market data), which we have analyzed in detail in two recent papers (here and here) and also illustrate via an online simulator. The most common scenario is that many variations of a proposed investment strategy are tested on a particular backtest dataset, but only the final optimal variation is selected or mentioned to others. Backtest overfitting is currently thought to be a leading reason why so many market strategies that look good on paper later fall flat when actually fielded: the optimal strategy was “selected” for its seemingly excellent performance on the backtest dataset, but it has little or no fundamental predictive power and is largely ineffective in practice.
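To see how easily this happens, consider the toy Python sketch below (an illustration of the general effect, not the methodology of the papers cited above). It generates pure-noise daily returns, searches 10,000 random long/short rules for the best in-sample Sharpe ratio, and then evaluates that “winning” rule on data it has never seen:

```python
import numpy as np

# Toy illustration of backtest overfitting: search many random "strategies" on
# an in-sample window of pure-noise returns, keep the best performer, and watch
# its apparent edge evaporate out of sample.
rng = np.random.default_rng(2014)

n_days = 1000
returns = rng.normal(0.0, 0.01, n_days)        # market with no exploitable signal
split = n_days // 2

def annualized_sharpe(pnl):
    """Annualized Sharpe ratio of a daily P&L series (252 trading days/year)."""
    return pnl.mean() / pnl.std() * np.sqrt(252)

best_sharpe, best_rule = -np.inf, None
for _ in range(10_000):                        # 10,000 candidate strategies
    rule = rng.choice([-1.0, 1.0], size=n_days)   # random daily long/short positions
    sharpe_in = annualized_sharpe(rule[:split] * returns[:split])
    if sharpe_in > best_sharpe:
        best_sharpe, best_rule = sharpe_in, rule

sharpe_out = annualized_sharpe(best_rule[split:] * returns[split:])
print(f"Best in-sample Sharpe (annualized): {best_sharpe:.2f}")
print(f"Same strategy out of sample:        {sharpe_out:.2f}")
```

In a typical run of this sketch, the best in-sample Sharpe ratio comes out comfortably above 2, while the same rule applied out of sample hovers around zero: the apparent skill is entirely an artifact of the search.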

Summary

We may chuckle at such examples of selection bias, but they drive home why it is so important, especially in the present era of high-performance computing technology, to ensure that any analysis of financial products, strategies or performance rests on sound, carefully vetted statistical methods.

Keep in mind that we live in an era of vastly more powerful computer systems than were available even a few years ago. For example, a 2014-era Apple MacPro workstation, which features a peak performance of roughly 7 Tflop/s (i.e., 7 trillion floating-point operations per second), is comparable to the world’s most powerful supercomputer of just 15 years earlier (see the latest Top 500 list). Such systems make it vastly easier to detect and analyze interesting market phenomena using very large datasets. But they also make it vastly easier to search through thousands or millions of variations of a proposed market strategy, almost ensuring that the “optimal” strategy finally selected will be statistically overfit.

What can be done? To begin with, the time has come to refuse to accept any claim of financial performance unless full details of the development and analysis methodology are disclosed. This may be an inconvenience, but it is far better to be safe than sorry.
