How backtest overfitting in finance leads to false discoveries

The present author, together with Marcos López de Prado, has just published the article How backtest overfitting in finance leads to false discoveries in Significance, a journal of the Royal Statistical Society. The published article is now available at the Significance (Wiley) website.

This article is condensed from the following manuscript, which is freely available from SSRN: Finance is Not Excused: Why Finance Should Not Flout Basic Principles of Statistics.

This paper introduces the problem of backtest overfitting in finance to general readers who may be trained in the basics of statistics, but who are not necessarily familiar with the application of statistics to finance, or with the dangers of backtest overfitting and selection bias under multiple testing.

Here is a brief summary of the key points of the article:

  1. Both the finance industry and academic research in finance are particularly prone to false discoveries, for three reasons: (a) the chance of finding a statistically significant profitable strategy is very low, due to intense competition; (b) true discoveries are often short-lived, as a result of the rapidly changing nature of financial systems; and (c) it is rarely possible to debunk a false claim through controlled experiments on new out-of-sample data. One would hope that, in such circumstances, researchers would be particularly careful when conducting statistical inference. Sadly, the opposite is true.
  2. The potential for backtest overfitting has grown enormously in recent years with the increased use of computer programs to search millions or even billions of parameter variations for a given model, fund or strategy. In many cases, those performing such searches, both in industry and academia, are unaware that the “optimal” settings produced by such searches are almost certainly overfit (a brief simulation after this list illustrates how easily this happens). Sadly, academic journals in the field seldom require authors to disclose the extent of their computer searches.
  3. Even a very simple and basic stock fund design, for instance, typically involves hundreds, if not millions, of parameter settings and design choices. Unless great care is taken, “optimal” designs based on backtests will be statistical mirages.
  4. Backtest overfitting is evident in the poor record of investment funds. Very few mutual funds or other financial instruments consistently generate gains above the overall market averages. As a single example, a 2019 report found that among actively managed “U.S. large value” funds, only 8.3% beat the returns of the comparable passive index fund over a 10-year period.
  5. The record of market forecasters is consistently dismal. A 2016 report on market forecasters over a 17-year period concluded that “the forecasts were least useful when they mattered most.” A study by the present author and colleagues of 68 market forecasters found an overall accuracy rating of only 48%, no better than chance.
  6. Tools are available to remedy these difficulties. The article, for instance, summarizes the “False Strategy Theorem” and the “Deflated Sharpe Ratio,” two tools that can be used to estimate, and correct for, the effect of selection bias under multiple testing (a sketch of both follows this list). Sadly, however, relatively few practitioners, either in industry or in academia, regularly utilize such tools.
  7. Some in finance have dismissed such concerns, and have questioned whether the field faces a “replication crisis” at all. We do not agree. Rather, the preponderance of poor out-of-sample performance points to a pervasive problem in the field.
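
To see how easily such a computer search manufactures an impressive but spurious result, consider the following minimal simulation (our own illustration, not code from the article). It generates 1,000 strategy variants whose daily returns are pure noise, so the true Sharpe ratio of every variant is exactly zero, and then reports the best in-sample Sharpe ratio found by the search:

```python
import numpy as np

rng = np.random.default_rng(42)

n_trials = 1000  # number of strategy variants "searched"
n_days = 252     # one year of daily returns

# Every variant is pure noise: the true Sharpe ratio is exactly zero.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_trials, n_days))

# Annualized in-sample Sharpe ratio of each variant.
sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)

print(f"Best Sharpe ratio among {n_trials} noise variants: {sharpe.max():.2f}")
```

The best of the 1,000 variants typically shows an annualized Sharpe ratio above 3, which would look like a superb discovery if the other 999 trials were never disclosed. This is selection bias under multiple testing in miniature.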
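
The two remedies mentioned in point 6 can also be sketched briefly. The False Strategy Theorem approximates the maximum Sharpe ratio one should expect from N independent zero-skill trials, and the Deflated Sharpe Ratio uses that value as the hurdle a candidate strategy must clear. The sketch below follows the formulas published by Bailey and López de Prado; the function names and interface are our own, and it is an illustration under those assumptions rather than a production implementation:

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials, var_sharpe):
    """Expected maximum Sharpe ratio among n_trials independent zero-skill
    trials, where var_sharpe is the variance of the trials' Sharpe estimates
    (False Strategy Theorem approximation)."""
    return np.sqrt(var_sharpe) * (
        (1 - EULER_GAMMA) * norm.ppf(1 - 1.0 / n_trials)
        + EULER_GAMMA * norm.ppf(1 - 1.0 / (n_trials * np.e))
    )

def deflated_sharpe_ratio(returns, n_trials, var_sharpe):
    """Probability that the selected strategy's Sharpe ratio genuinely
    exceeds the maximum expected from n_trials pure-noise trials."""
    sr = returns.mean() / returns.std(ddof=1)        # per-period Sharpe
    sr0 = expected_max_sharpe(n_trials, var_sharpe)  # hurdle from the theorem
    t = len(returns)
    g3 = skew(returns)                               # skewness of returns
    g4 = kurtosis(returns, fisher=False)             # kurtosis (3 if normal)
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr**2)
    return norm.cdf((sr - sr0) * np.sqrt(t - 1) / denom)
```

A Deflated Sharpe Ratio close to 1 indicates that the observed performance is unlikely to be an artifact of the search; a value near 0.5 or below suggests a statistical mirage. Note that n_trials and var_sharpe must come from an honest accounting of the full search, which is exactly the disclosure that point 2 observes is rarely made.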

For additional details, see the Significance paper or the SSRN manuscript.
