[Editor’s note: This guest post is by Zachary David, an analyst who advises institutional clients on trading practices (website: http://zacharydavid.com). The article below is adapted from this blog.]
Spurious results are the norm
Having done a healthy share of paper replications over the past decade, and having been consistently disappointed when the models or techniques broke down on data shortly after (or even before) the authors’ sample periods, I would say that data misuse is a gigantic problem: spurious results are the norm. Over those same years, the granularity of market data available to researchers has become finer, and the ubiquity of machine learning software packages has placed predictive modeling in the hands of anyone with a pulse. As a result, more troubling than potential “false positives”, I have found papers so riddled with coding errors and domain mistakes that the authors couldn’t possibly have found “a positive” in the first place (I hate neologisms, but until you come up with something better these are called “impossitives”). So not only is it easier than ever to find spurious relationships, but there are several new classes of fatal errors that the current publishing and peer review system is poorly equipped to identify.
That said, this should also be clear: despite its pejorative connotations, data mining itself is not malicious and does not cause false positives. In data-driven trading and quantitative investment research, data mining is a big part of what you have to do. But successfully employing those techniques requires close attention to the nuances of the domain, critical evaluation of your assumptions, and a commitment to exploring the philosophies of statistics and science. Pairs trading was invented at Morgan Stanley in the early 1980s by a computer science graduate from Columbia who was calculating relative weights and correlations for groups of stocks sorted by industry sector. In that same era, RenTech was getting started. Robert Frey, a former managing director, has said that their approach was “decidedly non-theoretical. Discipline is the key, not theory.” The founder, Jim Simons, tells an anecdote about sending people down to the NYFed building to copy, by hand, historical interest rate data that was kept in a written ledger. These people are perhaps the most successful data miners in all of finance.
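For the curious, here is a minimal Python sketch of that kind of sector-sorted correlation screen: group stocks by sector, compute pairwise return correlations, and keep the most correlated pairs as candidates. The data layout and function name are hypothetical illustrations, not the original Morgan Stanley methodology.

```python
import itertools
import pandas as pd

def candidate_pairs(returns: pd.DataFrame, sectors: pd.Series, top_n: int = 10):
    """Return the most correlated same-sector pairs.

    returns: daily returns, one column per ticker.
    sectors: Series mapping ticker -> sector label.
    """
    scored = []
    for sector, members in sectors.groupby(sectors):
        tickers = [t for t in members.index if t in returns.columns]
        corr = returns[tickers].corr()
        for a, b in itertools.combinations(tickers, 2):
            scored.append((corr.loc[a, b], sector, a, b))
    # Highest-correlation pairs within each sector are the pairs-trading candidates.
    return sorted(scored, reverse=True)[:top_n]
```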
Coding errors
An ever-increasing source of catastrophic errors is academics writing code. A few years ago, one of the most famous results in empirical economics was actually a spreadsheet error. More recently, I discovered numerous invalidating bugs in the ad hoc backtesting software created for a paper published by the editor of the journal Quantitative Finance. From long-term factor model research to high-frequency order book analysis, researchers are writing more lines of code of ever-increasing complexity. An often-cited statistic holds that the professional software industry produces 15-50 errors per 1,000 lines of delivered code. Yet to the best of my knowledge, no finance journal has professional software developers review submitted code. Most finance journals still either don’t have or don’t enforce code submission and open publishing standards, an asinine state of affairs that both increases the uncertainty of the results and slows the progress of future research.
Domain knowledge is required
The other big problem, and a worsening trend (especially among lower-quality journals and lower-quality academics), is how blasé much of the research and publishing apparatus is about the difference in domain knowledge required to analyze large cross-sections of quarterly stock data versus high-frequency trade and order book data. Examples:
- This paper casually invented a metric to measure high-frequency “liquidity takers”, but had the author asked anyone in industry, he would have realized he was actually counting market makers racing for queue position to provide liquidity.
- This paper assumed stock volumes are normally distributed, so instead of finding evidence of high-frequency “quote stuffing” as the authors claim, they merely found evidence that stock volumes are not normally distributed.
- This paper didn’t know what times futures markets are open, and applauded itself for high predictive accuracy on a contract that no one trades.
- This paper calculated closed profit/loss from buying a futures contract with one expiration and then, the next day, selling a contract with a different expiration.
- This paper applied a denoising filter to the whole time series before predicting it, meaning that each point contains information from the future (see the first sketch after this list). The authors also added trading costs to their profit/loss instead of subtracting them.
- This paper tried to predict 5-minute returns for US stocks but forgot that the market is closed overnight and on weekends (see the second sketch after this list), and the authors also admitted to trying a whole bunch of models until one looked good in-sample.
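To make the filtering error concrete, here is a minimal Python sketch of the lookahead bias. A centered rolling mean stands in for whatever denoising filter a paper might use; the toy price path, window length, and train/test split are all illustrative assumptions, not taken from any of the papers above.

```python
import numpy as np
import pandas as pd

prices = pd.Series(np.cumsum(np.random.randn(1000)))  # toy price path

# WRONG: a centered smoother uses future prices at every point, and only then
# is the smoothed series split into train/test, so the training targets already
# contain information from the test period.
leaky = prices.rolling(window=21, center=True).mean()
train_leaky, test_leaky = leaky[:800], leaky[800:]

# BETTER: a causal smoother uses only past data, so the same transformation can
# be applied out of sample without leaking the future into the past.
causal = prices.rolling(window=21).mean()
train_ok, test_ok = causal[:800], causal[800:]
```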
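And a similarly minimal sketch of the overnight-gap issue: naively differencing a 5-minute bar series treats the close-to-open jump (and the weekend) as just another 5-minute return. Grouping by trading date before differencing drops those cross-session returns. The DataFrame layout (a DatetimeIndex of bar timestamps and a “close” column) is an assumption for illustration.

```python
import pandas as pd

def intraday_returns(bars: pd.DataFrame) -> pd.Series:
    """5-minute returns computed within each trading session only."""
    # Naive version mixes overnight and weekend gaps in with intraday moves:
    #   naive = bars["close"].pct_change()
    by_day = bars["close"].groupby(bars.index.date)
    # The first bar of each session becomes NaN instead of a fake 5-minute return.
    return by_day.pct_change()
```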
Each of those papers has numerous other problems. When you take these problems to the authors and publishers, their first response is always “oh no! we should do something!”, but at some point I think they all go: “what has reality done for me lately? It certainly didn’t publish this paper.”
Summary
Despite steadily improving market data availability and a large open-source software community, academic research in algorithmic trading and quantitative investing risks committing classes of errors beyond the well-known “data mining” risk. As research relies on increasingly complex code that processes orders of magnitude more data, the contemporary academic researcher must increasingly take on the roles of software developer and data engineer. However, the existing academic publishing and peer review system does not appear to be well equipped to evaluate these new roles. Minor oversights in data preparation and small software bugs lead to invalid results that can go unnoticed for years. Industry best practices and a more rigorous standard of evidence are required to mitigate these risks and ensure the integrity of ongoing research.