Go with big data and machine learning, or leave finance to those who do

Mailiao Refinery, Taiwan. Courtesy of Planet Labs and Quandl.

Big data: big buzz

“Big data” encompasses the collection, processing, indexing and analysis of large-scale datasets. Some concrete examples in the scientific computing arena include temperature and sunlight data from satellite monitoring of the Earth’s environment, particle tracking data produced by the Large Hadron Collider in Europe, satellite image data used in the search for planets orbiting distant stars, and studies of the cosmic microwave background radiation.

These same big data collection and analysis techniques are now being applied to arenas that are decidedly more down-to-earth. These include anonymized smartphone position data, agricultural land images, shipping facility activity, credit card activity, and more. With this type of data in hand, large financial organizations in many cases are able to spot trends even before the industries directly involved are aware of them. Indeed, we may well be seeing the day when quarterly reports or even press reports are relegated to the realm of “old news,” with their content already known (and acted upon) hours, days or even weeks earlier by those equipped with big data tracking tools.

Big data in action

There are numerous examples of big data applications in finance. For example, in 2015 certain hedge funds utilizing satellite data sources noted rising traffic in the parking lots of J.C. Penney stores, and were able to beat other investors to the punch. Indeed, JCP’s stock jumped more than 10% when public reports of JCP’s increased store traffic came to light in August. As another example, in 2015 some investment firms were able to conclude that U.S. corn production was 2.8% smaller than prevailing government estimates, based on analysis of infrared satellite images of over one million corn fields.

Other types of big data include shopping mall traffic, parking lot auto counts, coal shipments, oil storage tanks, industrial plant production, flood data, ship location data, mobile payments, geo-tagged smartphone traffic, “web scraping” (e.g., gleaning prices and inventory figures from public e-tailer sites) and machine-readable news.

Some of the principal providers of such data are:

  • Planet Labs has deployed over 100 “cube sats,” shoebox-sized (10cm x 10cm x 30cm) satellites that continuously scan the Earth and send data whenever one passes over a ground station. Planet provides its clients with imagery at 3-5 meter resolution, updated frequently.
  • DigitalGlobe offers both satellite images and machine-learning software to enable customers to glean insights from its library.
  • Planet IQ focuses on weather and climate modeling.
  • Orbital Insight focuses on analytic software for satellite and other remote sensing data, even including synthetic aperture radar (SAR) data. They claimed in 2016 that since 2013, their US Retail Traffic Index predicted a beat or miss of Bloomberg consensus estimates 78% of the time.
  • Descartes Labs, which was spun off from the Los Alamos National Laboratory, has reported successes in predicting changes in domestic corn production, based on changes in plant color over time. They have plans to extend their reach to drones and mobile phones.
  • RS Metrics features data in retail traffic, real estate, metals production and others.
  • Spire focuses on ships, planes and weather.

Machine learning to the rescue

It is important to recognize that big data, if it is to be truly useful in the finance world, requires significantly more effort than merely downloading some large datasets from a public (or private) repository. In particular, machine learning methods are usually required to make sense of large datasets.

For example, to accurately predict crop yields using NASA Landsat photo imagery, photos must be monitored for several years, keeping close tabs on many individual patches of land. Then this imagery data must be correlated with data on the type of crop (e.g., corn or soybeans), date of germination, and typical yield. Once the difference in appearance between a field that produced a high yield and one that produced a lower yield has been noted, machine-learning techniques must then be employed to more accurately predict current crop yields.
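The workflow just described can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are hand-made rather than derived from Landsat imagery, the single "greenness index" feature is hypothetical, and a per-crop least-squares line stands in for the far richer machine-learning models a real system would need. Germination date and other inputs mentioned above are left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical, hand-made history: (crop, mid-season greenness index, yield).
# Real inputs would come from years of per-field imagery and yield reports.
HISTORY = [
    ("corn", 0.60, 150.0),
    ("corn", 0.70, 170.0),
    ("corn", 0.80, 190.0),
    ("soybeans", 0.50, 40.0),
    ("soybeans", 0.60, 45.0),
    ("soybeans", 0.70, 50.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def train(history):
    """Fit one line per crop type, since crops differ in appearance and yield."""
    by_crop = defaultdict(lambda: ([], []))
    for crop, greenness, crop_yield in history:
        by_crop[crop][0].append(greenness)
        by_crop[crop][1].append(crop_yield)
    return {crop: fit_line(xs, ys) for crop, (xs, ys) in by_crop.items()}


def predict(models, crop, greenness):
    """Estimate this season's yield from the current greenness reading."""
    intercept, slope = models[crop]
    return intercept + slope * greenness


models = train(HISTORY)
estimate = predict(models, "corn", 0.75)
print(f"Estimated corn yield: {estimate:.1f}")
```

Fitting a separate model per crop mirrors the point that imagery must first be matched to crop type: the same shade of green implies very different yields for corn and for soybeans.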

For some additional details on current applications of big data in the finance world, see this earlier Mathematical Investor blog.

Big data, machine learning and the future of investing

A recent MarketWatch article describes in detail the explosive growth in big data and machine learning in finance, and how this is likely to play out in the future. Here are some findings:

  1. The usage of big data and machine learning is spreading beyond the realm of a handful of quantitative hedge funds to other asset managers.
  2. Total spending on “alternative data” by mutual funds, hedge funds, pension funds and others who buy securities (for their own or clients’ accounts) is projected to jump from $232 million in 2016 to $1.1 billion in 2019, and to $1.7 billion in 2020, according to AlternativeData.org, an industry trade group.
  3. A broader measure of spending, including outlays for data sources, data science, IT infrastructure, data management and systems development, as measured by Opimas, is expected to rise to more than $7 billion by 2020.
  4. A Greenwich Associates study, reported in the MarketWatch article, said that 72% of financial firms that have tried “alternative data” analysis have enhanced their profitability in so doing, and over 20% of those who achieved gains report that they obtain an average of 20% of their “alpha” from alternative data operations.
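To put the AlternativeData.org projections in item 2 in perspective, the implied annual growth rates can be worked out directly. The short sketch below uses only the three dollar figures quoted in that item; the helper name `cagr` is ours, and the compound-growth formula is the standard one rather than anything taken from the MarketWatch article.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Projected alternative-data spending quoted above, in millions of US dollars.
spend_2016, spend_2019, spend_2020 = 232.0, 1100.0, 1700.0

growth_2016_2020 = cagr(spend_2016, spend_2020, 4)
growth_2019_2020 = cagr(spend_2019, spend_2020, 1)

print(f"2016-2020 implied annual growth: {growth_2016_2020:.1%}")
print(f"2019-2020 implied growth: {growth_2019_2020:.1%}")
```

On these figures, spending grows by roughly 65% a year compounded over 2016-2020, and by about 55% in the final year alone.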

It is important to recognize that effective usage of big data requires advanced machine-learning-based technology; conventional, relatively unsophisticated statistical approaches or chart analyses will not work. Further, as Marcos Lopez de Prado has pointed out, alternative datasets are beyond the grasp of econometrics and other traditional quantitative methods.


We pointed out in a previous Math Investor blog that the majority of the hedge funds that have consistently beaten the market averages in recent years have employed highly mathematical, data-intensive strategies. This was also underscored in this Bloomberg article, which reported that several traditional hedge funds have closed, yet quantitative funds such as Renaissance Technologies and Two Sigma are still attracting new funds and clients.

The MarketWatch article mentioned above lays out the choice now facing traditional investment managers:

Traditional investment managers face three options, said Octavio Marenzi, chief executive officer of Opimas, a capital-markets-focused management consulting firm. The first option is to embrace alternative data and effectively adopt a more quantitative approach. The second is to go into passive investing, tracking an index and abandoning research altogether. “And the third option is to go home and give up.”

So this is the stark choice that many organizations now face: Go with big data and machine learning, combined with other advanced quantitative technology, or leave finance to those who do.
