FastML

Machine learning made easy

Are stocks predictable?

We’d like to be able to predict the stock market. That seems like a nice way of making money. We’ll address the fundamental issue: can stocks be predicted in the short term, that is, a few days ahead?

There’s a technique that seeks to answer the question of predictability. It’s called Forecastable Component Analysis. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate time series into a forecastable and an orthogonal white noise space. The author, Georg M. Goerg*, implemented it in the R package ForeCA.

It might be useful in two ways:

  1. It can tell you how forecastable a time series is.
  2. Given a multivariate time series, let’s say a portfolio of stocks, it can find forecastable components.

The idea in the second point is similar to PCA - ForeCA is a linear dimensionality reduction technique. The main difference is that the method explicitly addresses forecastability. It does so by considering an interplay between time and frequency:

Forecasting is inherently tied to the time domain. Yet, since [equations in the paper] provide a one-to-one mapping between the time and frequency domain, we can use frequency domain properties to measure forecastability.

That’s the same stuff as in the Fourier transform.
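To make the time/frequency link concrete, here is a minimal sketch in base R (it does not use ForeCA). It compares the raw periodogram of a pure sinusoid with that of white noise. The series, the `peak_share` helper and all variable names are made up for illustration.

```r
set.seed(1)
n <- 512
idx <- seq_len(n)

sinusoid <- sin(2 * pi * idx / 16)  # period 16, i.e. frequency 1/16 cycles per step
noise <- rnorm(n)                   # Gaussian white noise

# Raw periodogram from the stats package: no tapering, no detrending, no plot
pg_sin <- spec.pgram(sinusoid, taper = 0, detrend = FALSE, plot = FALSE)
pg_noise <- spec.pgram(noise, taper = 0, detrend = FALSE, plot = FALSE)

# Hypothetical helper: share of total power sitting in the single largest bin
peak_share <- function(pg) max(pg$spec) / sum(pg$spec)

# Frequency at which the sinusoid's periodogram peaks
peak_freq <- pg_sin$freq[which.max(pg_sin$spec)]

cat("peak frequency of the sinusoid:", peak_freq, "\n")
cat("peak share, sinusoid:", peak_share(pg_sin), "\n")
cat("peak share, noise:   ", peak_share(pg_noise), "\n")
```

The sinusoid puts practically all of its power at one frequency, while the noise spreads it over the whole range. A concentrated spectrum is what makes a series easy to forecast.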

*An interesting name, isn’t it? Kind of symmetric.

Stationary time series only

It only makes sense to apply ForeCA to time series that are at least weakly stationary. Raw stock prices don’t fit the bill, but daily returns do. A daily return shows how the price changed relative to the day before: it’s the absolute change divided by the previous price.
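As a quick illustration, here is a sketch of the return calculation in base R. The prices are invented. The log return is shown next to the simple return because the `EuStockMarkets` example in this post uses `diff(log(...))`; the two are close when daily changes are small.

```r
# Hypothetical closing prices for six consecutive trading days
prices <- c(100, 102, 101, 105, 104, 108)

# Simple daily return: (today's price - yesterday's price) / yesterday's price
simple_ret <- diff(prices) / head(prices, -1)

# Log return: difference of log prices, equal to log(1 + simple return)
log_ret <- diff(log(prices))

print(round(simple_ret, 4))
print(round(log_ret, 4))
```

Note that a series of n prices yields n - 1 returns, since the first day has no previous price to compare with.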

By the way, there is an online course, Monte Carlo Methods in Finance, that explains some useful concepts relevant to the subject matter. For example, you can learn about returns in section 1.4.


Daily stock data: raw (above) and returns (below). The returns usually have zero mean.

See section 4.2 for an intuition about the connection between time and frequency spectrum.

Running ForeCA

You can reproduce the equity funds analysis from the paper by running just a few lines of code from the author’s page:

library(ForeCA)
library(fEcofin)  # provides the equityFunds data
ret <- ts(equityFunds[, -1])  # drop the first column, convert the rest to a time series
mod <- foreca(ret)
summary(mod)
plot(mod)

Of course you’ll need to install the packages first. Installing ForeCA is more or less straightforward. fEcofin, which contains the equity funds data, is no longer available at CRAN, so install it from R-Forge:

install.packages("fEcofin", repos = "http://r-forge.r-project.org/")

If you’d like to assess the forecastability of a particular univariate series, use the Omega function. Given several series at once, as in the first example below, it reports one value per column:

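# Daily log returns of four European indexes, skipping the first 1000 rows
XX <- ts(diff(log(EuStockMarkets))[-c(1:1000),])
Omega(XX)

     DAX      SMI      CAC     FTSE 
6.270802 5.679154 5.469274 5.962040 

# Annual lynx trappings on a log10 scale
plot(log(lynx, 10))
Omega(log(lynx, 10), spectrum_method = "direct")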

The function will return a percentage: a real value between 0 and 100. 0 means not forecastable (white noise); 100 means perfectly forecastable (a sinusoid).
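To get a feel for the two extremes, here is a rough sketch of an entropy-based score in base R. It is not the package’s implementation: `omega_sketch` is a made-up helper that normalizes the raw periodogram, computes its discrete entropy and rescales the result to 0-100. We assume this discretization only to illustrate the idea, so its numbers are not comparable to what `Omega` returns.

```r
# Illustrative helper, not part of ForeCA: a discretized, entropy-based
# forecastability score in percent, computed from the raw periodogram.
omega_sketch <- function(x) {
  pg <- spec.pgram(x, taper = 0, detrend = FALSE, plot = FALSE)
  m <- length(pg$spec)             # number of frequency bins
  p <- pg$spec / sum(pg$spec)      # normalize the spectrum so it sums to 1
  p <- p[p > 0]                    # treat 0 * log(0) as 0
  entropy <- -sum(p * log(p))      # 0 for a single spike, log(m) for a flat spectrum
  100 * (1 - entropy / log(m))
}

set.seed(1)
n <- 1024
idx <- seq_len(n)

score_sin <- omega_sketch(sin(2 * pi * idx / 16))  # pure sinusoid
score_noise <- omega_sketch(rnorm(n))              # Gaussian white noise

cat("sinusoid:   ", score_sin, "\n")
cat("white noise:", score_noise, "\n")
```

The sinusoid lands at practically 100. The white noise ends up low but somewhat above zero, because the raw periodogram is a noisy estimate of a flat spectrum. A smoother spectrum estimator would push it closer to zero, which is one reason the choice of estimator matters.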

Are stocks predictable then? Not very much, it seems. Consult figure 1 from the paper. Omega for S&P 500 daily returns is 1.25%. In the equity funds example, the most forecastable component scores roughly 2.5%. The European indexes above look more forecastable, with Omega around 6%. Compare that to monthly mean temperatures in Nottingham, which score 34% in the paper. (The plot below shows 46; the author says he probably used a different spectrum.method for estimation, and different spectrum estimators give different Omega estimates.)


Image credit: ForeCA page

Maybe there’s hope, though. In machine learning a lot depends on the input data. When classifying images, for example, you won’t have much success with raw pixels. But extract some features - either by hand-crafting or learning them - and recognizing objects becomes quite easy.

However, we know that AI tasks can be done, because humans excel at them after a few years of unsupervised training. When dealing with finance, it’s not obvious what the benchmark or the limit is, and there’s not much openness in discussing know-how. Understandably.

We’d like to thank Georg M. Goerg for assistance with ForeCA.
