Machine learning made easy

One weird regularity of the stock market

Everybody had the fantasy of predicting the stock market. We investigated the subject in Are stocks predictable?. In short, they are not, at least the prices. The next step would be to go from prices to volatility measures. The reason is that one can use the volatility to properly price stock options using the Black-Scholes model. Wikipedia says that the formula has only one parameter that cannot be directly observed in the market: the average future volatility of the underlying asset. Therefore, the question is, can one predict that volatility?

Classifying time series using feature extraction

When you want to classify a time series, there are two options. One is to use a time series specific method. An example would be LSTM, or a recurrent neural network in general. The other one is to extract features from the series and use them with normal supervised learning. In this article, we look at how to automatically extract relevant features with a Python package called tsfresh.

How to use the Python debugger

This article is not about machine learning, but about a piece of software engineering that often comes handy in data science practice. When writing code, everybody gets errors. Sometimes it is difficult to debug them. Using a debugger may help, but can also be intimidating. This is a TLDR tutorial on using pdb in IPython, focused on looking at variables inside functions.

Two faces of overfitting

Overfitting is on of the primary problems, if not THE primary problem in machine learning. There are many aspects to it, but in a general sense, overfitting means that estimates of performance on unseen test examples are overly optimistic. That is, a model generalizes worse then expected.

We explain two common cases of overfitting: including information from a test set in training, and the more insidious form: overusing a validation set.

Revisiting Numerai

In this article, we revisit Numerai and their weekly data science tournament. New developments include a much larger dataset, tougher requirements for models, and bigger payouts.

It’s embarassing, really

In August, we published the first version of goodbooks-10k, a new dataset for book recommendations. By pure chance, that coincided with a proclamation of Kaggle Datasets Awards. Oh, how we hoped to get one!

Introduction to pointer networks

Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they yield a succession of pointers to the elements of the input series. The most basic use of this is ordering the elements of a variable-length sequence or set.