Let’s step back from forays into cutting-edge topics and look at the random forest, one of the most popular machine learning techniques today. Why is it so attractive?
Machine learning courses online
Madelon: Spearmint’s revenge
Little Spearmint couldn’t sleep that night. I was so close, he thought. It seemed that he had found a better-than-default value for one of the random forest hyperparams, but it turned out to be a false alarm. As he fell asleep, he made a decision: next time, I will show them!
Spearmint with a random forest
Now that we have the Spearmint basics nailed down, we’ll try tuning a random forest, specifically two hyperparams: the number of trees (ntrees) and the number of candidate features considered at each split (mtry). Here’s some code.
We’re going to use the red wine quality dataset. It has about 1600 examples, and our goal will be to predict a wine’s rating from all its other properties.
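To make the setup concrete, here is a minimal R sketch of the kind of model being tuned - not the actual Spearmint wrapper from the post. It assumes the randomForest package and a local winequality-red.csv in the usual semicolon-separated UCI format with a quality column; the hyperparam values are just placeholders for what Spearmint would propose.

```r
# A minimal sketch of the model being tuned - not the original Spearmint wrapper.
# Assumes the randomForest package and a local winequality-red.csv
# (semicolon-separated, with a "quality" column, as in the UCI distribution).
library(randomForest)

wine <- read.csv("winequality-red.csv", sep = ";")

set.seed(42)
train_idx <- sample(nrow(wine), round(0.8 * nrow(wine)))
train <- wine[train_idx, ]
test  <- wine[-train_idx, ]

# The two hyperparams Spearmint would search over: ntree and mtry
# (the randomForest package calls the first one ntree, without the "s").
rf <- randomForest(quality ~ ., data = train, ntree = 500, mtry = 4)

preds <- predict(rf, test)
print(mean((preds - test$quality)^2))   # validation MSE, the value to minimize
```

Spearmint’s job would then be to propose new (ntree, mtry) pairs and keep the ones that lower that validation error.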
Tuning hyperparams automatically with Spearmint
The promise
What’s attractive about machine learning? That a machine does the learning instead of a human. But an operator still has a lot of work to do. First, he has to learn how to teach a machine in general. Then, when it comes to a concrete task, there are two main areas where a human needs to do the work (and remember, laziness is a virtue, at least for a programmer, so we’d like to minimize the amount of work done by a human):
- data preparation
- model tuning
This story is about model tuning.
Predicting wine quality
This post is as much about wine as it is about machine learning, so if you enjoy wine, like we do, you may find it especially interesting. Here’s some R and Matlab code, and if you want to get right to the point, skip to the charts.
There’s a book by Philipp Janert called Data Analysis with Open Source Tools, which, by the way, we would recommend. From this book we found out about the wine quality datasets. There are two, one for red wine and one for white wine, and they are interesting because they contain quality ratings (1-10) for a few thousand wines, along with their physical and chemical properties. We could probably use these properties to predict a rating for a wine.
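As a rough illustration of what such a model might look like, here is a hedged R sketch of a linear-regression baseline on the red wine data; the download URL and the semicolon separator are assumptions based on the standard UCI distribution of the dataset.

```r
# A quick linear-regression baseline on the red wine data (illustrative only).
# The URL and separator are assumptions based on the usual UCI distribution.
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
red <- read.csv(url, sep = ";")

fit <- lm(quality ~ ., data = red)
summary(fit)   # which physical and chemical properties move the rating?

preds <- predict(fit, red)
print(sqrt(mean((preds - red$quality)^2)))   # in-sample RMSE, a crude sanity check
```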
The Facebook challenge HOWTO
Last time we wrote about the Facebook challenge. Now it’s time for some more details. The main concept is this: in its original state, the data is useless. That’s because many names refer to the same entity - there are about 350k unique names, while the total number of entities is maybe 20k. So cleaning the data is the first and most important step.
So you want to work for Facebook
Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent. They won’t pay, but they just might interview you.
This post is in a way a bonus for active readers, because most visitors to fastml.com originally come from the Kaggle forums. For this competition the forums are disabled to encourage own work. To honor this, we won’t publish any code. But own work doesn’t mean original work, and we wouldn’t want to reinvent the wheel, would we?
Merck challenge
Today it’s about the Merck challenge - let’s beat the benchmark real quick. Not by much, but quick.
Predicting closed questions on Stack Overflow
This time we enter the Stack Overflow challenge, which is about predicting the status of a given question on SO. There are five possible statuses, so it’s a multiclass classification problem.
We would prefer a tool able to perform multiclass classification by itself. It can be done by hand by constructing five datasets, each with binary labels (one class against all the others), and then combining the predictions, but it might be a bit tricky to get right - we tried. Fortunately, the nice people at Yahoo, excuse us, Microsoft, recently released a new version of Vowpal Wabbit, and it supports multiclass classification.
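To make the “by hand” route concrete, here is a hedged R sketch of one-against-all with logistic regression; the train and test data frames and their status column are made up for illustration, and this is not the code used for the actual entry.

```r
# One-against-all by hand, sketched with logistic regression (glm).
# Hypothetical data frames: train and test, with a column "status"
# holding the five classes and the remaining columns holding numeric features.
classes <- sort(unique(train$status))

# One binary model per class: this class against all the others.
models <- lapply(classes, function(cls) {
  d <- train
  d$target <- as.integer(d$status == cls)
  d$status <- NULL
  glm(target ~ ., data = d, family = binomial)
})

# Score every model on the test set and pick the most probable class per row.
probs <- sapply(models, function(m) predict(m, newdata = test, type = "response"))
predicted <- classes[max.col(probs)]
```

The new Vowpal Wabbit release spares us this bookkeeping: its one-against-all mode does the splitting and combining internally, which is why we reach for it in this competition.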