To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the respondents, Itamar Berger, suggested a topic: how to choose a machine learning algorithm for the task at hand. Well, what do we know?
Three things come to mind:
We’d try fast things first. In terms of speed, here’s how we imagine the order:
- linear models
- tree ensembles, that is, bagged or boosted trees
- everything else*
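To make the "fast things first" order concrete, here's a minimal sketch of such a baseline pass, assuming scikit-learn and a synthetic dataset (the library, dataset, and numbers are our illustration, not part of the original advice):

```python
# Baseline pass in the suggested order: linear model first, then trees.
# Assumes scikit-learn; the dataset here is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 1. a linear model: fast, and a sanity-check baseline
linear_score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# 2. a tree ensemble: slower, often more accurate
forest_score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

print(f"linear: {linear_score:.3f}, forest: {forest_score:.3f}")
```

If the linear model already does well, you may be able to stop right there.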
We’d use something we are comfortable with. Learning new things is exciting; however, we’d ask ourselves: do we want to learn a new technique, or do we want a result?
We’d prefer something with fewer hyperparameters to set. More hyperparameters mean more tuning, that is, training and re-training over and over, even if it happens automatically. Random forests are hard to beat in this department. Linear models are pretty good too.
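A small sketch of the "fewer knobs" point, again assuming scikit-learn (the post names no library, so this is our choice): a random forest often works with its defaults, while an SVM typically needs a grid search just to pick reasonable values of C and gamma.

```python
# Fewer hyperparameters in practice: forest with defaults vs. tuned SVM.
# scikit-learn and the grid values below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# random forest: no tuning, defaults as-is
rf_score = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=3).mean()

# SVM: nine candidate settings, each fitted per fold, just for C and gamma
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
grid.fit(X, y)

print(f"forest (no tuning): {rf_score:.3f}, SVM (tuned): {grid.best_score_:.3f}")
```

The point isn't which score wins on this toy data; it's that one line of untuned code competes with a whole grid search.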
A random forest scene, credit: Jonathan MacGregor
*By “everything else” we mean “everything popular”, mostly things like SVMs or neural networks. There’s been some interesting research about fast non-linear methods and we hope to write about it when we get to grips with the stuff.
We’d also like to mention some other simple algorithms besides linear models, for example nearest neighbours or naive Bayes. Sometimes they yield good, or at least acceptable, results.
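For completeness, here's how trying those two simple baselines might look, with scikit-learn again assumed as the toolkit and the data synthetic:

```python
# The two simple baselines mentioned above: nearest neighbours and naive Bayes.
# scikit-learn and the synthetic dataset are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=2)

scores = {}
for name, model in [("kNN", KNeighborsClassifier()), ("naive Bayes", GaussianNB())]:
    # both run with defaults: no hyperparameters to agonize over
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

print(scores)
```

Both fit in the "fast, few knobs" spirit of the list above, so they cost almost nothing to try.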
That was fast, wasn’t it?