Machine learning made easy

And deliver us from Weka

Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. This is misleading, because Weka is just a toy: it can give a beginner a taste of machine learning, but if you want to accomplish anything meaningful, there are many far better tools.

Warning: this is a post with strong opinions.

Weka has at least two serious shortcomings:

  1. It’s written in Java. We think that Java has its place*, but not really in desktop machine learning. In particular, Java is very memory-hungry, which means that you either constrain yourself to really small data or buy more RAM. This also applies to other popular GUI tools like RapidMiner or KNIME.

    It is no accident that some of the best-known Java projects concern themselves with using many machines. One is just not enough with Java. The Weka people chose another way: their next project, MOA, is about data stream mining. Online learning, in other words. That way you only need one example in memory at a time.

    [Image: Bruce Lee responding to the "Scaling Weka With Hadoop" proposal]

  2. Weka uses its own file format called ARFF. It’s seriously stupid, and you will discover that if you try to work with it.
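
To make the online learning remark in point 1 concrete, here is a minimal sketch of the idea in plain Python: logistic regression trained by stochastic gradient descent over a stream, so only one example is held in memory at a time. This is not MOA's API or algorithm. The names `stream_examples` and `train_online`, the synthetic data, and the learning rate are all our assumptions for illustration.

```python
import math
import random


def stream_examples(n, seed=0):
    """Yield (features, label) pairs one at a time.

    Hypothetical stand-in for reading a big file line by line:
    nothing is ever collected into a list.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + x[1] > 0 else 0  # synthetic, linearly separable labels
        yield x, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_online(examples, n_features, lr=0.5):
    """One pass of SGD for logistic regression; memory use is O(n_features)."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in examples:
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        err = p - y  # gradient of the log loss with respect to the score
        for i, xi in enumerate(x):
            w[i] -= lr * err * xi
        b -= lr * err
    return w, b


def accuracy(w, b, examples):
    """Fraction of streamed examples classified correctly."""
    correct = 0
    total = 0
    for x, y in examples:
        score = b + sum(wi * xi for wi, xi in zip(w, x))
        correct += int((score > 0) == (y == 1))
        total += 1
    return correct / total


w, b = train_online(stream_examples(5000, seed=0), n_features=2)
held_out = accuracy(w, b, stream_examples(1000, seed=1))
```

The model state is two weights and a bias no matter how long the stream is, which is the whole point: the data size stops being a memory problem.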

Don’t believe us? Check for yourself.
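
If you want a quick look before installing anything, below is a rough sketch of what a small ARFF file looks like and how one might flatten it to CSV. The sample file is made up, and the helper names `arff_to_rows` and `rows_to_csv` are hypothetical. The parser is deliberately simplified: it assumes dense rows and unquoted names and values, while real ARFF files may also contain quoted strings, sparse rows, and date attributes that this sketch does not handle.

```python
import csv
import io

# A made-up toy file, for illustration only.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def arff_to_rows(text):
    """Return (attribute_names, rows) from a dense, unquoted ARFF string."""
    names = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def rows_to_csv(names, rows):
    """Write a header line plus the data rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


names, rows = arff_to_rows(SAMPLE_ARFF)
csv_text = rows_to_csv(names, rows)
```

Note that the data section is essentially CSV already; the header with its type declarations is the part other tools will not read as is.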

*Java seems to be successfully applied in distributed infrastructure-type software like Hadoop.