Interactive in-browser 3D visualization of datasets

2014-12-22

In this post we’ll be looking at 3D visualization of various datasets using the data-projector software from Datacratic. At first the original demo didn’t impress us as much as it could have, because the data in it is synthetic: it shows a bunch of small spheres in rainbow colors. Real datasets look better.

How to run external programs from Python and capture their output

2014-11-24

Python, being a general purpose programming language, lets you run external programs from your script and capture their output. This is useful for many machine learning tasks where one would like to use a command line application in a Python-driven pipeline. As an example, we investigate how Vowpal Wabbit’s hash table size affects validation scores.
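A minimal sketch of the idea, using the standard library's subprocess module. The command here is a stand-in (a child Python process printing one line), and the "average loss = ..." output format is a hypothetical example; in a real pipeline you would substitute your own command line tool and parse whatever it actually prints.

```python
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return its standard output as text."""
    # check=True raises CalledProcessError if the program exits with an error
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Stand-in for an external program: a child Python process that prints
# a single line in a made-up "name = value" format (an assumption).
cmd = [sys.executable, "-c", "print('average loss = 0.25')"]

output = run_and_capture(cmd)

# Pull the number out of the captured text
loss = float(output.strip().split("=")[1])
print(loss)
```

Passing the command as a list avoids shell quoting problems. Some programs write their progress reports to standard error rather than standard output; in that case read result.stderr instead.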

Geoff Hinton’s Dark Knowledge

2014-10-31

Geoff Hinton had been silent since he went to work for Google. Recently, however, Geoff has come out and started talking about something he calls dark knowledge. Maybe some questions shouldn’t be asked, but what does he mean by that?

Vowpal Wabbit, Liblinear/SBM and StreamSVM compared

2014-10-15

Our Indiegogo campaign turned out to be a (partial) success, so we deliver as promised: a comparison of Vowpal Wabbit, Liblinear/SBM and StreamSVM on the webspam dataset. Refer to Comparing large-scale linear learners for motivation and references.

ICLR 2014 tidbits

2014-09-29

We took a look at a few videos from the 2014 International Conference on Learning Representations and here are some things we consider interesting: predicting class labels not seen in training, benchmarking stochastic optimization algorithms and symmetry-based learning.

Comparing large-scale linear learners

2014-09-15

Recently we’ve been browsing papers about out-of-core linear learning on a single machine. While for us this task is basically synonymous with Vowpal Wabbit, it turns out that there are other options.

Michael Jordan on deep learning

2014-09-14

On September 10th Michael Jordan, a renowned statistician from Berkeley, did an Ask Me Anything on Reddit. These are his thoughts on deep learning.

Kaggle vs industry, as seen through the lens of the Avito competition

2014-09-04

The Avito competition was about predicting illicit content in classified ads. It amounted to classifying text in Russian. We offer a review of what worked for top-ranked participants and some opinions about how Kaggle competitions differ from industry reality.

Math for machine learning

2014-08-18

Sometimes people ask what math they need for machine learning. The answer depends on what you want to do, but in short our opinion is that it is good to have some familiarity with linear algebra and multivariate differentiation.

Classifier calibration with Platt’s scaling and isotonic regression

2014-08-01

Calibration applies when a classifier outputs probabilities. Some classifiers have typical quirks: boosted trees and SVMs, for example, are said to predict probabilities conservatively, meaning closer to mid-range than to the extremes. If your metric cares about exact probabilities, as logarithmic loss does, you can calibrate the classifier, that is, post-process the predictions to get better estimates.
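To make the idea concrete, here is a rough sketch of isotonic calibration in plain Python, using the pool-adjacent-violators procedure. The scores and labels are made-up toy data, and the function names are ours, not from any library. For brevity the mapping is fitted and evaluated on the same points, which flatters the result; in practice you would fit it on a held-out set.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean logarithmic loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def isotonic_fit(scores, labels):
    """Pool adjacent violators: return non-decreasing fitted values,
    ordered by ascending score. Assumes no tied scores."""
    ordered = [y for _, y in sorted(zip(scores, labels))]
    blocks = []  # each block is [sum of labels, number of points]
    for y in ordered:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the current one
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted

# Toy data (an assumption): conservative scores huddled around 0.5,
# already sorted in ascending order
scores = [0.30, 0.35, 0.40, 0.45, 0.55, 0.60, 0.65, 0.70]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

calibrated = isotonic_fit(scores, labels)

raw_loss = log_loss(labels, scores)
calibrated_loss = log_loss(labels, calibrated)
print(raw_loss, calibrated_loss)
```

Platt's scaling takes a different route: instead of a step function it fits a sigmoid (a one-dimensional logistic regression) to the scores.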


FastML

Machine learning made easy