^one weird trick for training char-^r^n^ns

2016-05-31

Character-level recurrent neural networks are attractive for modelling text specifically because of their low input and output dimensionality. You have only so many chars to represent - lowercase letters, uppercase letters, digits and various auxillary characters, so you end up with 50-100 dimensions (each char is represented in one-hot encoding).

Still, it’s a drag to model upper and lower case separately. It adds to dimensionality, and perhaps more importantly, a network gets no clue that ‘a’ and ‘A’ actually represent pretty much the same thing.

Adversarial validation, part one

2016-05-23

Many data science competitions suffer from a test set being markedly different from a training set (a violation of the identically distributed assumption). It is then difficult to make a representative validation set. We propose a method for selecting training examples most similar to test examples and using them as a validation set. The core of this idea is training a probabilistic classifier to distinguish train/test examples.

In part one, we inspect the ideal case: training and testing examples coming from the same distribution, so that the validation error should give good estimation of the test error and classifier should generalize well to unseen test examples.

Coming out

2016-04-01

People often ask how we’ve been able to learn about and cover so many different and diverse topics in machine learning (using at least three different programming languages - Python, Matlab, and R) and generally achieve such prominence in the community, all this in a relatively short time. Today we finally give a definitive answer.

Bayesian machine learning

2016-03-28

So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together - we know it took us a while. This article is an introduction we wish we had back then.

What next?

2016-02-18

We have a few ideas about what to write about next and are looking for your feedback. Vote in the poll at the bottom of this post.

What is better: gradient-boosted trees, or a random forest?

2016-01-27

Folks know that gradient-boosted trees generally perform better than a random forest, although there is a price for that: GBT have a few hyperparams to tune, while random forest is practically tuning-free. Let’s look at what the literature says about how these two methods compare.

Numerai - like Kaggle, but with a clean dataset, top ten in the money, and recurring payouts

2015-12-21

Numerai is an attempt at a hedge fund crowd-sourcing stock market predictions. It presents a Kaggle-like competition, but with a few welcome twists.

What you wanted to know about TensorFlow

2015-11-30

TensorFlow is a new deep learning library from Google. Immediately after release it became the most starred deep learning package on GitHub. From the hype one could conclude that TensorFlow is the best thing since sliced bread. Is it?

Predicting sales: Pandas vs SQL

2015-10-19

Pandas is Python software for data manipulation. We show that some rather simple analytics allow us to attain a reasonable score in an interesting Kaggle competition. While doing that, we look at analogies between Pandas and SQL, a standard in relational databases.

An excerpt from The Master Algorithm

2015-09-22

Pedro Domingos’ new book, The Master Algorithm, is a readable overview of machine learning. The author discerns and describes five main schools of thought in the field: symbolists, connectionists, evolutionaries, Bayesians and analogizers. Here’a a piece about how Bayesians fit their models, that is, infer parameter values. Even though the context is Bayes nets, the described method is applicable to almost any model.

← Older Contents Newer →

FastML

Machine learning made easy