Machine learning made easy

Michael Jordan on deep learning

On September 10th Michael Jordan, a renowned statistician from Berkeley, did an Ask Me Anything (AMA) on Reddit. These are his thoughts on deep learning.

My first and main reaction is that I’m totally happy that any area of machine learning (aka, statistical inference and decision-making; see my other post :-) is beginning to make impact on real-world problems. I’m in particular happy that the work of my long-time friend Yann LeCun is being recognized, promoted and built upon. Convolutional neural networks are just a plain good idea.

I’m also overall happy with the rebranding associated with the usage of the term “deep learning” instead of “neural networks”. In other engineering areas, the idea of using pipelines, flow diagrams and layered architectures to build complex systems is quite well entrenched, and our field should be working (inter alia) on principles for building such systems. The word “deep” just means that to me—layering (and I hope that the language eventually evolves toward such drier words…). I hope and expect to see more people developing architectures that use other kinds of modules and pipelines, not restricting themselves to layers of “neurons”.

With all due respect to neuroscience, one of the major scientific areas for the next several hundred years, I don’t think that we’re at the point where we understand very much at all about how thought arises in networks of neurons, and I still don’t see neuroscience as a major generator for ideas on how to build inference and decision-making systems in detail. Notions like “parallel is good” and “layering is good” could well (and have) been developed entirely independently of thinking about brains.

I might add that I was a PhD student in the early days of neural networks, before backpropagation had been (re)-invented, where the focus was on the Hebb rule and other “neurally plausible” algorithms. Anything that the brain couldn’t do was to be avoided; we needed to be pure in order to find our way to new styles of thinking. And then Dave Rumelhart started exploring backpropagation—clearly leaving behind the neurally-plausible constraint—and suddenly the systems became much more powerful. This made an impact on me. Let’s not impose artificial constraints based on cartoon models of topics in science that we don’t yet understand.

My understanding is that many if not most of the “deep learning success stories” involve supervised learning (i.e., backpropagation) and massive amounts of data. Layered architectures involving lots of linearity, some smooth nonlinearities, and stochastic gradient descent seem to be able to memorize huge numbers of patterns while interpolating smoothly (not oscillating) “between” the patterns; moreover, there seems to be an ability to discard irrelevant details, particularly if aided by weight-sharing in domains like vision where it’s appropriate. There’s also some of the advantages of ensembling. Overall an appealing mix. But this mix doesn’t feel singularly “neural” (particularly the need for large amounts of labeled data). (…)

Lastly, and on a less philosophical level, while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to that tool when I’m consulting out in industry. I find that industry people are often looking to solve a range of other problems, often not involving “pattern recognition” problems of the kind I associate with neural networks. E.g.,

  1. How can I build and serve models within a certain time budget so that I get answers with a desired level of accuracy, no matter how much data I have?
  2. How can I get meaningful error bars or other measures of performance on all of the queries to my database?
  3. How do I merge statistical thinking with database thinking (e.g., joins) so that I can clean data effectively and merge heterogeneous data sources?
  4. How do I visualize data, and in general how do I reduce my data and present my inferences so that humans can understand what’s going on?
  5. How can I do diagnostics so that I don’t roll out a system that’s flawed or figure out that an existing system is now broken?
  6. How do I deal with non-stationarity?
  7. How do I do some targeted experiments, merged with my huge existing datasets, so that I can assert that some variables have a causal effect?

Although I could possibly investigate such issues in the context of deep learning ideas, I generally find it a whole lot more transparent to investigate them in the context of simpler building blocks.

Based on seeing the kinds of questions I’ve discussed above arising again and again over the years I’ve concluded that statistics/ML needs a deeper engagement with people in CS systems and databases, not just with AI people, which has been the main kind of engagement going on in previous decades (and still remains the focus of “deep learning”). I’ve personally been doing exactly that at Berkeley, in the context of the RAD Lab from 2006 to 2011 and in the current context of the AMP Lab.
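
To make question 2 in the list above a bit more concrete, here is a minimal sketch of one simple building block that can put error bars on an aggregate query: the percentile bootstrap. It is not taken from Jordan's answer. The function name `bootstrap_ci`, its parameters, and the synthetic `rows` data are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(values).

    Hypothetical helper for illustration: resample the rows with
    replacement, recompute the statistic each time, and read the
    interval off the sorted resampled estimates.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return stat(values), lo, hi


# Synthetic stand-in for the rows returned by a database query
# (assumption: one numeric column, e.g. a per-record measurement).
data_rng = random.Random(42)
rows = [data_rng.gauss(100.0, 15.0) for _ in range(500)]

point, lo, hi = bootstrap_ci(rows)
print(f"mean = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The sketch recomputes the statistic on every resample of the full result set, which is easy to reason about but gets expensive as the data grows. That cost is exactly the tension between time budget and accuracy raised in question 1.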

More at