The paper in question proposes a way to reduce the amount of computation needed in convolutional networks roughly three times, while keeping the same accuracy. Here’s what you wanted to know about this method (already available in TensorFlow), reprinted from two smart folks.

swibe: In traditional convolution layers, the convolution is tied up with cross-channel pooling: for each output channel, a convolution is applied to each input channel and the results are summed together.

This leads to the unfortunate situation where the network may often be repeatedly applying similar filters to each input channel. Storing these filters wastes memory, and applying them repeatedly wastes computation.

It’s possible to instead split the computation into a convolution stage, where multiple filters are applied to each input channel, and a cross-channel pooling stage where the output channels each use whichever intermediate results are of use to them. It allows the filters to be shared across multiple output channels. This reduces their number substantially, allowing for more efficient networks, or larger networks with a similar computational cost.

benanne: This is not a particularly novel idea, so I get the feeling that some references are missing. It’s been available in TensorFlow as `tf.nn.separable_conv2d()`

for a while, and in this presentation Vincent Vanhoucke [slides, video] also discusses it (slide 26 and onwards).

The results seem to be pretty solid though, and the interaction with residual connections probably makes it more practical. It’s a nice way to further increase depth and nonlinearity while keeping the computational cost and risk of overfitting at reasonable levels.