I am still staggering through Tim Urban's two-part article on waitbutwhy. It's very good. I'm wondering whether I should be worried about military shits getting hold of Deep Learning "for my own protection".
The idea is to train a massive ensemble of nets, but to use a much smaller network for the actual work of categorisation and generalisation. I was wondering how one would best go about "snipping out a neuron" from a serial chain. In deep learning there can be up to 20 layers, so such a reduction might be worthwhile.
I suspect this is a crazy idea, because much of the power of NNs arises from the nonlinearity of the transfer function of each neuron. It seems that these days a simple rectifier with bias is used; i.e. for input x, output y, weight w and bias b, the nonlinear transfer function is y = max(0, w*x - b). Composing two of these we get y = max(0, w*y' - b), where y' = max(0, w'*x - b'). In order to "snip out" the middle neuron we would need to collapse this into a single neuron of the same form, y = max(0, w''*x - b''), and the inner max() prevents that in general.
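To make that concrete, here is a little NumPy sketch (the parameter values are just my own toy numbers, not anything from the article). Composing two rectifiers gives a piecewise-linear function with up to two kinks, while a single max(0, w*x - b) has only one, so the obvious trick of multiplying the weights through only reproduces it where the inner unit never clips.

```python
# Toy demonstration: two composed rectifiers vs. the single-neuron candidate
# you get by multiplying the weights through and folding the biases together.
import numpy as np

def rect(x, w, b):
    # Rectifier with bias, as written above: max(0, w*x - b)
    return np.maximum(0.0, w * x - b)

# Inner neuron y' = max(0, x), outer neuron y = max(0, -y' + 1)
w1, b1 = 1.0, 0.0
w2, b2 = -1.0, -1.0

x = np.linspace(-2.0, 3.0, 11)
composed = rect(rect(x, w1, b1), w2, b2)       # kinks at x = 0 and x = 1

# Single-neuron candidate: weights multiplied through, biases folded in.
# It matches the composition only where the inner max() never clips (x >= 0).
single = rect(x, w2 * w1, w2 * b1 + b2)

for xi, yc, ys in zip(x, composed, single):
    flag = "" if np.isclose(yc, ys) else "  <- differs"
    print(f"x={xi:5.2f}  two neurons={yc:5.2f}  one neuron={ys:5.2f}{flag}")
```

Running it shows the two agree for x >= 0 and disagree everywhere the inner unit is clipped, which is exactly the region a single rectifier can't account for.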
What you'd want from this is to train some humungous net that couldn't practically be attached to a real robot - and needn't be. Distillation (see above), or some variant thereof, takes the essence of the net's knowledge and stuffs it into the robot's little computer. No need for a cloud.
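For what it's worth, here is a rough sketch of what that distillation step tends to look like, along the lines of Hinton et al.'s soft-target recipe. The layer sizes, temperature and toy data below are all placeholders of mine, and it assumes PyTorch rather than anything specified above.

```python
# Sketch of distilling a big "teacher" net into a small "student" net:
# the student is trained on the teacher's softened outputs plus the true labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of soft-target loss (teacher's temperature-softened outputs)
    and ordinary cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                        # T^2 keeps the gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Big "teacher" stands in for the humungous net / ensemble; the "student"
# is the little network that would actually live on the robot.
teacher = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 10))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(64, 20)               # toy batch of inputs
y = torch.randint(0, 10, (64,))       # toy labels

with torch.no_grad():                 # teacher is frozen; only its outputs are used
    t_logits = teacher(x)

loss = distillation_loss(student(x), t_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
print(f"student distillation loss: {loss.item():.3f}")
```

Once trained, only the student's handful of weights needs to ship with the robot, which is the whole point: the heavy lifting stays offline.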