Talk:Residual neural network

Backward propagation
During backpropagation learning, the update for the normal path is
 * $$ \Delta w^{\ell-1,\ell} := -\eta \frac{\partial E}{\partial w^{\ell-1,\ell}} = -\eta a^{\ell-1} \cdot \delta^\ell$$

and for the skipper paths (note that the two rules are close to identical)
 * $$ \Delta w^{\ell-2,\ell} := -\eta \frac{\partial E}{\partial w^{\ell-2,\ell}} = -\eta a^{\ell-2} \cdot \delta^\ell$$

In both cases we have
 * $E$ an error function,
 * $\eta$ a learning rate ($\eta > 0$),
 * $\delta^\ell$ the error signal of neurons at layer $\ell$, and
 * $a^\ell$ the activation of neurons at layer $\ell$.

If the skipper paths have fixed weights, then those weights will not be updated. If they can be updated, then the update follows the ordinary backprop rule.

In the general case there can be $K$ skipper weight matrices, thus
 * $$ \Delta w^{\ell-k,\ell} := -\eta \frac{\partial E}{\partial w^{\ell-k,\ell}} = -\eta a^{\ell-k} \cdot \delta^\ell$$
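A minimal NumPy sketch of these update rules (the layer sizes and the random stand-ins for the activations and the error signal are illustrative assumptions, not from the discussion above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths: layer l-2 -> layer l-1 -> layer l, with a skip from l-2 to l.
n_in, n_hidden, n_out = 4, 5, 3
eta = 0.1  # learning rate (eta > 0)

# Activations at the earlier layers and the error signal at layer l,
# as a forward/backward pass would produce them (random stand-ins here).
a_lm1 = rng.standard_normal(n_hidden)   # a^{l-1}
a_lm2 = rng.standard_normal(n_in)       # a^{l-2}
delta_l = rng.standard_normal(n_out)    # delta^l

# Normal path: Delta w^{l-1,l} = -eta * delta^l (outer) a^{l-1}
dw_normal = -eta * np.outer(delta_l, a_lm1)

# Skipper path: Delta w^{l-2,l} = -eta * delta^l (outer) a^{l-2}
dw_skip = -eta * np.outer(delta_l, a_lm2)

# Because both rules share delta^l, the inputs can be concatenated and the
# two weight matrices updated in one step (the "merged" update below).
a_merged = np.concatenate([a_lm1, a_lm2])
dw_merged = -eta * np.outer(delta_l, a_merged)

assert np.allclose(dw_merged[:, :n_hidden], dw_normal)
assert np.allclose(dw_merged[:, n_hidden:], dw_skip)
```

The final assertions illustrate the point made below: since the rules differ only in which activations they read, merging the matrices changes nothing about the learned values.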

As the learning rules are similar, the weight matrices can be merged and learned in the same step. — Preceding unsigned comment added by Petkond (talk • contribs) 23:58, 19 August 2018 (UTC)

Manifold
I wrote "During later learning it will stay closer to the manifold and thus learn faster." but now it is "Towards the end of training, when all layers are expanded, it stays closer to the manifold and thus learns faster." I would say the rephrasing is wrong. Initial learning with skipped layers will bring the solution somewhat close to the manifold. When skipping is progressively dropped, with further learning in progress, the network will stay close to the manifold during this learning. Staying close to the manifold is not something that only happens during final training. Jeblad (talk) 20:27, 6 March 2019 (UTC)

Compressed layers?
I wrote "The intuition on why this work is that the neural network collapses into fewer layers in the initial phase, which makes it easier to learn, and then gradually expands as it learns more of the feature space." which is now "Skipping effectively compresses the network into fewer layers in the initial training stages, which speeds learning." I believe it is wrong to say this is a compression of layers, as there is no learned network to be compressed at this point. It would be more correct to say that the initial simplified network, which is easier to train due to less vanishing gradient, is gradually expanded into a more complex network. Jeblad (talk) 20:33, 6 March 2019 (UTC)


 * The error is introduced here. I'm not going to fix this. Jeblad (talk) 20:56, 6 March 2019 (UTC)

Agree "simplified" makes more sense than "compressed". I think the idea of the network being (effectively) expanded as training progresses is conveyed by the rest of the paragraph, no? AliShug (talk) 01:07, 9 March 2019 (UTC)


 * Still, note that this isn't really about a simplified layer; it is about jumping over layers. It is collapsing two or more layers into one until the skipped layers start to give better results than skipping them. Another way to say it is "the network expands its learning capacity with increased acquired knowledge". That might, though, give some the impression that the network is a little too much of an AI. Jeblad (talk) 20:58, 11 April 2019 (UTC)
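As a sketch of the "collapsing layers" intuition discussed above, assuming a standard identity-skip residual block (an assumption; the thread does not fix a particular block form): while the residual branch is near zero the block acts as the identity, so a stack of such blocks behaves like a much shallower network, and the skipped layers only start contributing as their weights move away from zero.

```python
import numpy as np

def residual_block(x, W1, W2):
    # y = x + W2 @ relu(W1 @ x): identity skip plus a two-layer residual branch.
    return x + W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal(6)

# Residual branch at zero: the block reduces to the identity, so stacking
# many such blocks is equivalent to a far shallower network.
W1 = np.zeros((6, 6))
W2 = np.zeros((6, 6))
assert np.allclose(residual_block(x, W1, W2), x)

# Once training moves the branch weights away from zero, the previously
# skipped layers contribute and the effective depth grows.
W1 = 0.1 * rng.standard_normal((6, 6))
W2 = 0.1 * rng.standard_normal((6, 6))
assert not np.allclose(residual_block(x, W1, W2), x)
```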

DenseNets
I have no idea why DenseNets are linked to Sparse network. DenseNet is a moniker used for a specific way to implement residual neural networks. If the link text had been "dense networks" it could have made sense to link to an opposite. Jeblad (talk) 20:51, 6 March 2019 (UTC)

Biological Analog
The biological analog section seems to say that cortical layer VI neurons receive significant input from layer I; I haven't been able to find any references for this. The notion that 'skip' synapses exist in biology does seem to be supported, but I haven't been able to find any existing sources that explicitly compare residual ANNs with biological systems - if this section is speculation, it should be removed. Any source (even a blog post) would be fine. AliShug (talk) 22:03, 11 March 2019 (UTC)


 * This section is confusing. It seems to be saying that the cortical flow of information goes from layer I to layer VI and that pyramidal cells provide the skip connections. However, layer IV is typically the main source of cortical input. This is thought to mainly feed "up" to layer I and then connect to the subgranular layers (layers V and VI). Pyramidal neurons are found throughout the layers (esp. III and IV, according to [Pyramidal_cell]). I would say this section and all references to pyramidal neurons should be removed from this article. JonathanWilliford (talk) 01:07, 19 March 2019 (UTC)


 * Pyramidal cells in layer VI have their apical dendrites extended into layer I, skipping layers II to V. If they fed "up" to layer I (and further) you would have a serious functional problem with how to propagate out through the synapses. Information flow is from layer I to layer VI, and from synapses far out on the dendrites to the soma and out through the axons. The spike has a reverse component up through the dendrites, but that isn't important for the forward propagation, only for learning. But it is a wiki, so go ahead, edit. Jeblad (talk) 20:47, 11 April 2019 (UTC)