Talk:Residual neural network


Backward propagation

During backpropagation learning, the update for the normal path is

\Delta w_{\ell-1,\ell} := -\eta \frac{\partial E}{\partial w_{\ell-1,\ell}} = -\eta\, a_{\ell-1}\, \delta_\ell

and for the skip paths (note that they are nearly identical)

\Delta w_{\ell-2,\ell} := -\eta \frac{\partial E}{\partial w_{\ell-2,\ell}} = -\eta\, a_{\ell-2}\, \delta_\ell

In both cases we have

E, an error function,
\eta, a learning rate (\eta > 0),
\delta_\ell, the error signal of the neurons at layer \ell, and
a_\ell, the activation of the neurons at layer \ell.

If the skip paths have fixed weights, they are not updated. If they can be updated, the rule is the ordinary backpropagation update rule.

In the general case there can be skip-path weight matrices over any number of layers; for a skip from layer \ell-k to layer \ell,

\Delta w_{\ell-k,\ell} := -\eta \frac{\partial E}{\partial w_{\ell-k,\ell}} = -\eta\, a_{\ell-k}\, \delta_\ell

As the learning rules are similar, the weight matrices can be merged and learned in the same step. — Preceding unsigned comment added by Petkond (talk · contribs) 23:58, 19 August 2018 (UTC)
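
For anyone reading along, here is a minimal NumPy sketch of the two update rules above (the array names a_prev, a_skip, delta and the shapes are made up purely for illustration, not taken from the original post); it only shows that the normal-path and skip-path updates have the same outer-product form:

    import numpy as np

    rng = np.random.default_rng(0)
    a_prev = rng.standard_normal(4)   # a_{l-1}: activations feeding the normal path
    a_skip = rng.standard_normal(4)   # a_{l-2}: activations feeding the skip path
    delta = rng.standard_normal(3)    # delta_l: error signal at layer l
    eta = 0.01                        # learning rate (eta > 0)

    # Normal path:  Delta w_{l-1,l} = -eta * a_{l-1} * delta_l  (outer product)
    dW_normal = -eta * np.outer(delta, a_prev)

    # Skip path:    Delta w_{l-2,l} = -eta * a_{l-2} * delta_l  (same form)
    dW_skip = -eta * np.outer(delta, a_skip)

    # If the skip path has fixed weights (e.g. an identity mapping), dW_skip is simply not applied.
    print(dW_normal.shape, dW_skip.shape)  # (3, 4) (3, 4)

Because both updates have the same form, stacking the matrices and learning them in one step, as the comment suggests, is straightforward.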

Manifold

I wrote "During later learning it will stay closer to the manifold and thus learn faster." but now it is "Towards the end of training, when all layers are expanded, it stays closer to the manifold and thus learns faster." I would say the rephrasing is wrong. Initial learning with skipped layers will bring the solution somewhat close to the manifold. When skipping is progressively dropped, with further learning in progress, the network will stay close to the manifold during that learning. Staying close to the manifold is not something that happens only during final training. Jeblad (talk) 20:27, 6 March 2019 (UTC)

Compressed layers?

I wrote "The intuition on why this works is that the neural network collapses into fewer layers in the initial phase, which makes it easier to learn, and then gradually expands as it learns more of the feature space." which is now "Skipping effectively compresses the network into fewer layers in the initial training stages, which speeds learning." I believe it is wrong to say this is a compression of layers, as there is no learned network to be compressed at this point. It would be more correct to say that the initial simplified network, which is easier to learn due to less vanishing gradients, is gradually expanded into a more complex network. Jeblad (talk) 20:33, 6 March 2019 (UTC)

The error was introduced here [1]. I'm not going to fix it. Jeblad (talk) 20:56, 6 March 2019 (UTC)

Agree "simplified" makes more sense than "compressed". I think the idea of the network being (effectively) expanded as training progresses is conveyed by the rest of the paragraph, no? AliShug (talk) 01:07, 9 March 2019 (UTC)[reply]

Still, note that this isn't really about a simplified layer, it is about jumping over layers. It is collapsing two or more layers into one until the skipped layers start to give better results than skipping them. Another way to say it is "the network expands its learning capacity as it acquires more knowledge". That might, though, give some the impression that the network is a little too much of an AI. Jeblad (talk) 20:58, 11 April 2019 (UTC)
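
To illustrate the "jumping over" intuition discussed above, here is a small sketch (my own illustration under assumed small initial weights, not taken from the article or any cited source): a residual block whose weighted branch starts with small weights behaves almost like the identity, so the skipped layers only start to matter once their weights have grown.

    import numpy as np

    def residual_block(x, W):
        # y = x + relu(W @ x): the identity skip carries x, the layer adds a correction
        return x + np.maximum(0.0, W @ x)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(8)
    W_small = rng.standard_normal((8, 8)) * 0.01   # small weights, as early in training
    W_large = W_small * 100.0                      # after the weights have grown

    early = residual_block(x, W_small)
    later = residual_block(x, W_large)

    # Early on each block's output is close to its input, so a stack of such blocks
    # behaves roughly like a much shallower network ("jumping over" the layers);
    # later the weighted layers contribute substantially.
    print(np.linalg.norm(early - x) / np.linalg.norm(x))   # small ratio
    print(np.linalg.norm(later - x) / np.linalg.norm(x))   # much larger ratio
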

DenseNets

I have no idea why DenseNets are linked to Sparse network. DenseNets is a moniker used for a specific way to implement residual neural networks. If the link text had been "dense networks" it could have made sense to link to an opposite. Jeblad (talk) 20:51, 6 March 2019 (UTC)

Biological Analog

The biological analog section seems to say that cortical layer VI neurons receive significant input from layer I; I haven't been able to find any references for this. The notion that 'skip' synapses exist in biology does seem to be supported, but I haven't been able to find any existing sources that explicitly compare residual ANNs with biological systems - if this section is speculation, it should be removed. Any source (even a blog post) would be fine. AliShug (talk) 22:03, 11 March 2019 (UTC)

This section is confusing. It seems to be saying that the cortical flow of information goes from Layer I to Layer VI and pyramidal cells provide the skip connections. However, layer IV is typically the main source of cortical input. This is thought to mainly feed "up" to layer I and then connects to the subgranular layers (layer V and VI). Pyramidal neurons are found throughout the layers (esp. III and IV, according to [Pyramidal_cell]). I would say this section and all references to pyramidal neurons should be removed from this article. JonathanWilliford (talk) 01:07, 19 March 2019 (UTC)
Pyramidal cells in layer VI have their apical dendrites extended into layer I, skipping layers II to V. If they "fed up" to layer I (and further) you would have a serious functional problem with how to propagate out through the synapses. Information flow is from layer I to layer VI, and from synapses far out on the dendrites to the soma and out through the axons. The spike has a reverse component up through the dendrites, but that isn't important for the forward propagation, only for learning. But it is a wiki, so go ahead, edit. Jeblad (talk) 20:47, 11 April 2019 (UTC)