Talk:Batch normalization

Copyright problem removed
This article has been revised as part of a large-scale clean-up project of multiple article copyright infringement. (See the investigation subpage.) Prior content in this article duplicated one or more previously published sources. The material was copied from: https://medium.com/deeper-learning/glossary-of-deep-learning-batch-normalisation-8266dcd2fa82. Copied or closely paraphrased material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.)

For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, and, if allowed under fair use, may copy sentences and phrases, provided they are included in quotation marks and referenced properly. The material may also be rewritten, providing it does not infringe on the copyright of the original or plagiarize from that source. Therefore, such paraphrased portions must provide their source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Justlettersandnumbers (talk) 10:17, 16 August 2020 (UTC)

Inference with Batch-Normalized Networks
In the formula, $$y^{(k)} = BN^{\text{inf}}_{\gamma^{(k)},\beta^{(k)}}(x^{(k)})=\frac{\gamma}{\sqrt{\operatorname{Var}[x^{(k)}]+\epsilon}}x^{(k)}+\Bigg(\beta-\frac{\gamma E[x^{(k)}]}{\sqrt{\operatorname{Var}[x^{(k)}]+\epsilon}}\Bigg)$$

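the superscript $(k)$ is missing on $\gamma$ and $\beta$ on the right-hand side, although the subscript of $BN^{\text{inf}}$ carries it.

Shouldn't it read as follows?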

$$y^{(k)} = BN^{\text{inf}}_{\gamma^{(k)},\beta^{(k)}}(x^{(k)})=\frac{\gamma^{(k)}}{\sqrt{\operatorname{Var}[x^{(k)}]+\epsilon}}x^{(k)}+\Bigg(\beta^{(k)}-\frac{\gamma^{(k)} E[x^{(k)}]}{\sqrt{\operatorname{Var}[x^{(k)}]+\epsilon}}\Bigg)$$
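For anyone who wants to check the indexed form numerically, here is a minimal sketch in plain Python. It assumes that every quantity ($\gamma^{(k)}$, $\beta^{(k)}$, $E[x^{(k)}]$, $\operatorname{Var}[x^{(k)}]$) is kept per activation dimension $k$, and it compares the affine form above with the usual "normalize, then scale and shift" form. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def bn_inference_direct(x, gamma, beta, mean, var, eps=1e-5):
    """Normalize, then scale and shift, one activation dimension at a time."""
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, g, b, m, v in zip(x, gamma, beta, mean, var)]

def bn_inference_folded(x, gamma, beta, mean, var, eps=1e-5):
    """Same transform written as the affine map scale * x + shift."""
    out = []
    for xi, g, b, m, v in zip(x, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # gamma^(k) / sqrt(Var + eps)
        shift = b - g * m / math.sqrt(v + eps)   # beta^(k) - gamma^(k) E / sqrt(Var + eps)
        out.append(scale * xi + shift)
    return out

# Hypothetical values for three activation dimensions k = 0, 1, 2.
x = [0.5, -1.2, 3.0]
gamma = [1.0, 0.5, 2.0]
beta = [0.0, 0.1, -0.3]
mean = [0.2, -1.0, 2.5]
var = [1.5, 0.3, 4.0]

direct = bn_inference_direct(x, gamma, beta, mean, var)
folded = bn_inference_folded(x, gamma, beta, mean, var)

# The two forms agree dimension by dimension, up to floating-point rounding.
assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(direct, folded))
```

The two forms only match when each dimension uses its own $\gamma^{(k)}$ and $\beta^{(k)}$, which is what the indexed version of the formula makes explicit.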

Doubt about the backprop derivatives
In the expression for dl/dmu, the second term contains a sum of (x - mu) over the mini-batch. This sum is equal to 0, since the sum of the x's equals the sum of the mean over the same samples. Why does this term appear then? 83.48.83.202 (talk) 13:58, 14 March 2023 (UTC)
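A quick numeric check of this observation, as a sketch only. It assumes the expression under discussion has the form $\frac{\partial l}{\partial \mu} = \sum_i \frac{\partial l}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma^2+\epsilon}} + \frac{\partial l}{\partial \sigma^2}\cdot\frac{\sum_i -2(x_i-\mu)}{m}$, where $m$ is the mini-batch size. The function name, the upstream gradient values and the batch values are hypothetical. Under that assumption the second term is what the chain rule produces through the dependence of $\sigma^2$ on $\mu$, and it evaluates to zero whenever $\mu$ is the mean of the same batch.

```python
import math

def mean_gradient_terms(x, dl_dxhat, dl_dvar, eps=1e-5):
    """Return the two terms of dl/dmu separately for one mini-batch.

    x        : activations of one dimension over the mini-batch
    dl_dxhat : assumed upstream gradients dl/dx_hat_i (hypothetical values)
    dl_dvar  : assumed upstream gradient dl/dsigma^2 (hypothetical value)
    """
    m = len(x)
    mu = sum(x) / m
    var = sum((xi - mu) ** 2 for xi in x) / m
    # First term: path through the normalized activations x_hat_i.
    first = sum(g * (-1.0 / math.sqrt(var + eps)) for g in dl_dxhat)
    # Second term: path through the variance, which depends on mu.
    second = dl_dvar * sum(-2.0 * (xi - mu) for xi in x) / m
    return first, second

x = [0.5, -1.2, 3.0, 0.7]
dl_dxhat = [0.1, -0.4, 0.25, 0.3]
dl_dvar = 0.8

first, second = mean_gradient_terms(x, dl_dxhat, dl_dvar)

# The deviations from the batch mean sum to zero, so the second term vanishes
# (up to floating-point rounding) regardless of dl_dvar.
assert abs(second) < 1e-12
```

With these values only the first term contributes to dl/dmu, which agrees with the comment above that the sum of (x - mu) is zero.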