Talk:Delta rule

What is $$h_j$$? Is it the value previously sent to the activation function?
 * Yep. It's the weighted sum of all the outputs of the neurons going into j. 64.231.108.197 (talk) 03:08, 4 March 2009 (UTC)
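The answer above can be sketched in a couple of lines; the values below are purely illustrative, assuming a neuron j with inputs x and incoming weights $$\omega_{ji}$$:

```python
import numpy as np

# Hypothetical values: x holds the outputs of the neurons feeding into j,
# and w_j holds the weights w_ji on those incoming connections.
x = np.array([0.5, -1.0, 2.0])    # outputs of the incoming neurons
w_j = np.array([0.1, 0.4, 0.25])  # weights into neuron j

h_j = np.dot(w_j, x)  # weighted sum: the value sent to the activation function
```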

The indices on the weights switch between $$\omega_{j i}$$ and $$\omega_{i j}$$, probably an oversight. I will alter it to what I think it should be; correct me if I'm wrong.

Disputed
This article suggests training perceptrons with a squared error loss function. That's a pretty bad idea, as the gradient of that loss function is no good for classification tasks -- it doesn't approximate the actual zero-one loss very well and is therefore quite slow to converge. A variant of the hinge loss, $$\ell(y) = \max(0, - t \cdot y)$$, works much better in practice.

Unfortunately, I don't have a source for this -- but the fact that squared error loss is inappropriate should be covered in any good textbook that covers linear models. Qwertyus (talk) 13:20, 1 December 2012 (UTC)
 * As far as I can see, there are two understandings of the "perceptron" term. The first is Rosenblatt's original perceptron with the step function as an activation function. The second is just a synonym for an artificial neuron or even a network. The first seems to be prevalent now (?), but the second continues to thrive: this article uses it, as does for example Multilayer perceptron. The article Perceptron follows the first definition, except for the mention of the delta rule there, which makes no sense under it. If I'm right, this mix-up must be resolved explicitly in all affected articles. Sadly I'm not sure I'm right :). Igogo3000 (talk) —Preceding undated comment added 02:43, 9 January 2014 (UTC)
 * Well, the Perceptron article already contains the indication that "multilayer perceptron" is a misnomer. But still there is no agreement whether we call for example linear neuron a perceptron. Igogo3000 (talk) 03:28, 9 January 2014 (UTC)


 * I've never seen linear regression models being called perceptrons. I'm not too familiar with the old NN terminology, but I think even a step neuron trained with an MSE loss function is not a perceptron, but an ADALINE. (Note that the application of the step function is only used for classification, and classification must *always* use such a function, even in logistic regression, SVMs and multilayer nets.) Q VVERTYVS (hm?) 10:42, 9 January 2014 (UTC)


 * These slides from an MIT professor confirm that this is used to train perceptrons. Removed the disputed tag. Q VVERTYVS (hm?) 21:23, 18 February 2014 (UTC)

@Qwertyus that function isn't differentiable, is it? Also, this is the article for the delta rule. Using a different error function would result in a different learning rule, and therefore it wouldn't be the delta rule. 71.50.59.119 (talk) 04:33, 29 January 2015 (UTC)
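On the differentiability question: the hinge-style loss $$\max(0, -t \cdot y)$$ mentioned above is indeed not differentiable at $$t \cdot y = 0$$, but it has a subgradient, and a subgradient step on it recovers the classic perceptron update. A minimal sketch with illustrative values:

```python
import numpy as np

def perceptron_step(w, x, t, alpha=1.0):
    """Subgradient step on the loss max(0, -t*y) for a linear unit y = w.x."""
    y = np.dot(w, x)
    if t * y <= 0:                # misclassified (or on the boundary): loss is active
        return w + alpha * t * x  # subgradient of max(0, -t*y) w.r.t. w is -t*x
    return w                      # correctly classified: zero loss, zero gradient

# Illustrative example: a misclassified point moves the weights.
w = perceptron_step(np.array([0.0, 0.0]), np.array([1.0, -1.0]), 1.0)
```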


 * The point about the hinge loss was really a side remark. My main contention is that the article should point out connections between the delta rule and similar concepts in statistics and optimization: squared error, linear regression. A lot of the early NN folks were reinventing (or independently discovering) things that have different names in different fields. Q VVERTYVS (hm?) 12:27, 29 January 2015 (UTC)
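The connection mentioned above can be made concrete: for a linear neuron, the delta rule update is exactly one step of gradient descent on the squared error, which is why it coincides with stochastic gradient training of linear regression. A minimal sketch with illustrative values:

```python
import numpy as np

# For a linear neuron y = w.x with squared error E = 0.5 * (t - y)**2,
# the gradient of E w.r.t. w is -(t - y) * x, so the gradient-descent step
# is the delta rule: w <- w + alpha * (t - y) * x.
alpha = 0.1
w = np.array([0.0, 0.0])  # illustrative initial weights
x = np.array([1.0, 2.0])  # illustrative input
t = 1.0                   # illustrative target

y = np.dot(w, x)
w = w + alpha * (t - y) * x  # delta rule == gradient descent on squared error
```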

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Delta rule. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20160304032228/http://uhavax.hartford.edu/compsci/neural-networks-delta-rule.html to http://uhavax.hartford.edu/compsci/neural-networks-delta-rule.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 15:03, 10 December 2016 (UTC)