Talk:Extreme learning machine

Controversy
There's actually quite a bit of controversy surrounding the "extreme learning machine", but I'm having trouble finding properly published sources about it. There's a Facebook post by Yann LeCun which points out that this is really a relabeling of the original (1958) perceptron algorithm/hardware, but even though LeCun is an expert on neural nets, I'm hesitant to cite a Facebook post. IIRC, G-B Huang et al. at some point responded to similar criticism. I'll try to find the paper. Q VVERTYVS (hm?) 11:08, 30 July 2015 (UTC)


 * Found it, started a section about the controversy. Q VVERTYVS (hm?) 11:49, 30 July 2015 (UTC)

Bogus math
The original paper contains clearly bogus math. Theorem 2.1 is obviously wrong and the proof is nonsense: g = 0 fulfills the requirements, but the theorem is clearly not true for this g. Also, the statement "let us suppose that c belongs to a subspace of dimension N-1" doesn't make sense: c is a vector, so it belongs to infinitely many subspaces of dimension N-1 (unless N=1). I haven't found a source on this. I'm considering blogging about it so Wikipedia can cite the blog. Andreas Mueller (talk) 20:49, 2 February 2017 (UTC)
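
The g = 0 counterexample can be checked numerically. The following is only a minimal sketch, assuming Theorem 2.1 is read as claiming that the N×N hidden-layer output matrix H, with entries g(w_i · x_j + b_i), is invertible with probability one for randomly drawn weights and biases. The function name `hidden_output`, the sample sizes and the seed are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def hidden_output(X, W, b, g):
    """Hidden-layer output matrix H, where H[j, i] = g(w_i . x_j + b_i)."""
    return g(X @ W + b)

rng = np.random.default_rng(0)
N = 5                          # number of samples = number of hidden nodes
X = rng.normal(size=(N, 3))    # N distinct samples with 3 features each
W = rng.normal(size=(3, N))    # random input weights
b = rng.normal(size=N)         # random biases

# g = 0 is infinitely differentiable, yet H is the zero matrix.
H_zero = hidden_output(X, W, b, lambda z: np.zeros_like(z))
rank_zero = np.linalg.matrix_rank(H_zero)

# For comparison: a nonlinear activation such as tanh.
H_tanh = hidden_output(X, W, b, np.tanh)
rank_tanh = np.linalg.matrix_rank(H_tanh)

print("rank with g = 0:   ", rank_zero)
print("rank with g = tanh:", rank_tanh)
```

With g = 0 the matrix H has rank 0 regardless of how the weights are drawn, so it cannot be invertible and Hβ = T has no solution for any nonzero target T.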

Hidden layer
Something that is not mentioned in this article, and which I think helps explain why this approach works as well as it does, is that they use single-hidden-layer feedforward neural networks with a huge number of hidden nodes. There is then further work on pruning the network or incrementally finding the most appropriate number of nodes [see the 2011 survey by Huang, Wang and Lan].

If my understanding of this is accurate, I think this would be of interest to readers, to get a grasp of why they work. --Ric8cruz (talk) 09:46, 30 March 2016 (UTC)

After some testing and reviewing the literature, and to my surprise, it does not seem that a *huge* number of hidden nodes is necessary. It works surprisingly well with few nodes. Anyhow, it would be interesting to tell readers of the article:

1. the idea behind the method is mapping the features into another space and then doing a linear regression from there; the more hidden nodes the better, provided regularization is used

2. the random number sampling can be done from any continuous uniform distribution (see Huang, 2006 or the 2011 review), not just from a Gaussian as the article currently says

3. there are other approaches to using random numbers that involve using kernel functions Ric8cruz (talk) 14:09, 30 March 2016 (UTC)
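
Points 1 and 2 above can be sketched as follows. This is only an illustration under stated assumptions: the tanh activation, the uniform sampling range [-1, 1], the ridge penalty and the names `elm_fit` and `elm_predict` are hypothetical choices, not taken from Huang's papers.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, ridge=1e-3, seed=0):
    """Random feature map followed by ridge-regularized linear regression."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once from a continuous uniform
    # distribution and are never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # features mapped into the hidden space
    # Only the output weights are fitted:
    # solve (H^T H + ridge * I) beta = H^T y.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the same fixed random map, then the fitted linear readout."""
    return np.tanh(X @ W + b) @ beta

# Toy usage: fit y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
print("training MSE:", mse)
```

The only fitted quantity is `beta`, which is why training reduces to solving one linear system. The ridge term is what keeps that system well-conditioned as the number of hidden nodes grows.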

Node Initialization
Adding to the comment on why the extreme learning machine (ELM) works: I think it is necessary to tell readers (1) why this algorithm works despite just solving a system of linear equations, and (2) in what situations its performance will deteriorate or it will not work efficiently.

Answers: (1) The main reason why ELM works very well is that it randomly initializes the input weights and biases and uses the same values during training and testing. Therefore, the algorithm simply sees both training and testing data in the same light. In contrast, in conventional feedforward backpropagation neural networks (FFBP), the weights and biases are also initialized randomly, but they are then updated (tuned) during training. In other words, the model is tuned to the training set, so if the test set deviates enough from the training set, the FFBP would perform poorly compared to ELM, which sees both sets in the same distribution context.

(2) ELM would normally perform excellently on a small dataset, because the weights and biases can be initialized to cover a fairly reasonable distribution for both training and testing. However, on a really large dataset, ELM's performance will deteriorate because we are now dealing with an extremely large space that includes noise and outliers. Therefore, there is no guarantee that the weights and biases can be initialized to cover such a diverse distribution effectively.

To conclude, we can say ELM indirectly adds a unified inference about the distribution of the training and testing data, as long as the data size is not too large.

GenMachLearn —Preceding undated comment added 15:04, 25 June 2016 (UTC)

What type of problems have they been found suitable for
Have they been compared to other approaches on any types of problem? It would be nice if the article could mention some. - Rod57 (talk) 23:01, 16 April 2017 (UTC)

Associative Memory Interpretation
If you have a locality-sensitive hash (LSH) with +1/-1 binary output bits, you can weight each bit and sum to get a recalled value. To train, you recall and calculate the error, divide it by the number of bits, and then add or subtract that amount from each weight, as appropriate, to make the error zero. The capacity is just short of the number of bits. Adding a new memory contaminates all the previously stored memories with a little Gaussian noise. Below capacity, repeated training will drive the Gaussian noise to zero. The tools required to understand this are the central limit theorem and high-school math. You can replace the hard binarization with a softer version, such as a squashing function, but in fact almost any non-linearity will work reasonably well. Hence Extreme Learning Machines = Reservoir Computing = Associative Memory, since they are all based on some form of LSH. A fast LSH is obtained by first applying a predetermined random pattern of sign flips to the elements of a data vector and then applying the fast Walsh-Hadamard transform. This gives a random projection whose output you can binarize to give an effective LSH. Sean O'Connor — Preceding unsigned comment added by 14.162.211.26 (talk) 02:20, 12 March 2019 (UTC)
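
The scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the vector length of 64, the three stored patterns, the 50 training passes and the function names (`fwht`, `lsh_bits`, `recall`, `train`) are all hypothetical choices.

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = np.asarray(v, dtype=float).copy()
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def lsh_bits(x, signs):
    """Random sign flips, then the transform, then binarization to +1/-1."""
    bits = np.sign(fwht(signs * x))
    bits[bits == 0] = 1.0  # map exact zeros to +1 so every bit is +1 or -1
    return bits

def recall(x, signs, weights):
    """Weighted sum of the hash bits."""
    return float(weights @ lsh_bits(x, signs))

def train(x, target, signs, weights):
    """Spread the recall error evenly over all weights (updates in place)."""
    bits = lsh_bits(x, signs)
    error = target - weights @ bits
    weights += (error / len(bits)) * bits

rng = np.random.default_rng(0)
n = 64
signs = rng.choice([-1.0, 1.0], size=n)  # predetermined sign-flip pattern
weights = np.zeros(n)

# A single update makes the recall of that pattern exact.
x0 = rng.normal(size=n)
train(x0, 2.5, signs, weights)
single_error = abs(recall(x0, signs, weights) - 2.5)

# Several memories, well below capacity: repeated passes remove the crosstalk.
patterns = [rng.normal(size=n) for _ in range(3)]
targets = [1.0, -2.0, 0.5]
for _ in range(50):
    for x, t in zip(patterns, targets):
        train(x, t, signs, weights)
max_error = max(abs(recall(x, signs, weights) - t)
                for x, t in zip(patterns, targets))
print("single-pattern error:", single_error)
print("max error after repeated training:", max_error)
```

Because every bit is +1 or -1, adding error/n times the bit vector to the weights changes the recalled value by exactly the error, which is why one update stores a pattern exactly. The later passes are needed only to cancel the interference between stored patterns.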

Suggested Templates
According to WP:VERIFY they classify as primary sources, which are not nearly as reliable as secondary sources. This is an infringement of WP:PSTS. Thus, I'll register here my suggestion to add either or both of the following templates:

namely and. Looking forward to your feedback. Walwal20 talk ▾ contribs 11:54, 4 August 2020 (UTC)
 * Stating here my intention to think about proposing a merge of this article with feedforward neural network, once I have the time to do it myself. Walwal20 talk ▾ contribs 03:27, 11 August 2020 (UTC)
 * There seems to be some good material on Extreme Learning Machines that is not written by Dr. Huang, so this article could be "saved". Walwal20 talk ▾ contribs 03:29, 11 August 2020 (UTC)