Talk:Deep learning/Archive 1

Article created
I did about two years of work involving Deep Belief Networks. Noticing that there was no article for deep architectures in Wikipedia I made this one and hope to add more soon! Renklauf (talk) Wed Jul 20 06:29:12 UTC 2011
 * Please do! I greatly enjoy this topic, and I think it's the best way to Strong AI! Danuthaiduc (talk) 10:00, 10 August 2011 (UTC)


 * Yeah, but let's follow NPOV and refrain from asserting that this stuff really works. You can't use one researcher's claims about his own work as proof that his ideas are valid (see Reproducibility of results). --Uncle Ed (talk) 15:48, 27 May 2012 (UTC)


 * There is zero technical merit to this article. I suggest merging this with the article on neural networks until "definitions" of "deep" neural networks like this one are seen as laughably pathetic as they are: "A deep neural network (DNN) is defined to be an artificial neural network with at least one hidden layer of units between the input and output layers..."  For crying out loud, people, the addition of at least one hidden layer was the whole idea behind neural networks as revived during the 1980s -- and this after the article's introduction says "ANNs fell out of favor in practical machine learning and simpler models such as support vector machines (SVMs) became the popular choice of the field in the 1990s and 2000s."  That is to say, Deep Learning Neural Networks (as defined in this article) fell out of favor prior to the Deep Learning Neural Networks advancing the field beyond Deep Learning Neural Networks.  Jim Bowery (talk)  — Preceding undated comment added 20:09, 9 May 2014 (UTC)


 * I'm afraid I agree with Jim Bowery -- I'm not sure that deep learning is really a separate field -- it sounds like another term for an existing set of paradigms. I would support a merge. CharlesGillingham (talk) 22:54, 27 September 2014 (UTC)


 * I also agree. As this article currently stands I don't see the difference between Deep Learning and Neural Networks. If there really IS a difference it should be made clear in the article. If there isn't a difference then this article should be merged with an article on Neural Nets. BTW, I also think it's not good style to have comments like this at the top of the article: "Alternatively, "deep learning" has been characterized as "just a buzzword for neural nets" Is it "just a buzzword for neural nets" or isn't it? That seems to me something the article should take a stand on and not just be weasely as it is now. --MadScientistX11 (talk) 18:29, 21 October 2014 (UTC)


 * As far as I can tell, "deep learning" refers commonly to the current neural net revival, which started when researchers figured out how to train nets with more than (say) three hidden layers, ca. 2006. But since then, as you can see in this article, anything and everything neural net-related has been sold as "deep learning", including single-layer networks. Q VVERTYVS (hm?) 19:11, 21 October 2014 (UTC)


 * The difference between deep learning and neural networks is pretty clear in the literature and in this article. A neural network (in machine learning) is a particular kind of architecture or mathematical model for transforming inputs to outputs. Deep learning is the name of a class of algorithms, based on representational methods and unsupervised pre-training of layers, for training models with deep architectures, that is models with multiple layers of computational units between input and output. Don't mistake a training algorithm for an architecture. There are many different ways to train artificial neural networks: see Artificial neural network for some examples. Some of these learning algorithms, such as backpropagation and expectation-maximization, have their own standalone articles.
 * Deep learning as a training algorithm is more than notable enough to have its own standalone article. Criticism of article content is welcome, but it is far better to improve the content using reliable sources than to derisively declare "It's all crap!" and attempt to delete it. "It's all crap!" is also a non-neutral stance on the topic. It is clear from the literature that there are proponents of deep learning methods and undeniable successes in the competitions, there are detractors of these methods and there are people fed up with all the hype. Summarizing with due weight on all of these is the way to go. --Mark viking (talk) 19:46, 21 October 2014 (UTC)


 * deep learning is not an algorithm at all. It's an umbrella term for various models, trained with novel but still very different algorithms, ranging from the supervised to the unsupervised. I actually wrote a page about the deep belief network and its training algorithm; but convolutional nets are trained in quite different ways. (Backpropagation is involved practically everywhere, though.) Q VVERTYVS (hm?) 20:04, 21 October 2014 (UTC)
 * As I said above, Deep learning is the name of a class of algorithms. Feedforward neural nets, restricted Boltzmann machines, deep neural networks and deep belief networks are all mathematical models. How the models are trained, that is, how model parameters are chosen to optimize some loss function, is another topic entirely. People had looked at deep neural network architectures before deep learning techniques were developed, but it was the deep learning techniques--unsupervised pre-training, layer by layer--that made deep networks more practical at the time. With today's GPU based algorithms, the need for pre-training has lessened. The hype surrounding deep learning has muddied the concept to the point that people like Michael Jordan claim it is just a synonym for neural network. But the backlash from current hype doesn't change the historical importance of deep learning algorithms in generating interest and good results in deep neural network models or representational learning. Schmidhuber's review has a good discussion of these issues. --Mark viking (talk) 21:55, 21 October 2014 (UTC)

Ok, agreed. Let's try to clean up the article before trying to merge it. Q VVERTYVS (hm?) 09:14, 22 October 2014 (UTC)


 * Sounds good. I will chip in the next few days. --Mark viking (talk) 17:26, 23 October 2014 (UTC)


 * Good job so far guys - it's a devilish mess to unravel! That said, I have to agree with most of the negative comments - this still needs a lot of work before being reference quality. Deep learning is a very young (immature) field at a philosophical level (probably because progress is driven mostly by empirical engineering practices) and we must have faith that it will become more rational as it matures. E.g., the state of taxonomic analysis is pitiful (as substantiated by the various comments here).  As a step in the direction of rigorous philosophical treatment, I have added a fairly brief and by no means exhaustive section on *interpretation*. Clearly, the logic set forth in the added 'interpretation' section is not compatible with the subsequent (hand-wavy) 'definitions' section but I haven't got time to clear that section up at the moment. We could also do with a 'Biological Interpretation' section as well (to round things out) - if there is one. We could also do with more examples of the insights and progress that were specific to each interpretation.  Furthermore, I do not support a merger with 'neural networks'. It is premature to conclude that neural networks are inherently 'deep learners' (see the 'DSP interpretation' for argumentation).  In order for this field to be taken seriously, we need to step up the rigor fellas. Qazwsxedcqazws (talk) 07:54, 1 October 2015 (UTC)
 * How is the clean up progressing? IMHO, "Deep Learning" as a term is likely to mislead the not-so-educated audience about the "depth" of learning: the general audience does not understand that the only "deep" thing about "deep learning" is the metric depth of the neural network, and there is nothing else particularly deep about the neural networks or the associated learning methods. An extremely effective way to produce more hype! — Preceding unsigned comment added by 87.92.32.62 (talk) 07:31, 2 March 2019 (UTC)

Deep Learning Artificial Neural Networks
This article was missing a lot of stuff on deep learning neural networks. I tried to improve this a bit, but much remains to be done. (I found the so-called "official site" on deep learning pretty meager, too.) Deeper Learning (talk) 20:35, 7 December 2012 (UTC)


 * You could have a look at deep belief networks and deep autoencoders. p.r.newman (talk) 13:56, 20 August 2013 (UTC)


 * It's said in the article that neural networks with more than one hidden layer could be considered as a deep learning architecture. In this case there are at least two hidden layers (and also 3 successive non-linear transformations). Instead, isn't it ONE hidden layer for two non-linear transformations? — Preceding unsigned comment added by 163.5.218.118 (talk) 01:19, 9 March 2014 (UTC)

Streamlining
One should probably streamline repetitive statements in the introduction and the section on deep neural networks. Isn't deep learning exclusively about neural networks anyway? As far as I know, there is no other successful form of deep learning. Prof. Oundest (talk) 01:56, 11 August 2013 (UTC)

Proofreading note: "set of algorithms"
The article states that deep learning is a "set of algorithms", but it doesn't clearly identify the specific algorithms included in the set. The Transhumanist 01:40, 25 September 2013 (UTC)

How is deep learning differentiated from the family of feature learning methods it belongs to?
The article states: "Deep learning is part of a broader family of machine learning methods based on learning representations." But it doesn't explain how it differs from the rest of the feature learning family. The Transhumanist 01:40, 25 September 2013 (UTC)

"Deep learning" synonymous with "neural networks"?
The article's lead includes a blatant claim that "deep learning" is synonymous with "neural networks": "Deep learning is just a buzzword for neural nets, and neural nets are just a stack of matrix-vector multiplications, interleaved with some non-linearities. No magic there."

- Ronan Collobert

This is potentially extremely confusing, as it may cause readers to wonder why there is a separate article about deep learning. The article does not justify itself with an explanation. That is, it doesn't explain how deep learning is a type of neural networks, rather than being just another name for neural networks in general.

If it is synonymous (as per the claim included in the lead), then the article violates WP:FORK. And so far, the article doesn't make it clear that its coverage isn't a content fork.

Is there any such thing as non-deep learning neural networks? If so, what are they? And, how do deep learning neural networks differ from them? The Transhumanist 01:40, 25 September 2013 (UTC)


 * I agree that this is confusing, but the reason for citing Collobert (one of the foremost practitioners in the current neural nets landscape) is to counterbalance the rest of the article. As I understand deep learning, it's neural nets with more hidden layers than you can train using vanilla backpropagation (because of numerical stability, the required computing time, and/or a lack of labeled training data). See e.g. this talk by Hinton, and note the difference between the title assigned to it by UBC's YouTube moderator and the actual title of the talk. Q VVERTYVS (hm?) 18:22, 25 September 2013 (UTC)


 * Yes that quote was weirdly situated. I added some context and moved it to the last paragraph of the lead. Bhny (talk) 19:10, 25 September 2013 (UTC)


 * I was about to revert your edit, but then I figured that wouldn't solve anything, and I'd just be restoring my POV. Instead, I tagged the whole page as OR. I haven't seen a source that establishes the link between the neocognitron, or for that matter Schmidhuber's work, to the recent trend of "deep learning". I'll admit that, from what I've read, it fits one of the definitions that can be gleaned from it, but as this page stands, there's no WP:COMMON here. Q VVERTYVS (hm?) 19:34, 25 September 2013 (UTC)


 * I clarified a few things concerning the contributions of Fukushima, LeCun, Schmidhuber, Hinton, and others, with references. Schmidhuber's open peer review web page http://www.idsia.ch/~juergen/firstdeeplearner.html is a great resource for references. Its acknowledgments read like a "who is who" of neural networks. Yes deeper (talk) 19:17, 29 November 2013 (UTC)


 * I agree with the Transhumanist and James Bowery above. Don't make it more difficult for readers to find what they are looking for; don't WP:Fork. Who, exactly, uses the term "deep learning"? Certainly not many of the people whose work is described as "deep learning". I think Hinton would say he was working on neural networks or even connectionism. I don't like people being described as being part of a research project that they never would have heard of. CharlesGillingham (talk) 23:00, 27 September 2014 (UTC)


 * Amongst others Yoshua Bengio, Geoffrey Hinton and Yann LeCun use the term "deep learning". —Kri (talk) 21:23, 5 February 2016 (UTC)


 * "Deep learning" may be synonymous with "neural networks" if you limit yourself to the neural networks that are used today, which are really "deep neural networks", i.e. neural networks with many hidden layers. For a long time, however, neural networks were very difficult to train unless they were very shallow (basically just one or at the maximum two hidden layers). It is only recently that we have been able to train deep neural networks efficiently, which has made it possible for neural networks to learn much more sophisticated features and abstract concepts, which is key in perception. Hence the buzzword "deep learning". So not all neural networks can be counted as forms of deep learning. —Kri (talk) 20:37, 5 February 2016 (UTC)
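
For readers following this thread, here is a minimal sketch of what Collobert's phrase "a stack of matrix-vector multiplications, interleaved with some non-linearities" looks like in practice, and of the shallow-versus-deep distinction discussed above. The weight values, the choice of tanh, and the function names are illustrative assumptions, not taken from the article or from any source cited here.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(weights, x):
    """Apply each weight matrix in turn, with tanh after every hidden layer."""
    for W in weights[:-1]:
        x = [math.tanh(v) for v in matvec(W, x)]
    return matvec(weights[-1], x)  # linear output layer

def hidden_layer_count(weights):
    """The last matrix is the readout; every earlier one feeds a hidden layer."""
    return len(weights) - 1

# Hypothetical weights, chosen only for illustration.
hidden = [[0.5, -0.3], [0.8, 0.2]]
readout = [[1.0, -1.0]]

shallow = [hidden, readout]        # one hidden layer: a classic MLP
deep = [hidden] * 5 + [readout]    # five hidden layers

for name, net in (("shallow", shallow), ("deep", deep)):
    print(name, hidden_layer_count(net), forward(net, [1.0, 2.0]))
```

Both networks are built from the same two operations; the only structural difference is how many times the multiply-then-squash step is repeated, which is the sense of "depth" used in this discussion.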

Definition and citation
The very first sentence appears to be a definition quoted from the given paper "Representation Learning: A Review and New Perspectives".

Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.

I didn't find the quote, though. I'd suggest to make it clear how to verify the statement. The paper for your reference: http://www.computer.org/csdl/trans/tp/2013/08/ttp2013081798.html — Preceding unsigned comment added by 87.149.191.206 (talk) 14:54, 30 April 2014 (UTC)


 * It's not a quote. If it were a quote, it would be in quotation marks. Q VVERTYVS (hm?) 19:27, 30 April 2014 (UTC)

Errors shrink exponentially?
The article states:
 *  [Recurrent neural networks] are trained by unfolding them into very deep feedforward networks, where a new layer is created for each time step of an input sequence processed by the network. As errors propagate from layer to layer, they shrink exponentially with the number of layers. To overcome this problem [...]

Should this instead say that the errors increase exponentially? Why would the shrinking of errors constitute a "problem"? AxelBoldt (talk) 23:53, 24 May 2014 (UTC)


 * What I think is meant is that the derivative of the error function with respect to the parameters shrinks as the parameters (weights) are further from the output unit. This is the "vanishing gradient" problem: parts of the net close to the input side receive very small updates, until at some point the derivatives underflow and these parts no longer get any updates. Q VVERTYVS (hm?) 15:27, 25 May 2014 (UTC)


 * Yes, that sounds right. Maybe we can reformulate the error propagation sentence to explain it better? AxelBoldt (talk) 15:04, 26 May 2014 (UTC)
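
To make the point in this thread concrete, here is a small sketch of the vanishing gradient effect. It assumes a deliberately simplified network, a chain of scalar sigmoid units sharing one weight, which is not how the article's recurrent networks are defined; the function names and default values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, w=1.0, x=0.5):
    """Return d(output)/d(input) for a chain of `depth` scalar sigmoid units.

    Each unit computes a_k = sigmoid(w * a_{k-1}). By the chain rule the
    derivative is the product of the per-layer factors w * a_k * (1 - a_k).
    """
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)  # sigmoid'(z) = s * (1 - s), never above 0.25
    return grad

for depth in (1, 5, 10, 20, 50):
    print(depth, input_gradient(depth))
```

Because the sigmoid's derivative is at most 0.25, each extra layer multiplies the gradient by a factor of at most 0.25 when |w| <= 1, so the derivative reaching the input side shrinks at least geometrically with depth. With enough layers the product underflows to zero in floating point, which is the situation described above where the early layers stop receiving updates.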

DNN is just an MLP

 * A deep neural network (DNN) is defined to be an artificial neural network with at least one hidden layer of units between the input and output layers

Following this definition, almost all neural nets are deep. The source for this sentence actually describes good old (shallow) multilayer perceptrons, and says "we will see later on that there are substantial benefits to using many such hidden layers, i.e. the very premise of deep learning" (bold in the original), so it actually distinguishes single-hidden layer nets from deep ones. Q VVERTYVS (hm?) 10:31, 23 June 2014 (UTC)


 * It seems to me then that the definition of DNNs as given here in the article is wrong. It sounds rather like a requirement (i.e. it constitutes a superset for DNNs) rather than an actual definition. —Kri (talk) 11:14, 4 October 2014 (UTC)

Use in NLP
just added the section "Natural language processing", which suffers from the definition problem that has plagued this article from the beginning. It states that


 * Neural networks have been used for implementing language models since the early 2000s. Key techniques in this field are negative sampling and word embedding. Deep neural architectures have achieved state-of-the-art results in many tasks in natural language processing.

While word embeddings have had a lot of attention in recent years (more so than when they were first suggested in the 1980s; Elman nets were tried on language tasks from the beginning), they're not typically deep nets. Models such as word2vec are single-layer networks. They handle convolutions, limited recurrence, alright, but why must they be mentioned in an article about deep learning if they're actually shallow? Or is it true after all that "deep learning is just a buzzword for neural nets" (Collobert)? Q VVERTYVS (hm?) 13:11, 31 August 2014 (UTC)


 * You're right that word embeddings are not necessarily done by deep networks, but I think their use is a key element in the application of deep learning to NLP and is worth mentioning. And you're also right that there is no clear definition for deep learning, but I think any layered or recursive structure of neural networks can be so termed. Single layer networks are not such. Daniel HK (talk) 08:05, 3 September 2014 (UTC)


 * I do think we should make the link more explicit, then. Maybe add some text to the word embedding page describing how to build a deep net out of them? (I personally have no idea how that's done, but I suppose you can use them for unsupervised pre-training?) Q VVERTYVS (hm?) 12:46, 3 September 2014 (UTC)


 * Schmidhuber's excellent review (reference number 3) uses credit assignment path length as a determiner of whether an architecture is deep or shallow in section 3 of the paper. It is the most sensible and reasonably objective definition of what deep means that I have seen so far. --Mark viking (talk) 16:15, 3 September 2014 (UTC)


 * , would you care to write up a summary of that definition? Q VVERTYVS (hm?) 08:07, 24 October 2014 (UTC)


 * I added a paragraph describing shallow vs deep learning and (very informally) the idea of the CAP. --Mark viking (talk) 01:30, 26 October 2014 (UTC)

What is a Vector of Pixels?
I noticed the revert of an edit in which an internet user changed "an image can be represented as a vector of pixels" to "vector OR pixels". I can understand why the person made that edit and that edit made sense to me in the context (distinguishing different ways to represent knowledge) but I don't understand why it was reverted. I'm probably just not understanding something but I want to make sure the text makes sense. My interpretation of what the first editor did was to distinguish between vector graphics (e.g., the way Postscript works) vs. an array of pixels. But apparently that isn't the idea. But what IS the idea? How can you have a vector of pixels? BTW, I tried googling "vector of pixels" and the things that came up all seemed to be contrasting vector graphics OR pixels. --MadScientistX11 (talk) 16:56, 23 October 2014 (UTC)


 * Take any N by M image of pixels and write it out as an ordered 1-D array of NM elements, forgetting the matrix structure--that is a vector of pixels. I think the point implicitly being made here is that writing an image as a vector of pixels destroys the spatial relationships among the pixels. Close pixels in 2D can be spread far apart in a 1D vector. Those spatial relationships allow for easier discovery of features, like edges, that can be used to form higher level representations of an image. As with much of this article, it could be worded more clearly. --Mark viking (talk) 17:23, 23 October 2014 (UTC)


 * Thanks. That makes sense, thanks for taking the time to explain it. --MadScientistX11 (talk) 17:33, 23 October 2014 (UTC)


 * I rephrased "a vector of pixels" to "a set of pixels"; I hope that will make it somewhat clearer what is actually intended. —Kri (talk) 21:29, 23 October 2014 (UTC)


 * But what goes into a neural net must be a vector, not a set. A set is unordered and doesn't distinguish between two black pixels and three. Q VVERTYVS (hm?) 22:21, 23 October 2014 (UTC)


 * You're right. I thought of the pixels as distinguishable objects, but it's of course just the pixel values that go into the neural network, not the pixels themselves. —Kri (talk) 00:42, 24 October 2014 (UTC)


 * I've tried to clarify a bit further. Maybe we should expand on this example more; IIRC, image classification on raw pixels, without SIFT/SURF/whatever, was one of the first successes in deep learning and can be used to contrast it with the previous state of the art. Q VVERTYVS (hm?) 08:04, 24 October 2014 (UTC)
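
As an illustration of the two points made in this thread (flattening an image hides which pixels are neighbours, and a set cannot stand in for a vector), here is a minimal sketch. The tiny images and the helper names are made up for the example, and row-major order is an assumption; any fixed ordering would make the same point.

```python
def flatten(image):
    """Write an N-by-M image (a list of rows) as one row-major list of N*M values."""
    return [pixel for row in image for pixel in row]

def flat_index(row, col, width):
    """Position of pixel (row, col) in the flattened, row-major vector."""
    return row * width + col

# A hypothetical 3x4 greyscale image.
image = [
    [0,   0, 255, 0],
    [0, 255, 255, 0],
    [0,   0, 255, 0],
]
vector = flatten(image)
width = len(image[0])

# Vertically adjacent pixels (0, 2) and (1, 2) end up `width` positions apart.
print(flat_index(1, 2, width) - flat_index(0, 2, width))

# A set discards both order and multiplicity, so different images collide.
one_white = [[0, 0], [0, 255]]
three_white = [[255, 255], [255, 0]]
print(set(flatten(one_white)) == set(flatten(three_white)))
print(flatten(one_white) == flatten(three_white))
```

The vector keeps every pixel value in a fixed position, which is what a network's input layer needs, while the set comparison treats an image with one white pixel and an image with three as identical.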

Limitations of DNNs
Here's an interesting paper by, a.o., Ilya Sutskever, about limitations of DNNs: ways to fool them into making the wrong prediction and, perhaps more interestingly, an empirical evaluation of the "high-level abstractions" that are promised in the lede of this article. It's recent, though, and hasn't received a whole lot of citations yet (16). Perhaps it would still be appropriate to add some of this to the article (but I'm not doing it right now). Q VVERTYVS (hm?) 15:37, 9 December 2014 (UTC)

The relativity of being recent
This article contains 19 occurrences of the word ”recent”. This means that this article will be rotten in, say, two years. The omnipresence of nowness in the article ought to be removed for archival convenience. gnirre (talk) 09:34, 13 May 2015 (UTC)

Art and AI
My contribution on art and AI was removed, so I have restored it. But I will be glad to discuss further here. Ironically, all of this has transpired in the midst of a huge new outbreak of interest in the subject of art and AI -- see http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html -- and to which I have added a reference within the Wikipedia article. Synchronist (talk) 02:58, 21 June 2015 (UTC)

stupid sentence
"where they have been shown to produce state-of-the-art results on various tasks." of course, current research generates state of the art. should be described better what the actual achievement is, like eg comparing obejct recognition with humans or give some results or compare with other older methods... — Preceding unsigned comment added by 130.92.162.153 (talk) 08:08, 26 June 2015 (UTC)

Andrew J.R. Simpson self citations
On October 1, 2015, a user by the handle of Qazwsxedcqazws submitted a fairly significant change to the article. Most of these changes are plugging the "work" of one Andrew J.R. Simpson, who is (un)notable in the community for publishing a large volume of short arXiv tech reports citing mainly his own work, with little experimental evaluation and no oversight from the community in the form of peer review. This alone is enough to make me think that this user is Andrew J.R. Simpson himself, as no one else really references his work. Whether or not this is a vanity submission or some shoddy attempt at self-promotion, the works referenced here are not valid scientific contributions, and do not belong in an encyclopaedic article, and should be scrubbed. 81.154.37.40 (talk) 18:32, 11 October 2015 (UTC)
 * Let's take this a step further: Since it is WP policy that it takes WP:OR to interpret primary sources, how about in the same scrub, we remove all content that is not supported by secondary sources (and therefore, in its density and near exclusiveness as source type, violates WP:VERIFY and WP:OR). Not really suggesting that we do this, but I'm using the case of an editor exhibiting clear WP:OR in his application of expertise to the evaluation of AJR Simpson primary sources to make the more general point that we are supposed to be using secondary sources, in the first place, which would disqualify any edits that are adding new content based solely on the selection and analysis of primary sources.


 * As such secondary sources include many fine text books, the result might even be an article that people—general audiences, including youth—can come to and learn something about the subject. (Instead of reading, as I have witnessed, a few sentences in, and shaking heads, and walking away.) 73.211.138.148 (talk) 01:33, 14 July 2016 (UTC). PhD

Why are "deep neural networks" listed as a separate deep learning architecture?
In my opinion, all of the other architectures listed in the article are forms of deep neural networks, as they are all deep architectures and all neural networks. So why is "deep neural networks" listed separately? Isn't "deep neural networks" in fact rather equivalent to "deep learning"? —Kri (talk) 20:54, 5 February 2016 (UTC)


 * There exist other deep, multi-layer, learning systems based on elements like restricted Boltzmann machines, support vector machines, and kernel systems. The first one is considered a neural network by most (but it is not the classic feedforward design), the second by nearly none and the third, it depends on the kernel. --Mark viking (talk) 21:25, 5 February 2016 (UTC)


 * Okay, so why are deep neural networks listed separately to all other deep architectures in this article, of which most do seem like deep neural network architectures to me? Shouldn't these rather be listed as subsections to the Deep neural networks section, not as separate sections? —Kri (talk) 19:12, 21 March 2016 (UTC)


 * So basically, the only deep architecture mentioned in this article that is not a deep neural network is the multilayer kernel machine? —Kri (talk) 19:26, 21 March 2016 (UTC)


 * While deep neural networks are not synonymous with deep learning, I agree that DNNs are the dominant architecture these days. To reflect due weight, perhaps it would be best to refactor the "Architecture" section into two sections: "Deep neural network architectures" and "Other architectures". Then the multilayer kernel machine would go into the Other architectures section. What do you think? --Mark viking (talk) 20:08, 21 March 2016 (UTC)


 * Sounds like a plan. —Kri (talk) 18:12, 22 March 2016 (UTC)


 * Here's a thought: Instead of talking about editor views and opinions and "seem like" statements, why not find a good secondary source that teaches the material, and present it the way that expert does? I don't doubt it may be identical to the way you describe, but even so, it would give confidence to those listening in, that this is an encyclopedia that has rules and follows them, and not the blog of a few AI/deep learning devotees. 73.211.138.148 (talk) 01:24, 14 July 2016 (UTC)

Editing the copyediting
Pgan002, I appreciate your work in improving the readability of one of my contributions to this article; however, some of your streamlining removed some complexity that I was pretty much forced to add in response to a somewhat reasonable objection that my original version made an unwarranted claim about the appeal of the subject images; so I have undone a little bit of your work -- but also made some additional changes in its spirit of improving readability. See following for history:

https://en.wikipedia.org/w/index.php?title=Deep_learning&diff=667876519&oldid=667859537

https://en.wikipedia.org/w/index.php?title=Deep_learning&diff=672952465&oldid=672393327

Regards! Synchronist (talk) 18:53, 7 February 2016 (UTC)

Spam/External Links
Editors, there have been some spam / external links placed here and then added back after being removed. External links should not be in the body of the article and should be kept to a minimum in the EL section. More can be found at WP:EL. If you feel there should be an exception for links in the body of this article, head over to External_links/Noticeboard.

Occasionally, I'll see external links in the body of an article and they are links in support of claims made in the article, accidentally placed there by a new editor. These are absolutely something that should be converted to a cite (example). Citations should not be bare www.productname.com or github links, as they are not references and are definitely not WP:RELIABLESOURCES. You'll note I removed some of those from the article because they did not meet that bar, either.

The only good way to get a link to www.productname.com on Wikipedia is the Official Website link at the bottom of an article *about ProductName*. Which ties into WP:WTAF. These non-Wikipedia-notable software libraries (those without Wikipedia articles and wikilinks) arguably shouldn't even be in this list until they have an article about them and a blue wikilink to them. Thanks, Stesmo (talk) 03:25, 23 February 2016 (UTC)

Criticism and comment on "Criticism and comment"
The section "Criticism and comment" is more comment than criticism, and much of the comment is off-topic for an article of this kind (e.g., about artistic judgment). Rewriting with better focus and ~1/4 length would be a service. 86.135.176.227 (talk) 14:51, 30 May 2016 (UTC)

Pseudo Citation in Definition
The following line is Pseudo: Here are a number of ways that the field of deep learning has been characterized. For example, in 1986, Rina Dechter introduced the concepts of first order deep learning and second order deep learning in the context of constraint satisfaction.[14] Later, deep learning was characterized as a class of machine learning algorithms that[1](pp199–200)

If you look at the original citation, the author cites Wikipedia. So there is a cycle with no original source. — Preceding unsigned comment added by Simon Pickert (talk • contribs) 13:01, 1 June 2016 (UTC)


 * ✅ The earlier citation of pages 199f of the Microsoft paper was indeed pseudo/circular. Removed. 73.211.138.148 (talk) 02:33, 14 July 2016 (UTC)

It's still not done. The citation [1](pp199–200), which is "Deep Learning Methods and Applications" by Li Deng and Dong Yu, is still the main citation. The authors derive their definition from 5 definitions, of which 3 are from Wikipedia, 1 is from a random blog and 1 is their own. No legitimate scientific citation is used for the resulting definition, which Wikipedia is using. — Preceding unsigned comment added by Simon Pickert (talk • contribs) 12:53, 20 July 2016 (UTC)

Please consider and compare before reverting
Here is current prose from the lede, which is supposed to be the most accessible part of the article: And here is the way another teaching source opens in the same subject [where words like graphs and processing layers do not appear for some time]:
 * "Deep learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations… / Deep learning is part of a broader family of machine learning methods based on learning representations of data. An observation (e.g., an image) can be represented in many ways such as a vector of intensity values per pixel, or in a more abstract way as a set of edges, regions of particular shape, etc. Some representations are better than others at simplifying the learning task (e.g., face recognition or facial expression recognition...). One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction."
 * "Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated. / Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on top of data you store and manage. They help to group unlabeled data according to similarities among the example inputs, and they classify data when they have a labeled dataset to train on. (To be more precise, neural networks extract features that are fed to other algorithms for clustering and classification; so you can think of deep neural networks as components of larger machine-learning applications involving algorithms for reinforcement learning, classification and regression.)"

Please consider a major article overhaul, to make it readable by general audiences. Far too many maths and computing articles are simply near-illegible tomes of WP editors talking to themselves, while interested readers shake their heads and walk away. 73.211.138.148 (talk) 01:18, 14 July 2016 (UTC)

Text removed as being pseudo-circular in its sourcing (back to Wikipedia)
The following text is moved here, for reintroduction if it can be sourced to sources other than Wikipedia. The cited two author Microsoft paper lists a series of definitions on pp 199-200, most of which are drawn from Wikipedia. This was reprehensible scholarship by these Microsoft employees, and their paper represents an unacceptable reference, as it introduces a circular, pseudo-citation, wherein WP is citing itself (though indirectly). See the MS paper, cited: "Later, deep learning was characterized as a class of machine learning algorithms that
 * use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised and applications include pattern analysis (unsupervised) and classification (supervised).
 * are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher level features are derived from lower level features to form a hierarchical representation.
 * are part of the broader machine learning field of learning representations of data.
 * learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts."

This information can be returned if it can be found in other sources. Please don't return this paper to the article. A full page of numbered definitions, all from this article, cannot be allowed as a source for this article. All credibility of this primary source is gone. 73.211.138.148 (talk) 02:43, 14 July 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Deep learning. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20150703064823/http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html to http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.— InternetArchiveBot  (Report bug) 02:39, 10 December 2016 (UTC)

Artificial Neural Networks section's history
The "Artificial Neural Networks" section begins with the following:


 * Some of the most successful deep learning methods involve artificial neural networks. Artificial neural networks are inspired by the 1959 biological model proposed by Nobel laureates David H. Hubel & Torsten Wiesel, who found two types of cells in the primary visual cortex: simple cells and complex cells. Many artificial neural networks can be viewed as cascading models[37][38][39][76] of cell types inspired by these biological observations.

Wouldn't McCulloch and Pitts's work pre-date a 1959 biological model pretty substantially (McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115–133)? Even the original Perceptron algorithm predates 1959. Perhaps 1959 is the date the prize was awarded or something, but this seems a bit off to me. Criticny (talk) 19:56, 11 January 2017 (UTC)

The Reference to the Mordvintsev/Olah/Tyka "Inceptionism" Post
There has been some concern that a focus in the "Criticism and Comment" section on the connection between deep learning and artistic creativity is misplaced; however, events are continuing to demonstrate that this connection is perhaps both profound and vital: the web page welcome ( https://magenta.tensorflow.org/welcome-to-magenta ) to the Google "Magenta" initiative, for example, specifically mentions the 17 June 2015 Mordvintsev/Olah/Tyka Google Research Blog post on the CNN image generation technique which they originally dubbed "Inceptionism" and which has since become known as "DeepDreaming".

Indeed, that post may prove to be very much like the 25 April 1953 Watson/Crick announcement in "Nature" of a possible structure for DNA: a somewhat informal communication of enduring significance; and as such -- and given also that I am the Wikipedia contributor who added the reference to the "Inceptionism" post to the Deep Learning article in the first place -- I've felt justified in keeping an eye on it.

I was therefore very interested to notice that on 10 December of this past year, an "ArchiveBot" added to that reference a link to the Internet Archive version of the "Inceptionism" post -- and with an indication that the original link was dead. Well, that link is currently very much alive, and so -- in the interest of keeping things tidy -- I've restored the reference to its former state; however, I've also noticed something else of great interest: somewhere along the line, the Research Blog post as maintained by Google has lost all, or almost all, of its original comments. These numbered almost a thousand, a point to which I can attest via a screen capture created shortly after my own such comment -- the 993rd! -- became part of the dialogue:



Can anyone shed light on either of these "happenings" -- the apparently temporary loss of the original link, and/or the fate of the original comments? Synchronist (talk) 02:40, 22 January 2017 (UTC)

Greek translation
βαθυμάθηση < βαθυ- (deep) + μάθηση (learning) — Preceding unsigned comment added by 2A02:587:4100:DF00:EC9E:85A9:1A6B:4639 (talk) 18:27, 7 April 2017 (UTC)

The introductory paragraph to this article is a f***ing trainwreck.
We can do better than this. Consider the viewpoint of a layperson, reading this article. The first sentence should be clear and concise, instead it's obtuse bullet points. And yet, somehow, astonishingly, the bullet points in the first sentence are also themselves sentences that meaninglessly describe things like the "(unsupervised) learning of multiple levels of features or representations of the data." It's almost hyperbolically bad. It's a caricature. The point of an introductory section is not to present a superficial analysis of the field using as many buzzwords as possible; it's to maximize the amount of information relevant to the topic delivered to the reader who we assume is a layperson. Machine learning is not a "field of learning representations of data" any more than writing fiction is an "act of attributing intention to synthetic agents." Don't burden an inexperienced reader with useless assertions before they're even equipped to understand the words you're using. I don't know how to write this introduction, but I at least know enough to say that this one is absolute shit.


 * I agree. It's particularly bad, because it is copying what later appears under the Definition section. It doesn't make sense to have all of that information in the intro, and perhaps it would be best to just remove it. Pirx123 (talk) 23:13, 9 June 2017 (UTC)


 * This IS difficult. I did the copying. I still hold that the copying was an improvement, albeit from a low level. Complaints like this do not improve the situation. Better to spend the time making a little improvement. --Ettrig (talk) 09:08, 10 June 2017 (UTC)


 * Really, the first sentence is all that is needed. This is just an emphasis on many layers. Otherwise it is just an artificial neural network. The bulk of the explanations should be in the article about neural networks. The reason the name Deep learning caught on so strongly is that researchers who reported spectacular achievements and increases in capacity used this name. --Ettrig (talk) 15:24, 10 June 2017 (UTC)

Rewrite
This article reads like multiple articles squished together. The content is highly redundant. E.g., historic stuff is repeated in multiple places. It's no wonder, given the ridiculous length of the piece. Nobody can read the whole thing. In sum, it needs a rewrite that moves most of the content into other, more focused pieces (many of which already exist). I am doing a superficial copyedit to familiarize myself with the topic. After that, I will propose a new approach. Feedback encouraged! Lfstevens (talk) 19:04, 10 June 2017 (UTC)
 * Agreed. As I wrote above: the only difference between Deep learning and Artificial neural networks, apart from the grammatical one, is that DL is restricted to ANNs with manyish layers. That clarification plus some history of the use of the term should be sufficient. --Ettrig (talk) 19:29, 10 June 2017 (UTC)
 * OK. I propose to dramatically shorten this article, by moving most of the technical details to Artificial neural network. No need to duplicate/maintain the two in synch. Instead this article should focus on learning and applications, while the other focuses on how it works. Deep reservoir, deep belief, deep etc., will stay here. Lfstevens (talk) 00:39, 19 June 2017 (UTC)

Dear Lfstevens: in reference to your second set of June 11 edits to the "Criticism and comment" section -- and since you have invited feedback -- I must admit that the original text needed to be streamlined somewhat; however, I think that the current level of pruning eliminates an important point regarding neural networks, and one that needs to be emphasized in several contexts, namely, that they are capable of performing seemingly sophisticated discriminations and manipulations which were formerly deemed the exclusive property of top-level, rule-based processing.

This same pruning, moreover, has eliminated the "yes, but" transition from the previous paragraph, which seeks to establish exactly the opposite point, i.e., that there are many high level functions which neural nets are incapable of performing.

I am wondering, therefore, if you would be distraught if I edited the paragraph in question yet again, and by way of accomplishing all of our goals? Synchronist (talk) 04:03, 20 June 2017 (UTC)
 * Thanks for noticing! You are of course free to edit as you like, but may I suggest we hash out language on this talk page instead? I've really only begun my work on this monster, and appreciate your insights. Lfstevens (talk) 03:36, 21 June 2017 (UTC)
 * Since you seem to have just invited an even broader perspective, let me step away for the moment from a selfish focus on my own little piece of this article and talk about the big picture -- but in respect to which I will be reaching the same overall conclusion.


 * Based on my brief acquaintance with your own talk page, you are a much-respected senior editor -- and your idea of moving much of the material in "Deep Learning" to the more fundamental article on "Artificial Neural Networks" has been a sensible initial approach.


 * However, as someone who has been closely involved with the topic of deep learning and the article itself for some time, let me make two observations: 1) the deep learning family of algorithms has been the result of an especially long and complicated evolution (in computer science terms) in which the basic principle of simulating the behavior of neurons via software and/or hardware was but a starting point; and 2) although the article "Artificial Neural Networks" has a ten-year head start on "Deep Learning", the latter is some 2.5 times the former in size; rivals it in terms of current viewership; and surpasses it in terms of recent (past year) editing activity.


 * And so what does this tell us? It tells us that -- as with all evolution -- an essentially new thing has come into being, and with which a huge army of researchers and users closely identifies; i.e., although the "Deep Learning" article certainly needs a lot of work, it might be a mistake to think that it can be subordinated to another article.


 * And here let me provide a perhaps imperfect analogy from science history: yes, at a theoretical level, all of chemistry can be reduced to physics -- but no one would dream of attempting to pack the entire practice of chemistry into a physics suitcase.


 * And there is also this final point, and which, again, I share with you only because so requested: the family of deep learning/deep neural net algorithms continues to experience a remarkable florescence -- so it is inevitable that an up-to-date Wikipedia article will be rather messy. (But still, agreed, the basic structure needs improvement!) Synchronist (talk) 17:16, 21 June 2017 (UTC)

Adding a few key linkages to "Criticism and Comment" paragraph while preserving overall shortened structure. Synchronist (talk) 14:29, 1 July 2017 (UTC)
 * My thinking has evolved somewhat. I am now envisioning DL as more application-oriented, while ANN is more tech-oriented. Thus I moved the laundry list. DL is now fairly short, although I see some additional fat to suck out in the definition section. Further, I expect to parcel out much of what is in ANN into the more specific articles that already repeat much of its content. My next move is to move duplicative stuff out of ANN into RNN.
 * Another point is that a lot of the breathless contest stuff happened years ago, with no updates. I don't know where to look to refresh it, but in a fast-moving field we should be current.

Deep Learning for Natural Language Processing (NLP)
The article says: "Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics where they produced results comparable to and in some cases superior to human experts.[2]" 1) There is only one reference, which concerns the performance in computer vision. 2) The results for deep learning applied to NLP are not as outstanding as they are for other fields. In fact the results lag well behind those of traditional methods. See: Deep neural network language model research and application overview, Deep Learning for Natural Language Processing and this blog article. There is just a possibility that some breakthroughs are coming up. See: Deep Learning applied to NLP — Preceding unsigned comment added by Pkiesele (talk • contribs) 12:47, 5 July 2017 (UTC)

Origin of term
The page currently includes the text:
 * The use of the expression "Deep Learning" in the context of ANNs was introduced by Aizenberg and colleagues in 2000.

However, if you preview the book and search for "deep learning", you see that the actual use is unrelated (emphasis mine):
 * 1960-s - intensive development of the threshold logic, initiated by previous results in perceptron theory. A deep learning of the features of threshold Boolean functions, as one of the most important objects considered in the theory of perceptrons and neural networks.

This just seems like using "deep" to mean "insightful", and is unrelated to the sense as used in this article (multiple hidden layers), so I will remove it. Instead, the current term appears to originate with Hinton et al. "A fast learning algorithm for deep belief nets" (2006).
 * —Nils von Barth (nbarth) (talk) 02:00, 10 July 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Deep learning. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20150228225709/http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html to http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html
 * Corrected formatting/usage for http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 23:13, 7 September 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Deep learning. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20151010204407/http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf to http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 23:33, 11 January 2018 (UTC)

Overlap with Artificial neural network
Deep learning is a subset of the subject Artificial neural networks. Consequently everything in this article would also fit in that article (if it fits here, that is). This article is written almost without considering this. Much of this article is even about general aspects of ANNs.

Here is a citation from the established textbook on the subject (Deep Learning by Goodfellow, Bengio and Courville, MIT Press, 2016, p 13): "Some of the earliest learning algorithms we recognize today were intended to be computational models of biological learning, that is, models of how learning happens or could happen in the brain. As a result, one of the names that deep learning has gone by is artificial neural networks (ANNs)."

Actually, this book is available on the web here.

I therefore suggest that this article is cut down dramatically, but definitely not deleted. The material that is removed should then be moved to Artificial neural network if it is not already there.

This has been discussed before. It is about time it is done. --Ettrig (talk) 18:06, 6 February 2018 (UTC)

"Decision Stream" Editing Campaign
This article has been targeted by an (apparent) campaign to insert "Decision Stream" into various Wikipedia pages about Machine Learning. "Decision Stream" refers to a recently published paper that currently has zero academic citations. The number of articles that have been specifically edited to include "Decision Stream" within the last couple of months suggests conflict-of-interest editing by someone who wants to advertise this paper. They are monitoring these pages and quickly reverting any edits to remove this content.

Known articles targeted:
 * Artificial intelligence
 * Statistical classification
 * Deep learning
 * Random forest
 * Decision tree learning
 * Decision tree
 * Pruning (decision trees)
 * Predictive analytics
 * Chi-square automatic interaction detection
 * MNIST database  — Preceding unsigned comment added by ForgotMyPW (talk • contribs) 17:36, 2 September 2018 (UTC)

BustYourMyth (talk) 19:16, 26 July 2018 (UTC)


 * Dear BustYourMyth,


 * Your activity is quite suspicious: registering a user account just to delete the mention of one popular article. People from different countries with a positive history of Wikipedia improvement are taking part in reverting your edits as well as in providing information about "Decision Stream".


 * Kind regards,
 * Dave — Preceding unsigned comment added by 62.119.167.36 (talk) 13:34, 27 July 2018 (UTC)


 * Web searching suggests that the term "decision stream", in the sense used in the paper, has very little uptake. Without further evidence, I can't see that it meets the threshold for inclusion. -- The Anome (talk) 13:49, 27 July 2018 (UTC)

I asked for partial protection at WP:ANI North8000 (talk) 17:08, 27 July 2018 (UTC)

Semi-protected edit request on 14 August 2018
Within the sub-section Deep Neural Networks, under section Neural Networks, please link the term Computer Vision (last line) to its respective page in Wiki. Devilish abhirup (talk) 02:50, 14 August 2018 (UTC)
 * Not done: Generally a link should appear only once in the article, as per WP:MOSLINK. Computer Vision is linked in the lead and in the section 'Image recognition' just under the title. regards,  DRAGON BOOSTER   ★  09:38, 14 August 2018 (UTC)

A Commons file used on this page has been nominated for speedy deletion
The following Wikimedia Commons file used on this page has been nominated for speedy deletion:
 * Signal triangulation.png
You can see the reason for deletion at the file description page linked above. —Community Tech bot (talk) 14:07, 18 November 2019 (UTC)

Improving the text describing the first implementation of DNNs using back propagation
The text currently reads "In 1989, Yann LeCun et al. applied the standard backpropagation algorithm, which had been around as the reverse mode of automatic differentiation since 1970,[35][36][37][38] to a deep neural network with the purpose of recognizing handwritten ZIP codes on mail."

Some of us were lucky enough to get to know Geoff Hinton in 1985 and start work in this area immediately afterwards. It was obvious to us that deep learning was possible; indeed, that is the basis of the recurrent neural network. I don't know if anyone else did this, but my very first implementation was deep and so preceded Yann's by three years. Hence the text as it stands is correct; it's just not referencing the first use of deep learning using back propagation (which is what it purports to do).

My first publication was "A. J. Robinson. Speech recognition with associative networks. Master's thesis, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK, August 1986.", but that's not currently available online. The first online one is https://www.academia.edu/30351853/The_utility_driven_dynamic_error_propagation_network which is from 1987 (which is still two years ahead of what the article claims).

I've obviously got a conflict of interest here (in as much as anyone cares about the work done in the 80's), so what's the best way to take this forward? Should I propose wording that includes the prior history, or is that best done by others? Should I get the first reference online first?

I'll get my first publication online and then come back to this page. Hopefully someone here is as old as I am and remembers the first years.

Tony

DrTonyR (talk) 18:26, 26 December 2019 (UTC)

Delete "Shallowing deep neural networks" section?
This is not terminology or a technique that is commonly used in the field of machine learning. The reference given for introduction of the term is to an article in a low impact factor journal, with a citation count of only five. JaschaSD (talk) 05:29, 21 April 2020 (UTC)
 * Just a minor remark: The impact factor (IF) 2018 of IEEE Transactions on Pattern Analysis and Machine Intelligence is 19.42. Is it low or high impact? Agor153 (talk) 13:00, 22 April 2020 (UTC)
 * If I may play devil's advocate to help sort this out.  The concept of shallow vs. deep networks seems to be embedded throughout the article. Could you not consider "shallowing" to be a generic description of making a deep one into a shallower one rather than expecting/requiring it to be a commonly used term?  Sincerely, North8000 (talk) 12:48, 21 April 2020 (UTC)
 * There is a common and widely used term for condensing a deeper network into a shallower one, which is distillation: scholar link. It would be appropriate to mention distillation in a deep learning article. "Shallowing" however seems to refer to a very specific pruning+distillation scheme proposed in a single low-impact article. JaschaSD (talk) 14:54, 21 April 2020 (UTC)
 * I can't confirm but that sounds good. Suggest editing the article accordingly.  North8000 (talk) 16:57, 21 April 2020 (UTC)

Google Brain referred to in 2009, but only formed in 2011
The Google Brain page claims it formed in 2011, so I don't know what this is referring to. Veedrac (talk) 15:34, 21 May 2020 (UTC)
 * Looks like it was introduced in this edit by User:Isingness. Rolf H Nelson (talk) 04:41, 22 May 2020 (UTC)
 * The Economist article refers to Ng's actions in 2011, not 2009. The venturebeat article doesn't say 2009 either. Feel free to be WP:BOLD and nuke or modify material like that. Rolf H Nelson (talk)

neuromorphic hardware
There should be a section on neuromorphic hardware and its potential acceleration of deep neural networks. — Preceding unsigned comment added by RJJ4y7 (talk • contribs)

Image
I support the inclusion of this image. Benjamin (talk) 11:46, 13 June 2020 (UTC)



Poster child for the reference spamming here
Someone rightly just tagged the "interpretations" section for too many references. The first sentence is really a poster child for the reference spamming here. A short vague sentence had 13 references! So far I reduced it to 11. North8000 (talk) 11:59, 20 June 2020 (UTC)

Year of first published learning algorithm - 1965
The book "Cybernetics and Forecasting Techniques" was first published in 1965 in Kyiv, not in 1967. The English translation was published in 1967. https://cds.cern.ch/record/209675/export/hx?ln=en