Talk:Q-learning

Lacking citations
The article makes several claims about Markov decision processes, such as "It has been proven that for any finite MDP", but offers no proof, explanation, or citation for them. — Preceding unsigned comment added by 69.191.178.36 (talk) 02:47, 2 June 2015 (UTC)

Convergence
The article says that "The convergence proof was presented later by Watkins and Dayan"; however, it does not explain what exactly is meant by 'convergence' - what converges, to what, and under what conditions? --Erel Segal (talk) 16:49, 23 December 2012 (UTC)

Convergence means convergence to an optimal policy. Many details of the reinforcement learning model are not currently included in the article, but this is a class of learning algorithms which are unsupervised in the sense that the learning agent has a set of states it can exist in, a set of actions that it can take in each state, and a reward it earns following each action that tells it how good the action was. The policy is the function that gives the action to take in each state. When there are several possible actions in a state, the policy is a probability function over the actions, i.e. the learning agent might select any of the actions available in the given current state, but some with higher probability than others. In turn, the reward associated with a given state-action pair might be probabilistic as well. A policy is optimal if, for any state the agent finds itself in, the expected total value of rewards over all time, starting from that state, is the maximum achievable.
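To make the convergence claim concrete, here is a minimal sketch of tabular Q-learning on a made-up two-state, two-action deterministic MDP (the environment, learning rate, and discount factor below are all illustrative choices, not anything from the article):

```python
import random

# Hypothetical toy MDP: 2 states, 2 actions.
# reward[s][a] and next_state[s][a] define the environment.
reward = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}
next_state = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}

alpha, gamma = 0.5, 0.9  # learning rate and discount factor (arbitrary here)
Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}

random.seed(0)
s = 0
for _ in range(1000):
    a = random.choice((0, 1))  # explore uniformly at random
    r, s2 = reward[s][a], next_state[s][a]
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
    s = s2

# The greedy policy derived from Q: the argmax action in each state.
policy = {st: max(Q[st], key=Q[st].get) for st in Q}
```

The Watkins-Dayan result is about exactly this kind of iteration: provided every state-action pair keeps being visited and the learning rate decays appropriately, the table Q converges to the optimal action values, and the greedy policy read off from it (as in the last line) is optimal.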

Some of the citations, such as [4], point to dead links. I couldn't find a link to correct it to. 108.35.116.197 (talk) 17:11, 28 November 2013 (UTC)

Variants
Would be good to have more details on variations of the Q-algo (with references) Dm1911 (talk) 20:52, 26 May 2015 (UTC)


 * I agree. Like R-learning for example, which is a discount-free variant of Q-learning. —Kri (talk) 17:54, 17 September 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 3 external links on Q-learning. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive http://web.archive.org/web/20050806080008/http://www.cs.ualberta.ca:80/~sutton/book/the-book.html to http://www.cs.ualberta.ca/%7Esutton/book/the-book.html
 * Added archive http://web.archive.org/web/20080529074412/http://citeseer.comp.nus.edu.sg:80/352693.html to http://citeseer.comp.nus.edu.sg/352693.html
 * Added archive http://web.archive.org/web/20150131172946/http://toki78.github.io:80/ to http://toki78.github.io

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 16:15, 21 July 2016 (UTC)

When was Q-learning first described?
There should be some reference to the paper in which Q-learning was first described, if such exists. So that it's possible to see how old the method is, and read more details about it. —Kri (talk) 17:57, 17 September 2016 (UTC)


 * Such a reference is already present, in the section titled "Early study". Perhaps this short, two-sentence section should be merged into the introduction? —50.181.176.188 (talk) 02:58, 30 December 2016 (UTC)

Patent on deep Q-learning
Should this page not mention that some aspects of deep Q-learning are patented by Google? Otherwise this might be a problem for some people ... see https://patents.google.com/patent/US20150100530 and https://www.reddit.com/r/MachineLearning/comments/3c5f5j/google_patented_deep_qlearning/ — Preceding unsigned comment added by 84.63.193.140 (talk) 11:43, 4 April 2018 (UTC)

Usage of abbreviation DQN
The article uses the abbreviation "DQN" but doesn't explain it. — Preceding unsigned comment added by 131.188.3.226 (talk) 19:09, 24 September 2019 (UTC)

Selecting Actions
I think it would be helpful to give some information on techniques to select actions. Currently no Q-learning algorithm is fully specified since there is no explanation of how actions are selected during learning. — Preceding unsigned comment added by 205.132.0.41 (talk) 20:16, 31 May 2020 (UTC)