Talk:Temporal difference learning

General
Page seems unnecessarily complex - finding it hard to understand, even knowing the underlying terms. Am I alone in this? Perhaps this needs a few images / figures to complement learning? Dm1911 (talk) 21:26, 26 May 2015 (UTC)

n-step TD and TD-lambda
Will be seriously reworking much of this article to be more understandable and legible. Priority will be to explain (with examples) TD(0), n-step TD and TD(lambda).

Help / Talk appreciated ! Dm1911 (talk) 10:20, 29 May 2015 (UTC)
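As a starting point for the examples discussed above, here is a minimal sketch of tabular TD(0) for state-value prediction. This is not taken from the article; it assumes episodes are given as lists of (state, reward, next_state) transitions under some fixed policy, and the function name and signature are illustrative only.

```python
def td0(episodes, alpha=0.1, gamma=0.9):
    """Estimate V(s) with tabular TD(0).

    `episodes` is a list of episodes, each a sequence of
    (state, reward, next_state) transitions generated by a fixed policy.
    `alpha` is the step size, `gamma` the discount factor.
    """
    V = {}  # value table, unseen states default to 0.0
    for episode in episodes:
        for s, r, s_next in episode:
            v_s = V.get(s, 0.0)
            v_next = V.get(s_next, 0.0)
            # TD(0) update: move V(s) toward the one-step target r + gamma * V(s')
            V[s] = v_s + alpha * (r + gamma * v_next - v_s)
    return V
```

n-step TD and TD(lambda) generalize the one-step target above (the former by looking n rewards ahead, the latter by averaging n-step targets with weights lambda^(n-1)), so a tabular sketch like this might be a useful baseline for the worked examples.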

External links modified
Hello fellow Wikipedians,

I have just added archive links to one external link on Temporal difference learning. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive http://web.archive.org/web/20131116084228/http://www.cs.colorado.edu:80/~grudic/teaching/CSCI4202/RL.pdf to http://www.cs.colorado.edu/~grudic/teaching/CSCI4202/RL.pdf

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.—cyberbot II  Talk to my owner :Online 14:55, 23 March 2016 (UTC)

broken link
Sutton, R. S.; Barto, A. G. (1990). "Time Derivative Models of Pavlovian Reinforcement" (PDF). Learning and Computational Neuroscience: Foundations of Adaptive Networks: 497–537. 156.40.255.18 (talk) 15:31, 22 September 2023 (UTC)

Lacking description
This article is lacking a (non-mathematical) description of how the algorithm works. 109.49.139.84 (talk) 13:44, 22 October 2023 (UTC)

Policy iteration
As a mergist Wikipedian, I believe we should add some section here about policy iteration. SpiralSource (talk) 13:22, 18 April 2024 (UTC)