User:Mmadondo/sandbox

Model-based (reinforcement learning)

In Reinforcement Learning, a model-based algorithm is one that utilizes the transition probability distribution (and the reward function) associated with the Markov decision process (MDP), which, in RL, represents the problem to be solved, in order to estimate the optimal policy. Through interactions with the environment, an agent builds an estimate of the transition probability distribution (and reward function), collectively known as the "model". This model enables the agent to predict the dynamics of the environment. In the alternative model-free approach, the agent bypasses learning the "model" altogether in favor of learning a control policy directly.