We wish to build an agent that is embedded in its environment. The agent can take actions that change the underlying state of the environment, and can make observations of the environment’s state. In general, the observations will not reveal the true state of the environment: they will be noisy, and many underlying environmental states will look the same. The agent will also have a special scalar input signal, called reward, which is correlated with the underlying state of the environment. The agent’s goal is to act in such a way as to gain a large amount of reward over time. It will have to learn about its environment in order to learn how to behave in a valuable way.