Some Explorations in Reinforcement Learning Techniques Applied to the Problem of Learning to Play Pinball

Authors

Nathaniel Scott Winstead

Track:

Contents

Downloads:

Abstract:

Historically, the accepted approach to control problems in physically complicated domains has been through machine learning, due to the fact that knowledge engineering in these domains can be extremely complicated. When the already physically complicated domain is also continuous and dynamical (possibly with composite and/or sequential goals), the learning task becomes even more difficult due to ambiguities of reward assignment in these domains. However, these continuous, complicated, dynamical domains can effectively be modeled discretely as Markov Decision Processes, which would suggest using a Temporal Difference learning approach on the problem. This is the traditional method of approaching these problems. In Temporal Difference learning, the value of a discrete action is defined to be the difference in some value (usually an expected reward) between the current state and and its predecessor state. However, in the problem of playing pinball, the traditional Temporal Difference methods converge slowly and perform poorly. This leads to the addition of knowledge engineering elements to the traditional Temporal Difference methods, which was previously considered difficult to do. However, by making straightforward, simple changes to the basic Temporal Difference algorithm to incorporate knowledge engineering I was able to both speed convergence of the algorithm, and greatly improve performance. Composite and/or sequential tasks may similarly be modeled as Markov Decision Processes, which again suggests the use of Temporal Difference learning. However, applying Temporal Difference learning to composite and/or sequential continuous, dynamical tasks has been traditionally viewed as a burdensome task, involving complex changes to the basic learning architecture. Again, one goal was to keep the learning architecture simple, despite the complexity of the environment. Starting with the existing composite Temporal Difference methods, I have constructed a new composite technique which maintains the elegance and simplicity of the Temporal Difference architecture, while enabling for learning of composite tasks.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.