Proceedings:
Book One
Issue:
Proceedings of the International Conference on Automated Planning and Scheduling, 31
Track:
Main Track
Abstract:
In Reinforcement Learning (RL), the current state of the environment may not always be available. One way to address this is to include the actions taken since the last known state as part of the state information; however, this enlarges the state space, making the problem more complex and slowing convergence. We propose an approach in which the known delay in the state information is exploited directly, and decisions are made to maximize the expected state-action value function. The proposed algorithm thus avoids enlarging the state space relative to the case with no delay in the state update. Evaluations on basic RL environments illustrate the improved performance of the proposed algorithm.
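The abstract only sketches the idea, so the following is a minimal illustrative sketch rather than the paper's algorithm: assuming a tabular setting with a known transition model, a point belief at the last observed state is propagated through the actions taken since, and the agent picks the action maximizing the expected Q-value under that belief instead of augmenting the state with the pending actions. All names here (P, Q, belief_over_current_state, act) are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
# Hypothetical known transition model: P[a, s, s'] = Pr(s' | s, a).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
Q = np.zeros((n_states, n_actions))          # learned state-action values

def belief_over_current_state(last_state, actions_since):
    """Propagate a point belief at the last known state through the
    actions taken since: Pr(s_t | s_{t-d}, a_{t-d}, ..., a_{t-1})."""
    b = np.zeros(n_states)
    b[last_state] = 1.0
    for a in actions_since:
        b = b @ P[a]                         # one step of the Markov chain
    return b

def act(last_state, actions_since):
    """Choose the action maximizing the expected state-action value
    under the belief, without enlarging the state space."""
    b = belief_over_current_state(last_state, actions_since)
    expected_q = b @ Q                       # E[Q(s_t, a)] for each action a
    return int(np.argmax(expected_q))

# Example: the agent last observed state 3 and has taken actions [1, 0] since.
print(act(3, [1, 0]))

In this sketch the delay enters only through the belief propagation, so the value table Q keeps its original size, in contrast to the augmented-state approach the abstract argues against.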
DOI:
10.1609/icaps.v31i1.15940