Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid

Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, Gordon Cheng

This paper describes a learning framework for a central pattern generator (CPG) based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm for the CPG by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a thousand trials in numerical simulation, and that the controller obtained in simulation achieves stable walking on a physical robot in the real world. We evaluate walking velocity and stability in both numerical simulations and hardware experiments. Furthermore, we present the possibility of additional online learning on the hardware robot to improve the controller within 200 iterations.

Content Area: 17. Robotics

Subjects: 17. Robotics; 15.3 Control

Submitted: May 10, 2005


This page is copyrighted by AAAI. All rights reserved.