Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 36
Volume:
36
Issue:
No. 5: AAAI-22 Technical Tracks 5
Track:
AAAI Technical Track on Intelligent Robotics
Abstract:
Robust adversarial reinforcement learning is an effective method for training agents to handle uncertain disturbances and modeling errors in real environments. However, for systems that are sensitive to disturbances or difficult to stabilize, it is easier to learn a powerful adversary than to establish a stable control policy. An improperly strong adversary can destabilize the system, bias the sampling process, make the learning process unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training in the adversarial reinforcement learning architecture. The dissipative principle of robust H-infinity control is extended to the Markov Decision Process, and robust stability constraints based on L2-gain performance are derived for the reinforcement learning system. We thus propose a dissipation-inequation-constraint-based adversarial reinforcement learning architecture, which ensures the stability of the system during training by imposing constraints on both the normal and the adversarial agents. Theoretically, this architecture can be applied to a large family of deep reinforcement learning algorithms. Experiments in the MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Flight experiments on a real quadcopter indicate that a policy trained in simulation can be deployed directly to the real environment, where our controller outperforms the PID controller in hardware-in-the-loop tests. Both our theoretical and empirical results offer new, critical perspectives on the adversarial reinforcement learning architecture from a rigorous robust control standpoint.
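For intuition, the L2-gain criterion mentioned in the abstract can be read through the standard dissipation inequality of robust H-infinity control. The sketch below states the conventional discrete-time form; the storage function V, disturbance w_k, performance output z_k, and gain bound gamma are generic textbook symbols assumed for illustration, not necessarily the paper's exact notation.

% Standard discrete-time dissipation inequality for L2-gain <= gamma.
% V is a nonnegative storage function; w_k is the disturbance input and
% z_k the performance output (generic symbols, not the paper's notation).
\[
V(x_{k+1}) - V(x_k) \le \gamma^2 \lVert w_k \rVert^2 - \lVert z_k \rVert^2,
\qquad V(x) \ge 0.
\]
% Summing from k = 0 to T-1 with V(x_0) = 0 telescopes to the
% finite-horizon L2-gain bound:
\[
\sum_{k=0}^{T-1} \lVert z_k \rVert^2 \le \gamma^2 \sum_{k=0}^{T-1} \lVert w_k \rVert^2 .
\]

In words, the disturbance-to-output energy gain stays below gamma; per the abstract, imposing a constraint of this kind on both the normal and adversarial agents is what keeps a strong adversary from destabilizing the system during training.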
DOI:
10.1609/aaai.v36i5.20481