AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Authors

Enmin Zhao

Institute of Automation, Chinese Academy of Sciences School of artificial intelligence, University of Chinese Academy of Sciences

Renye Yan

School of artificial intelligence, University of Chinese Academy of Science Institute of Automation,Chinese Academy of Sciences

Jinqiu Li

Institute of Automation, Chinese Academy of Sciences School of artificial intelligence, University of Chinese Academy of Sciences

Kai Li

Institute of Automation, Chinese Academy of Sciences School of artificial intelligence, University of Chinese Academy of Sciences

Junliang Xing

Institute of Automation, Chinese Academy of Sciences Tsinghua University School of Artificial Intelligence, University of Chinese Academy of Sciences

Proceedings:

No. 4: AAAI-22 Technical Tracks 4

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Track:

AAAI Technical Track on Domain(s) Of Application

Downloads:

Download PDF

Abstract:

Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative priorworks like DeepStack and Libratus heavily rely on counter-factual regret minimization (CFR) and its variants to tackleHUNL. However, the prohibitive computation cost of CFRiteration makes it difficult for subsequent researchers to learnthe CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include anovel state representation of card and betting information, amultitask self-play training loss function, and a new modelevaluation and selection metric to generate the final model.In a study involving 100,000 hands of poker, AlphaHoldemdefeats Slumbot and DeepStack using only one PC with threedays training. At the same time, AlphaHoldem only takes 2.9milliseconds for each decision-making using only a singleGPU, more than 1,000 times faster than DeepStack. We release the history data among among AlphaHoldem, Slumbot,and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.

DOI:

10.1609/aaai.v36i4.20394

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.