Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits

Authors

Simon Ståhlberg, Blai Bonet, Hector Geffner

Linköping University, Sweden; Universitat Pompeu Fabra, Spain; Universitat Pompeu Fabra, Spain / Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain / Linköping University, Sweden

Proceedings:

Book One

Volume

Issue:

Proceedings of the International Conference on Automated Planning and Scheduling, 32

Track:

Planning and Learning Track

Abstract:

It has recently been shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of k-variable counting logic (C_k) for k=2, which has been shown to provide a tight characterization of the expressive power of graph neural networks. In this work, we make use of these results to understand the power and limits of using graph neural networks (GNNs) for learning optimal general policies over a number of tractable planning domains where such policies are known to exist. For this, we train a simple GNN in a supervised manner to approximate the optimal value function V*(s) of a number of sample states s. As predicted by the theory, it is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with C_2 features but not in those requiring more expressive C_3 features. In addition, it is observed that the features learned are in close correspondence with the features needed to express V* in closed form. The theory and the analysis of the domains let us understand the features that are actually learned as well as those that cannot be learned in this way, and let us move in a principled manner from a combinatorial optimization approach to learning general policies to a potentially more robust and scalable approach based on deep learning.
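As a rough illustration of how a value function induces a policy, the sketch below computes V* exactly on a tiny hand-made state graph and then acts greedily with respect to it. Everything in it is an assumption for illustration: the toy graph, the unit action costs, and the helper names `optimal_values`, `greedy_successor`, and `run_policy` are hypothetical, and the exact backward search stands in for the trained GNN, which in the paper only approximates V*.

```python
from collections import deque


def optimal_values(successors, goals):
    """Exact V* under unit action costs, via backward breadth-first search."""
    # Invert the successor relation so we can search from the goals.
    preds = {s: [] for s in successors}
    for s, nexts in successors.items():
        for t in nexts:
            preds.setdefault(t, []).append(s)

    values = {g: 0 for g in goals}
    queue = deque(goals)
    while queue:
        t = queue.popleft()
        for s in preds.get(t, []):
            if s not in values:  # first visit gives the shortest distance
                values[s] = values[t] + 1
                queue.append(s)
    return values


def greedy_successor(state, successors, value):
    """Pick the successor with the lowest value (unit costs assumed)."""
    return min(successors[state], key=lambda t: value.get(t, float("inf")))


def run_policy(start, successors, goals, value):
    """Follow the greedy policy from start until a goal state is reached."""
    path = [start]
    while path[-1] not in goals:
        path.append(greedy_successor(path[-1], successors, value))
    return path


# Hypothetical toy state space (not taken from the paper).
successors = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a"],
    "d": ["goal"],
    "goal": [],
}
v_star = optimal_values(successors, ["goal"])
plan = run_policy("a", successors, {"goal"}, v_star)
```

When the value function is exact, the greedy plan has length V*(start); with a learned approximation, the same greedy rule is only as good as the approximation on the states it visits.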

DOI:

10.1609/icaps.v32i1.19851

