GTA: Graph Truncated Attention for Retrosynthesis

Authors

Seung-Woo Seo

Samsung Advanced Institute of Technology

You Young Song

Samsung Advanced Institute of Technology

June Yong Yang

Korea Advanced Institute of Science and Technology

Seohui Bae

Korea Advanced Institute of Science and Technology

Hankook Lee

Korea Advanced Institute of Science and Technology

Jinwoo Shin

Korea Advanced Institute of Science and Technology

Sung Ju Hwang

Korea Advanced Institute of Science and Technology

Eunho Yang

Korea Advanced Institute of Science and Technology

Proceedings:

No. 1: AAAI-21 Technical Tracks 1

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 35

Track:

AAAI Technical Track on Application Domains

Downloads:

Download PDF

Abstract:

Retrosynthesis is the task of predicting reactant molecules from a given product molecule and is, important in organic chemistry because the identification of a synthetic path is as demanding as the discovery of new chemical compounds. Recently, the retrosynthesis task has been solved automatically without human expertise using powerful deep learning models. Recent deep models are primarily based on seq2seq or graph neural networks depending on the function of molecular representation, sequence, or graph. Current state-of-the-art models represent a molecule as a graph, but they require joint training with auxiliary prediction tasks, such as the most probable reaction template or reaction center prediction. Furthermore, they require additional labels by experienced chemists, thereby incurring additional cost. Herein, we propose a novel template-free model, i.e., Graph Truncated Attention (GTA), which leverages both sequence and graph representations by inserting graphical information into a seq2seq model. The proposed GTA model masks the self-attention layer using the adjacency matrix of product molecule in the encoder and applies a new loss using atom mapping acquired from an automated algorithm to the cross-attention layer in the decoder. Our model achieves new state-of-the-art records, i.e., exact match top-1 and top-10 accuracies of 51.1% and 81.6% on the USPTO-50k benchmark dataset, respectively, and 46.0% and 70.0% on the USPTO-full dataset, respectively, both without any reaction class information. The GTA model surpasses prior graph-based template-free models by 2% and 7% in terms of the top-1 and top-10 accuracies on the USPTO-50k dataset, respectively, and by over 6% for both the top-1 and top-10 accuracies on the USPTO-full dataset.

DOI:

10.1609/aaai.v35i1.16131

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 35

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.