Question-answering systems are becoming increasingly popular in Natural Language Processing, especially in smart factory settings. A common approach to designing such systems is intent classification. However, in the multiple-stage tasks commonly seen in those settings, relying solely on intent classification may lead to erroneous answers, because questions arising from different work stages may share the same intent but have different contexts and therefore require different answers. To address this problem, we designed an interactive dialogue system that uses contextual information to assist intent classification in a multiple-stage task. Specifically, our system combines users’ utterances with a real-time video feed to better situate users’ questions and analyze their intent.