Abnormal event detection in surveillance video is an essential but challenging task, and many methods have been proposed to address it. Previous methods either consider only appearance information or directly integrate the results of appearance and motion information without explicitly considering their endogenous consistency semantics. Inspired by the way humans identify abnormal frames from multi-modality signals, we propose an Appearance-Motion Memory Consistency Network (AMMC-Net). Our method first makes full use of the prior knowledge of appearance and motion signals to explicitly capture the correspondence between them in the high-level feature space. It then combines the multi-view features to obtain a more essential and robust feature representation of regular events, which significantly increases the gap between abnormal and regular events. In the anomaly detection phase, we further introduce a commit error in the latent space, used jointly with the prediction error in pixel space, to enhance detection accuracy. Experimental results on various standard datasets validate the effectiveness of our approach.
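To illustrate how a latent-space commit error might be used jointly with a pixel-space prediction error at detection time, the following is a minimal sketch and not the AMMC-Net implementation. It assumes that the commit error is the mean squared distance from each latent feature to its nearest memory item, that the prediction error is the mean squared error between predicted and observed frames, and that the two per-frame errors are min-max normalized over a video and fused with a weight. The function names, the `weight` parameter, and the fusion rule are all hypothetical.

```python
import numpy as np


def commit_error(features, memory):
    """Mean squared distance from each latent feature to its nearest memory item.

    features: (N, D) latent vectors of one frame (assumed layout).
    memory:   (K, D) memory items learned from regular events (assumed layout).
    """
    # Pairwise squared distances between features and memory items, shape (N, K).
    d2 = ((features[:, None, :] - memory[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean())


def prediction_error(predicted, target):
    """Mean squared error between a predicted frame and the observed frame."""
    return float(np.mean((predicted - target) ** 2))


def min_max(values):
    """Rescale per-frame errors of one video to [0, 1]; constant input maps to 0."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def anomaly_scores(commit_errs, pred_errs, weight=0.5):
    """Hypothetical fusion: weighted sum of the two normalized errors per frame."""
    return weight * min_max(commit_errs) + (1.0 - weight) * min_max(pred_errs)
```

Under these assumptions, a frame whose latent features lie far from every memory item and whose prediction is poor receives a score near 1, while a frame that matches the memory of regular events and is well predicted receives a score near 0.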