The rapid development of facial manipulation techniques has aroused public concerns in recent years. Existing deepfake video detection approaches attempt to capture the discrim- inative features between real and fake faces based on tem- poral modelling. However, these works impose supervisions on sparsely sampled video frames but overlook the local mo- tions among adjacent frames, which instead encode rich in- consistency information that can serve as an efficient indica- tor for DeepFake video detection. To mitigate this issue, we delves into the local motion and propose a novel sampling unit named snippet which contains a few successive videos frames for local temporal inconsistency learning. Moreover, we elaborately design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter- SIM) to establish a dynamic inconsistency modelling frame- work. Specifically, the Intra-SIM applies bi-directional tem- poral difference operations and a learnable convolution ker- nel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote the cross-snippet infor- mation interaction to form global representations. The Intra- SIM and Inter-SIM work in an alternate manner and can be plugged into existing 2D CNNs. Our method outperforms the state of the art competitors on four popular benchmark dataset, i.e., FaceForensics++, Celeb-DF, DFDC and Wild- Deepfake. Besides, extensive experiments and visualizations are also presented to further illustrate its effectiveness.