We address the problem of facial motion retargeting that aims to transfer facial motion from a 2D face image to 3D characters. Existing methods often formulate this problem as a 3D face reconstruction problem, which estimates the face attributes such as face identity and expression from face images. However, due to the lack of ground-truth labels for both identity and expression, most 3D-face reconstruction-based methods fail to capture the facial identity and expression accurately. As a result, these methods may not achieve promising performance. To address this, we propose an identity-consistent constraint to learn accurate identities by encouraging consistent identity prediction across multiple frames. Based on a more accurate identity, we are able to obtain a more accurate facial expression. Moreover, we further propose an expression-exclusive constraint to improve performance by avoiding the co-occurrence of contradictory expression units (e.g., ``brow lower'' vs. ``brow raise''). Extensive experiments on facial motion retargeting and 3D face reconstruction tasks demonstrate the superiority of the proposed method over existing methods. Our code and supplementary materials are available at https://github.com/deepmo24/CPEM.