The recent advanced face recognition systems werebuilt on large Deep Neural Networks (DNNs) or theirensembles, which have millions of parameters. However, the expensive computation of DNNs make theirdeployment difficult on mobile and embedded devices. This work addresses model compression for face recognition,where the learned knowledge of a large teachernetwork or its ensemble is utilized as supervisionto train a compact student network. Unlike previousworks that represent the knowledge by the soften labelprobabilities, which are difficult to fit, we represent theknowledge by using the neurons at the higher hiddenlayer, which preserve as much information as the label probabilities, but are more compact. By leveragingthe essential characteristics (domain knowledge) of thelearned face representation, a neuron selection methodis proposed to choose neurons that are most relevant toface recognition. Using the selected neurons as supervisionto mimic the single networks of DeepID2+ andDeepID3, which are the state-of-the-art face recognition systems, a compact student with simple network structure achieves better verification accuracy on LFW than its teachers, respectively. When using an ensemble of DeepID2+ as teacher, a mimicked student is able to outperform it and achieves 51.6 times compression ratio and 90 times speed-up in inference, making this cumbersome model applicable on portable devices.