Recently, binary neural network (BNN) based super-resolution (SR) methods have enjoyed initial success in the SR field. However, there is a noticeable performance gap between the binarized model and the full-precision one. Furthermore, the batch normalization (BN) in binary SR networks introduces floating-point calculations, which is unfriendly to low-precision hardwares. Therefore, there is still room for improvement in terms of model performance and efficiency. Focusing on this issue, in this paper, we first explore a novel binary training mechanism based on the feature distribution, allowing us to replace all BN layers with a simple training method. Then, we construct a strong baseline by combining the highlights of recent binarization methods, which already surpasses the state-of-the-arts. Next, to train highly accurate binarized SR model, we also develop a lightweight network architecture and a multi-stage knowledge distillation strategy to enhance the model representation ability. Extensive experiments demonstrate that the proposed method not only presents advantages of lower computation as compared to conventional floating-point networks but outperforms the state-of-the-art binary methods on the standard SR networks.