When adopting deep neural networks for a new vision task, a common practice is to start by fine-tuning off-the-shelf, well-trained network models from the community. Since a new task may require training a different network architecture on new domain data, taking advantage of off-the-shelf models is not trivial and generally requires considerable trial and error and parameter tuning. In this paper, we refer to a well-trained model as a teacher network and a model for the new task as a student network. We aim to ease the effort of transferring knowledge from the teacher to the student network in a way that is robust to the gaps between their network architectures, domain data, and task definitions. Specifically, we propose a hybrid forward scheme for training the teacher-student models, which alternately updates the layer weights of the student model. The key merit of our hybrid forward scheme is the dynamic balance it maintains between the knowledge transfer loss and the task-specific loss during training. We demonstrate the effectiveness of our method on a variety of tasks, e.g., model compression, segmentation, and detection, under a variety of knowledge transfer settings.