While deep neural network-based video denoising methods have achieved promising results, it is still hard to deploy them on mobile devices due to their high computational cost and memory demands. This paper aims to develop a lightweight deep video denoising method that is friendly to resource-constrained mobile devices. Inspired by the facts that 1) consecutive video frames usually contain redundant temporal coherency, and 2) neural networks are usually over-parameterized, we propose a multi-input multi-output (MIMO) paradigm to process consecutive video frames within one-forward-pass. The basic idea is concretized to a novel architecture termed Recurrent Multi-output Network (ReMoNet), which consists of recurrent temporal fusion and temporal aggregation blocks and is further reinforced by similarity-based mutual distillation. We conduct extensive experiments on NVIDIA GPU and Qualcomm Snapdragon 888 mobile platform with Gaussian noise and simulated Image-Signal-Processor (ISP) noise. The experimental results show that ReMoNet is both effective and efficient on video denoising. Moreover, we show that ReMoNet is more robust under higher noise level scenarios.