Exiting simple samples in adaptive multi-exit networks through early modules is an effective way to achieve high computational efficiency. One can observe that deployments of multi-exit architectures on resource-constrained devices are easily limited by high memory footprint of early modules. In this paper, we propose a novel approach named recurrent aggregation operator (ReX), which uses recurrent neural networks (RNNs) to effectively aggregate intra-patch features within a large receptive field to get delicate local representations, while bypassing large early activations. The resulting model, named ReXNet, can be easily extended to dynamic inference by introducing a novel consistency-based early exit criteria, which is based on the consistency of classification decisions over several modules, rather than the entropy of the prediction distribution. Extensive experiments on two benchmark datasets, i.e., Visual Wake Words, ImageNet-1k, demonstrate that our method consistently reduces the peak RAM and average latency of a wide variety of adaptive models on low-power devices.