Data augmentation with Mixup (Zhang et al. 2018) has shown to be an effective model regularizer for current art deep classification networks. It generates out-of-manifold samples through linearly interpolating inputs and their corresponding labels of random sample pairs. Despite its great successes, Mixup requires convex combination of the inputs as well as the modeling targets of a sample pair, thus significantly limits the space of its synthetic samples and consequently its regularization effect. To cope with this limitation, we propose “nonlinear Mixup”. Unlike Mixup where the input and label pairs share the same, linear, scalar mixing policy, our approach embraces nonlinear interpolation policy for both the input and label pairs, where the mixing policy for the labels is adaptively learned based on the mixed input. Experiments on benchmark sentence classification datasets indicate that our approach significantly improves upon Mixup. Our empirical studies also show that the out-of-manifold samples generated by our strategy encourage training samples in each class to form a tight representation cluster that is far from others.