Cross-entropy loss combined with softmax is among the most commonly used supervision components in existing segmentation methods. The softmax loss is typically good at optimizing inter-class differences but not at reducing intra-class variation, which can be suboptimal for the semantic segmentation task. In this paper, we propose a Consistent-Separable Feature Representation Network to model Consistent-Separable (C-S) features, which are intra-class consistent and inter-class separable, thereby improving the discriminative power of the deep features. Specifically, we develop a Consistent-Separable Feature Learning Module that obtains C-S features through a new loss, called the Class-Aware Consistency loss. This loss forces the deep features to be consistent within the same class and separated between different classes. Moreover, we design an Adaptive feature Aggregation Module to fuse the C-S features with the original backbone features for better semantic prediction. We show that the proposed method brings consistent performance improvements over various baselines. Our approach achieves state-of-the-art performance on Cityscapes (82.6% mIoU on the test set), ADE20K (46.65% mIoU on the validation set), COCO Stuff (41.3% mIoU on the validation set) and PASCAL Context (55.9% mIoU on the test set).
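
To make the idea of intra-class consistent and inter-class separable features concrete, the following is a minimal sketch of one possible objective of this kind. It is not the Class-Aware Consistency loss itself, whose exact form is not given in this abstract. The function name `consistency_separation_loss`, the use of cosine similarity to class centers, and the `margin` parameter are all assumptions made purely for illustration.

```python
import numpy as np


def consistency_separation_loss(features, labels, margin=0.5):
    """Illustrative (hypothetical) loss, NOT the paper's formulation.

    features: array of shape (N, D), one feature vector per pixel.
    labels:   array of shape (N,), integer class id per pixel.
    margin:   assumed cosine-similarity threshold between class centers.
    Returns (total, intra, inter).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    eps = 1e-12

    # L2-normalise features so dot products are cosine similarities.
    feats = features / np.maximum(
        np.linalg.norm(features, axis=1, keepdims=True), eps
    )

    # One normalised mean vector (center) per class present in the batch.
    classes = np.unique(labels)
    centers = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centers = centers / np.maximum(
        np.linalg.norm(centers, axis=1, keepdims=True), eps
    )

    # Intra-class term: pull each feature towards its own class center.
    own = centers[np.searchsorted(classes, labels)]
    intra = float(np.mean(1.0 - np.sum(feats * own, axis=1)))

    # Inter-class term: penalise pairs of centers more similar than margin.
    if len(classes) > 1:
        sim = centers @ centers.T
        off_diag = sim[~np.eye(len(classes), dtype=bool)]
        inter = float(np.mean(np.maximum(0.0, off_diag - margin)))
    else:
        inter = 0.0

    return intra + inter, intra, inter
```

Under these assumptions, the intra-class term vanishes when all features of a class point in the same direction, and the inter-class term vanishes when every pair of class centers has cosine similarity below `margin`. In practice such a term would be added to the usual cross-entropy loss with a weighting factor, which is likewise left unspecified here.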