High-level semantic knowledge in addition to low-level visual cues is essentially crucial for co-saliency detection. This paper proposes a novel end-to-end deep learning approach for robust co-saliency detection by simultaneously learning highlevel group-wise semantic representation as well as deep visual features of a given image group. The inter-image interaction at semantic-level as well as the complementarity between group semantics and visual features are exploited to boost the inferring of co-salient regions. Specifically, the proposed approach consists of a co-category learning branch and a co-saliency detection branch. While the former is proposed to learn group-wise semantic vector using co-category association of an image group as supervision, the latter is to infer precise co-salient maps based on the ensemble of group semantic knowledge and deep visual cues. The group semantic vector is broadcasted to each spatial location of multi-scale visual feature maps and is used as a top-down semantic guidance for boosting the bottom-up inferring of co-saliency. The co-category learning and co-saliency detection branches are jointly optimized in a multi-task learning manner, further improving the robustness of the approach. Moreover, we construct a new large-scale co-saliency dataset COCO-SEG to facilitate research of co-saliency detection. Extensive experimental results on COCO-SEG and a widely used benchmark Cosal2015 have demonstrated the superiority of the proposed approach as compared to the state-of-the-art methods.