The advent of social media has presented a promising new opportunity for the early detection of depression. Doing so effectively requires overcoming two challenges. First, textual and visual information must be considered jointly to make accurate inferences about depression. Second, because users post a wide variety of content, it is difficult to extract many of the relevant indicator texts and images. In this work, we propose a novel cooperative multi-agent model to address these challenges. From the historical posts of users, the proposed method automatically selects related indicator texts and images. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods by a large margin (over 30% error reduction). In several experiments and examples, we also verify that the selected posts can successfully indicate user depression, and that our model obtains robust performance in realistic scenarios.