We present an abstractive summarization system that produces summaries for Chinese e-commerce products. This task is more challenging than general text summarization for two reasons. First, the appearance of a product typically plays a significant role in a customer's decision to buy it, so the summarization model must make effective use of the product's visual information. Second, different products stand out in different aspects, such as “energy efficiency” and “large capacity” for refrigerators, and different customers may care about different aspects. Thus, the summarizer needs to capture the most attractive aspects of a product, those that resonate with potential purchasers. We propose an aspect-aware multimodal summarization model that can effectively incorporate visual information and determine the most salient aspects of a product. We construct a large-scale Chinese e-commerce product summarization dataset that contains approximately 1.4 million manually created product summaries, each paired with detailed product information, including an image, a title, and other textual descriptions. Experimental results on this dataset demonstrate that our models significantly outperform the compared methods in terms of both ROUGE scores and human evaluation.