Multi-view time series classification (MVTSC) aims to improve classification performance by fusing the distinctive temporal information from multiple views. Existing MVTSC methods mainly fuse multi-view information at an early stage, e.g., by extracting a common feature subspace shared among the views. However, these approaches may not fully exploit the unique temporal patterns of each view in complicated time series. Additionally, the label correlations among multiple views, which are critical to boosting performance, are usually under-explored in the MVTSC problem. To address these issues, we propose a Correlative Channel-Aware Fusion (C$^2$AF) network. First, C$^2$AF extracts comprehensive and robust temporal patterns with a two-stream structured encoder for each view, and derives the intra-view and inter-view label correlations with a concise correlation matrix. Second, a channel-aware learnable fusion mechanism, implemented with a CNN, further explores the global correlative patterns. C$^2$AF is an end-to-end framework for MVTSC. Extensive experiments on three real-world datasets demonstrate the superiority of C$^2$AF over state-of-the-art methods. A detailed ablation study is also provided to illustrate the indispensability of each model component.
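
To make the idea of label correlations and channel-aware fusion concrete, the following is a minimal NumPy sketch and not the C$^2$AF implementation. It rests on assumptions that the text above does not state: each view is taken to have already produced a class-probability vector, the correlation between two views is modeled as the outer product of their probability vectors, and the CNN fusion is reduced to a single 1x1 convolution over the stacked correlation matrices followed by a linear layer. The function names and the random weights are hypothetical placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def label_correlation_channels(probs):
    """Stack pairwise label-correlation matrices as channels.

    probs: (V, C) array, one class-probability vector per view.
    Returns a (V * V, C, C) array. Channel i * V + j holds the outer
    product of view i and view j (an ASSUMED form of correlation):
    i == j gives an intra-view matrix, i != j an inter-view matrix.
    """
    num_views, num_classes = probs.shape
    corr = np.einsum("ic,jd->ijcd", probs, probs)
    return corr.reshape(num_views * num_views, num_classes, num_classes)


def channel_aware_fusion(channels, channel_weights, w_out, b_out):
    """Fuse the correlation channels into class logits.

    A 1x1 convolution with one output channel is a weighted sum of the
    input channels, so one learnable weight per channel lets the model
    emphasize some view pairs over others. The fused C x C map is then
    flattened and passed through a linear layer (ASSUMED head).
    """
    fused = np.tensordot(channel_weights, channels, axes=(0, 0))  # (C, C)
    return w_out @ fused.reshape(-1) + b_out


# Toy example: 3 views, 4 classes, random stand-ins for learned parameters.
rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(3, 4)), axis=-1)
channels = label_correlation_channels(probs)

channel_weights = rng.normal(size=channels.shape[0])
w_out = rng.normal(size=(4, 16))
b_out = np.zeros(4)

logits = channel_aware_fusion(channels, channel_weights, w_out, b_out)
fused_probs = softmax(logits)
```

In this sketch the per-channel weights are what make the fusion channel-aware: every intra-view or inter-view matrix receives its own coefficient before the final prediction, whereas an early-fusion scheme would merge the views before any label-level correlation is formed.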