Depth estimation from Light Field (LF) images is a crucial basis for LF related applications. Since multiple views with abundant information are available, how to effectively fuse features of these views is a key point for accurate LF depth estimation. In this paper, we propose a novel attention-based multi-level fusion network. Combined with the four-branch structure, we design intra-branch fusion strategy and inter-branch fusion strategy to hierarchically fuse effective features from different views. By introducing the attention mechanism, features of views with less occlusions and richer textures are selected inside and between these branches to provide more effective information for depth estimation. The depth maps are finally estimated after further aggregation. Experimental results shows the proposed method achieves state-of-the-art performance in both quantitative and qualitative evaluation, which also ranks first in the commonly used HCI 4D Light Field Benchmark.