Modeling discourse coherence is an important problem in natural language generation and understanding. Sentence ordering, the goal of which is to organize a set of sentences into a coherent text, is a commonly used task to learn and evaluate the model. In this paper, we propose a novel hierarchical attention network that captures word clues and dependencies between sentences to address this problem. Our model outperforms prior methods and achieves state-of-the-art performance on several datasets in different domains. Furthermore, our experiments demonstrate that the model performs very well even though adding noisy sentences into the set, which shows the robustness and effectiveness of the model. Visualization analysis and case study show that our model captures the structure and pattern of coherent texts not only by simple word clues but also by consecution in context.