Omnidirectional images, also called 360◦images, have attracted extensive attention in recent years, due to the rapid development of virtual reality (VR) technologies. During omnidirectional image processing including capture, transmission, consumption, and so on, measuring the perceptual quality of omnidirectional images is highly desired, since it plays a great role in guaranteeing the immersive quality of experience (IQoE). In this paper, we conduct a comprehensive study on the perceptual quality of omnidirectional images from both subjective and objective perspectives. Specifically, we construct the largest so far subjective omnidirectional image quality database, where we consider several key influential elements, i.e., realistic non-uniform distortion, viewing condition, and viewing behavior, from the user view. In addition to subjective quality scores, we also record head and eye movement data. Besides, we make the first attempt by using the proposed database to train a convolutional neural network (CNN) for blind omnidirectional image quality assessment. To be consistent with the human viewing behavior in the VR device, we extract viewports from each omnidirectional image and incorporate the user viewing conditions naturally in the proposed model. The proposed model is composed of two parts, including a multi-scale CNN-based feature extraction module and a perceptual quality prediction module. The feature extraction module is used to incorporate the multi-scale features, and the perceptual quality prediction module is designed to regress them to perceived quality scores. The experimental results on our database verify that the proposed model achieves the competing performance compared with the state-of-the-art methods.