Semantic place labeling has been actively studied in the past few years because of its importance in understanding human mobility and lifestyle patterns. Over the last decade, the rapid growth of geotagged multimedia data from online social networks has provided a valuable opportunity to predict the locations of people's points of interest (POIs) from temporal, spatial, and visual cues. One important type of social media data is geotagged web images from image-sharing websites. In this paper, we develop a reliable photo classifier based on Convolutional Neural Networks (CNNs) to classify the scene in which a real-life photo was taken. We then present a novel approach to predicting home and vacation locations by fusing the visual content of photos with the spatiotemporal features of people's mobility patterns. Using the trained classifier, we show that the robust fusion of visual and spatiotemporal features achieves a significant accuracy improvement over either feature type alone for both home and vacation detection.