In recent years, there has been growing interest in automated playlist generation: music recommender systems that model preferences over song sequences rather than over individual songs in isolation. This paper addresses the problem by learning, on the fly, personalized models of both song and transition preferences, tailored to each user’s musical tastes. Playlist recommender systems typically include two main components: (i) a preference-learning component, and (ii) a planning component for selecting the next song in the playlist sequence. While there has been much work on the former, very little has been devoted to the latter. This paper bridges this gap by focusing on the planning aspect of playlist generation within the context of DJ-MC, our playlist recommendation application. It also introduces a new variant of playlist recommendation, which incorporates the notions of diversity and novelty directly into the reward model. We empirically demonstrate that the proposed planning approach significantly improves performance compared to the DJ-MC baseline in two playlist recommendation settings, increasing the usability of the framework in real-world settings.