Optimizing policies for real-time control of humanoid robots is difficult because the state and action spaces are continuous and the dynamics are stochastic. In this paper, we propose a learning procedure for training a predictive motion model, together with RFPI, a solver for MDPs with continuous state and action spaces. We use the predictive model as the transition model when training policies for robot soccer. Our method requires no external hardware and only a small amount of human work, and it outperforms the expert policy used by our team, Rhoban, winner of the 2016 edition of RoboCup in the kid-size soccer league. Moreover, the proposed method adapts to non-holonomic robots more efficiently than the expert approach. Our results are confirmed by both simulations and real-robot experiments.