Autonomous robot systems operating in uncertain environments have to be reactive and adaptive in order to cope with changing environmental conditions and task requirements. To achieve this, the control architecture presented in this paper uses reinforcement learning on top of an abstract Discrete Event Dynamic System (DEDS) supervisor to learn to coordinate a set of continuous controllers to perform a given task. Besides providing base reactivity through the underlying stable and convergent control elements, this hybrid control approach allows the learning to be performed on an abstract system model, which dramatically reduces the complexity of the learning problem. Furthermore, the DEDS formalism provides a means of imposing safety constraints a priori, such that learning can be performed on-line in a single trial without the need for an outside teacher. To demonstrate the applicability of this approach, the architecture is used to learn a turning gait on a four-legged robot platform.
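The scheme described above can be sketched in miniature: a tabular Q-learner that operates only on the abstract states of a DEDS supervisor, where the supervisor's admissible-event sets exclude unsafe actions a priori, so even on-line exploration never leaves the safe region. All names, states, and the toy transition structure below are illustrative assumptions, not the paper's actual model.

```python
import random

class AbstractDEDS:
    """Toy DEDS supervisor: abstract states with a priori admissible events.

    Unsafe events (e.g. lifting a second leg while one is in swing) are
    simply absent from the transition map, so the learner can never
    select them -- this is the safety-constraint mechanism in miniature.
    """
    def __init__(self):
        # state -> {admissible event: successor state} (hypothetical gait states)
        self.transitions = {
            "stand":  {"lift_leg": "lifted"},
            "lifted": {"swing": "swung", "lower": "stand"},
            "swung":  {"lower": "turned"},
            "turned": {"lift_leg": "lifted"},
        }

    def admissible(self, state):
        return list(self.transitions[state])

    def step(self, state, event):
        return self.transitions[state][event]

def q_learn(deds, episodes=200, horizon=20, alpha=0.5, gamma=0.9, eps=0.1):
    """Epsilon-greedy tabular Q-learning restricted to admissible events."""
    q = {}  # (abstract state, event) -> value
    for _ in range(episodes):
        s = "stand"
        for _ in range(horizon):
            acts = deds.admissible(s)  # safety mask from the supervisor
            if random.random() < eps:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda e: q.get((s, e), 0.0))
            s2 = deds.step(s, a)
            r = 1.0 if s2 == "turned" else 0.0  # reward turn progress
            best = max(q.get((s2, e), 0.0) for e in deds.admissible(s2))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best - old)
            s = s2
    return q
```

Because the learner only ever sees the handful of abstract supervisor states rather than the robot's continuous state space, the table stays tiny, which is the sense in which the abstraction reduces the complexity of the learning problem.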