Decomposing a complex task into a set of simpler subtasks is a common technique used by problem-solving agents. In reinforcement learning, decomposition can be exploited by having separate modules learn to achieve individual subgoals independently and in parallel. Since each module can ignore parts of the state space that are not relevant to its subtask, learning time is drastically reduced. However, since a modular decomposition implies the loss of a global perspective on the task, the agent's performance depends on the strategy used to combine the information provided by the different modules when selecting which action to execute. One class of such strategies uses the local utilities provided by the individual modules to estimate the global utilities of actions. In previous work we have shown that an agent with a pre-defined decomposition, using such approximation strategies, can achieve fast learning times with little loss in performance. However, since reinforcement learning is intended for domains where the human programmer has little knowledge about the structure of the task, it would be useful if the agent could discover a good task decomposition by itself. The only domain knowledge initially available to the agent is the reward function. The same knowledge used to encode the reward function can be used by the agent to define an initial decomposition of its task. This decomposition can then be refined as the agent discovers the structure of the world and of its task.