As intelligent software agents are pushed out into the real world and increasingly take over responsibility for important tasks, it becomes essential to have ways of guaranteeing that the agents are safe. In this paper, a safe agent is one whose users can trust their model of the agent to be reliable and whose future behavior is predictable. Specifically, if the agent learns, the user needs to be able to predict how that learning affects the agent’s future behavior. One consequence is that most people expect learning to improve the agent’s performance. In particular, people expect that after learning the agent will still be able to solve every problem that it could solve before learning. This paper discusses the sandbox learning protocol, which tries to address this issue, and explains why it is not applicable in environments where the agent must apply what it has learned almost immediately. We describe an incremental learning protocol that is appropriate for such environments. We then describe an experiment with Blackbox, a planning system that has enjoyed success in recent planning competitions and that now has an integrated learning component. Blackbox uses the FOIL learning algorithm to learn search control rules. Our experiment shows that when Blackbox uses the incremental learning protocol, its performance (in terms of problems solved) is severely degraded.
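
The expectation that learning never costs the agent a problem it could already solve can be read as a set inclusion: the problems solved before learning should be a subset of those solved after. The following is a minimal sketch of that check in Python, not the implementation used in the experiment. The function names `regressions` and `is_safe_update` and the problem identifiers are hypothetical, chosen only for illustration.

```python
def regressions(solved_before, solved_after):
    """Return the problems solved before learning that are no longer solved after."""
    return set(solved_before) - set(solved_after)


def is_safe_update(solved_before, solved_after):
    """True if every previously solved problem is still solved after learning."""
    return not regressions(solved_before, solved_after)


# Hypothetical problem identifiers, for illustration only.
before = {"p1", "p2", "p3"}
after_good = {"p1", "p2", "p3", "p4"}   # keeps everything, gains p4
after_bad = {"p1", "p3", "p4", "p5"}    # gains p4 and p5 but loses p2

print(is_safe_update(before, after_good))
print(sorted(regressions(before, after_bad)))
```

Under this reading, an update can solve more problems overall and still count as unsafe, because the property concerns which problems are solved rather than how many.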