In this paper, I describe my experiences using the Phoenix simulator to investigate failure recovery [Howe and Cohen, 1991, Howe, 1992, Howe, 1993]. The paper is not intended as a summary of the state of the art in evaluation, nor as a general guide to evaluating AI planners or to designing and running experiments. (Readers interested in these topics are directed to [Cohen and Howe, 1988, Langley and Drummond, 1990, Cohen, 1991, Cohen, 1993].) Instead, the paper discusses the role of simulators in planning research and describes how the Phoenix simulator both facilitated and obstructed my research. It also lists some of the pitfalls that I encountered and concludes with advice on how to avoid them.