Human error is one of the most common causes of vulnerability in a secure system. However, it is often overlooked when these systems are tested, partly because tests with human participants are costly and very hard to repeat. We have developed a community of agents that test secure systems by running standard Windows software while performing collaborative group tasks, mimicking more realistic patterns of communication and traffic, as well as human fatigue and errors. This system is being deployed on a large cyber testing range. One key attribute of humans is flexibility of response in order to achieve their goals when unexpected events occur. Our agents use reactive planning within a BDI architecture to replan flexibly when needed. Since the agents are goal-oriented, we are able to measure the impact of cyber attacks on mission accomplishment, a more salient measure of protection than raw penetration. We show experimentally how the agent teams can be resilient under attacks that are partly successful, and also how an organizational structure can lead to emergent properties of the traffic in the network.