Recent research has shown that surprisingly rich models of human behavior can be learned from GPS (positional) data. However, most research to date has concentrated on modeling single individuals or aggregate statistical properties of groups of people. Given noisy real-world GPS data, we---in contrast---consider the problem of modeling and recognizing activities that involve multiple related individuals playing a variety of roles. Our test domain is the game of capture the flag---an outdoor game that involves many distinct cooperative and competitive joint activities. We model the domain using Markov logic, a statistical relational language, and learn a theory that jointly denoises the data and infers occurrences of high-level activities, such as capturing a player. Our model combines constraints imposed by the geometry of the game area, the motion model of the players, and by the rules and dynamics of the game in a probabilistically and logically sound fashion. We show that while it may be impossible to directly detect a multi-agent activity due to sensor noise or malfunction, the occurrence of the activity can still be inferred by considering both its impact on the future behaviors of the people involved as well as the events that could have preceded it. We compare our unified approach with three alternatives (both probabilistic and nonprobabilistic) where either the denoising of the GPS data and the detection of the high-level activities are strictly separated, or the states of the players are not considered, or both. We show that the unified approach with the time window spanning the entire game, although more computationally costly, is significantly more accurate.