The global trajectories of targets on ground can be well captured from a top view in a high altitude, e.g., by a drone-mounted camera, while their local detailed appearances can be better recorded from horizontal views, e.g., by a helmet camera worn by a person. This paper studies a new problem of multiple human tracking from a pair of top- and horizontal-view videos taken at the same time. Our goal is to track the humans in both views and identify the same person across the two complementary views frame by frame, which is very challenging due to very large field of view difference. In this paper, we model the data similarity in each view using appearance and motion reasoning and across views using appearance and spatial reasoning. Combing them, we formulate the proposed multiple human tracking as a joint optimization problem, which can be solved by constrained integer programming. We collect a new dataset consisting of top- and horizontal-view video pairs for performance evaluation and the experimental results show the effectiveness of the proposed method.