Current computing systems are just beginning to enable the computational manipulation of temporal media such as video and audio. Because these media are opaque, they must be represented in order to be manipulable according to their contents. Knowledge representation techniques have been implicitly designed for representing the physical world and its textual representations. Temporal media pose unique problems and opportunities for knowledge representation that challenge many of its assumptions about the structure and function of what is represented. The semantics and syntax of temporal media require representational designs that employ fundamentally different conceptions of space, time, identity, and action. In particular, the effect of the syntax of video sequences on the semantics of video shots demands a representational design that can clearly articulate the differences between the context-dependent and context-independent semantics of video data. This paper outlines the theoretical foundations for designing representations of video, discusses Media Streams, an implemented system for video representation and retrieval, and critiques related efforts in this area.