We propose a novel method to detect events involving multiple agents in a video and to learn their structure in terms of temporally related chains of sub-events. The proposed method makes three significant contributions over existing frameworks. First, in order to learn the event structure from training videos, we present the concept of a video event graph, which is composed of temporally related sub-events. Using the video event graph, we automatically encode the event dependency graph. The event dependency graph is the learnt event model that depicts the frequency of occurrence of conditionally dependent sub-events. Second, we pose the problem of event detection in novel videos as clustering the maximally correlated sub-events, and use normalized cuts to determine these clusters. The principal assumption made in this work is that events are composed of highly correlated chains of sub-events that have high weights (association) within a cluster and relatively low weights (disassociation) between clusters. These weights between sub-events are the likelihood estimates obtained from the event models. Third, we recognize the importance of representing variations in the temporal order of the sub-events that occur in an event, and encode the corresponding probabilities directly into our representation. We show results of our learning, detection, and representation of events for videos in the meeting, surveillance, and railroad monitoring domains.
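To make the clustering step concrete, the following is a minimal sketch of a two-way normalized cut over a matrix of sub-event weights, using the standard spectral relaxation. It is an illustration under stated assumptions, not the implementation used in this work: the function name `ncut_bipartition`, the toy weight matrix, the symmetrization of the weights, and the threshold at zero are all hypothetical choices made for the example.

```python
import numpy as np


def ncut_bipartition(weights):
    """Split sub-events into two clusters via the spectral relaxation of normalized cuts.

    `weights[i, j]` is the association between sub-events i and j
    (here standing in for a likelihood estimate from an event model).
    Returns an array of 0/1 cluster labels, one per sub-event.
    """
    w = np.asarray(weights, dtype=float)
    # Assumption: the weights are symmetrized, since normalized cuts
    # is defined on an undirected weighted graph.
    w = 0.5 * (w + w.T)
    np.fill_diagonal(w, 0.0)

    degree = w.sum(axis=1)
    if np.any(degree <= 0):
        raise ValueError("every sub-event needs at least one positive weight")

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue carries the relaxed cut indicator.
    _, eigenvectors = np.linalg.eigh(laplacian)
    indicator = d_inv_sqrt * eigenvectors[:, 1]

    # Assumption: split at zero; other thresholds could be searched instead.
    return (indicator > 0).astype(int)


if __name__ == "__main__":
    # Hypothetical toy data: six sub-events forming two tightly coupled chains,
    # with high weights inside each chain and low weights across chains.
    toy = np.full((6, 6), 0.05)
    toy[:3, :3] = 0.9
    toy[3:, 3:] = 0.9
    print(ncut_bipartition(toy))
```

Which cluster receives label 0 and which receives label 1 depends on the sign of the eigenvector and is arbitrary; only the grouping is meaningful. More than two events could be separated by applying the split recursively to each resulting cluster.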