We present a roadmap to a Human Activity Language (HAL) for symbolic manipulation of visual and motor information in a sensory-motor system model. The visual perception subsystem translates a visual representation of action into our visuo-motor language. One instance of this perception process could be achieved by a motion capture system. We captured almost 90 different human actions to obtain empirical data that could validate and support our embodied language for movement and activity. The embodiment of the language serves as the interface between the visual perception and motor subsystems. The visuo-motor language is defined using a linguistic approach. In phonology, we define the basic atomic segments that are used to compose human activity. Phonological rules are modeled as a finite automaton. In morphology, we study how visuo-motor phonemes are combined to form strings representing human activity and to generate a higher-level morphological grammar. This compact grammar suggests the existence of lexical units working as visuo-motor subprograms. In syntax, we present a model for visuo-motor sentence construction in which the subject corresponds to the active joints (noun) modified by a posture (adjective). A verb phrase involves the representation of the human activity (verb) and the timing coordination among different joints (adverb).
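As an informal illustration of the syntactic model, the sketch below encodes one visuo-motor sentence as a plain data structure. It is a minimal sketch and not the representation used in this work: the class names, the joint and posture labels, and the encoding of timing coordination as per-joint onset delays in seconds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NounPhrase:
    """Subject: the active joints (noun) modified by a posture (adjective)."""
    joints: Tuple[str, ...]  # hypothetical joint labels
    posture: str             # hypothetical posture label


@dataclass(frozen=True)
class VerbPhrase:
    """Predicate: the activity (verb) plus timing coordination (adverb)."""
    activity: str                     # hypothetical activity label
    onset_offsets: Tuple[float, ...]  # assumed encoding: start delay per joint, in seconds


@dataclass(frozen=True)
class Sentence:
    subject: NounPhrase
    predicate: VerbPhrase

    def __post_init__(self) -> None:
        # The adverb coordinates the joints named in the subject,
        # so this sketch requires exactly one offset per active joint.
        if len(self.subject.joints) != len(self.predicate.onset_offsets):
            raise ValueError("expected one onset offset per active joint")

    def describe(self) -> str:
        joints = "+".join(self.subject.joints)
        timing = ", ".join(
            f"{joint}:{offset:.1f}s"
            for joint, offset in zip(self.subject.joints, self.predicate.onset_offsets)
        )
        return f"[{self.subject.posture} {joints}] [{self.predicate.activity} ({timing})]"


# Hypothetical example: a wave performed from a standing posture.
example = Sentence(
    subject=NounPhrase(joints=("right_shoulder", "right_elbow"), posture="standing"),
    predicate=VerbPhrase(activity="wave", onset_offsets=(0.0, 0.2)),
)
print(example.describe())
```

Keeping the subject and the verb phrase as separate immutable records mirrors the noun/adjective and verb/adverb split described above. A fuller treatment would replace the activity label with a string of visuo-motor phonemes.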