How can we build artificial agents that can autonomously explore and understand their environments? An immediate requirement for such an agent is to learn how its own sensory state corresponds to properties of the external world: it needs to learn the semantics of its internal state (i.e., grounding). In principle, we as programmers can provide the agent with the required semantics, but doing so compromises the agent's autonomy. To overcome this problem, we may turn to natural agents and ask how they acquire the meaning of their own sensory states, that is, of their neural firing patterns. As external observers, we can learn a lot about what certain neural spikes mean by carefully controlling the input stimulus while observing how the neurons fire. However, neurons embedded in the brain do not have direct access to the outside stimuli, so such a stimulus-to-spike association may not be learnable from within the brain at all. How then can the brain solve this problem? (We know it does.) We propose that motor interaction with the environment is necessary to resolve this conundrum. Further, we provide a simple yet powerful criterion, sensory invariance, for learning the meaning of sensory states. The basic idea is that a particular action sequence that keeps a sensory state invariant will express the key property of the environmental stimulus that gave rise to that sensory state. Our experiments with a sensorimotor agent trained on natural images show that sensory invariance can indeed serve as a powerful objective for semantic grounding.
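As a rough illustration of the sensory-invariance idea, the following minimal Python sketch considers a toy case rather than the agent or the natural-image data used in our experiments. It assumes a hypothetical world containing a single blurred bar, a 5x5 sensor patch centered on the gaze, and eight candidate gaze movements; all names (`bar_intensity`, `sense`, `sensory_change`, `most_invariant_action`) are illustrative. The movement that leaves the sensory state most nearly unchanged turns out to run along the bar, so the chosen action itself expresses the orientation of the stimulus.

```python
import math

# Hypothetical toy setup (not the authors' agent or data): the "world" is a
# single blurred bar through the origin, and the sensor is a 5x5 patch of
# samples centered on the current gaze position.
OFFSETS = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)]
N_ACTIONS = 8  # candidate gaze directions, evenly spaced over 180 degrees
ACTION_ANGLES = [k * math.pi / N_ACTIONS for k in range(N_ACTIONS)]


def bar_intensity(x, y, theta):
    """Brightness at (x, y) of a bar running along direction theta (radians)."""
    # Signed perpendicular distance from the point to the bar's center line.
    d = -x * math.sin(theta) + y * math.cos(theta)
    return math.exp(-0.5 * d * d)


def sense(gx, gy, theta):
    """Sensory state: the patch of intensities seen at gaze position (gx, gy)."""
    return [bar_intensity(gx + dx, gy + dy, theta) for dx, dy in OFFSETS]


def sensory_change(theta, phi):
    """Squared change in sensory state after a unit gaze step in direction phi."""
    before = sense(0.0, 0.0, theta)
    after = sense(math.cos(phi), math.sin(phi), theta)
    return sum((a - b) ** 2 for a, b in zip(after, before))


def most_invariant_action(theta):
    """Index of the candidate action that changes the sensory state least."""
    changes = [sensory_change(theta, phi) for phi in ACTION_ANGLES]
    return min(range(N_ACTIONS), key=changes.__getitem__)


if __name__ == "__main__":
    for deg in (0, 45, 90, 135):
        k = most_invariant_action(math.radians(deg))
        print(deg, "->", math.degrees(ACTION_ANGLES[k]))
```

The sketch compares all candidate actions exhaustively, which stands in for whatever learning procedure an actual agent would use; it is meant only to show why an invariance-preserving movement carries information about the stimulus.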