An intelligent agent experiences the world through low-level sensory and motor interfaces (the "pixel level"). However, in order to function intelligently, it must be able to describe its world in terms of higher-level concepts such as places, paths, objects, actions, goals, plans, and so on. How can these higher-level concepts that make up the foundation of commonsense knowledge be learned from unguided experience at the pixel level? This question is important in practical terms: As robots are developed with increasingly complex sensory and motor systems, it becomes impractical for human engineers to implement their high-level concepts and define how those concepts are grounded in sensorimotor interaction. The same question is also important in theory: Does AI necessarily depend on human programming, or can the concepts at the foundation of intelligence be learned from unguided experience?