The symbol grounding problem, described recently by Harnad, states that the symbols a traditional AI system manipulates are meaningless to the system itself, which therefore depends on a human operator to interpret the results of its computations. The solution Harnad suggests is to ground the symbols in the system’s ability to identify and manipulate the objects the symbols stand for. To achieve this, he proposes a hybrid system with both symbolic and connectionist components. The first section of this article presents a framework for a more general solution in which a composite concept description provides the critical connection between the symbols and their real-world referents. The central part of this description, referred to here as the epistemological representation, is used by the vision system for identifying (categorizing) objects. Such a representation is often referred to in computer vision as the object model and in machine learning as the concept description. Arguments are then presented for why a representation of this sort should be learned rather than preprogrammed. The second section of the article attempts to make explicit the demands placed on the system when the learning of an epistemological representation is based on perceiving objects in a real-world environment, as well as the consequences this has for the learning and representation of the epistemological component.