We describe ongoing research towards building a cognitively plausible system for near one-shot learning of the meanings of attribute words and object names by grounding them in a sensory model. The system learns incrementally from human demonstrations recorded with the Microsoft Kinect, in which the demonstrator may describe objects in unrestricted natural language. We achieve near one-shot learning of simple objects and attributes by focusing solely on examples about which the learning agent is confident and ignoring the rest of the data. We evaluate the system's learning ability by having it generate descriptions of presented objects, including objects it has never seen before, and comparing its responses against human descriptions collected for the same objects. We propose that our method of retrieving object examples, a k-nearest neighbor classifier using Mahalanobis distance, corresponds to a cognitively plausible representation of objects. Our initial results show promise for achieving rapid, near one-shot, incremental learning of word meanings.
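To make the retrieval step concrete, the following is a minimal sketch of a k-nearest neighbor lookup under Mahalanobis distance with a simple confidence gate. It is an illustration under stated assumptions, not the system's actual implementation: the function names (`mahalanobis_knn`, `confident_label`), the two-dimensional toy feature vectors, the use of the neighbor vote fraction as the confidence measure, and the 0.8 threshold are all hypothetical choices made for this example.

```python
from collections import Counter

import numpy as np


def mahalanobis_knn(query, examples, labels, k=3, reg=1e-6):
    """Return (label, confidence) for `query` from its k nearest stored examples.

    Assumption: confidence is the fraction of the k neighbors that agree
    with the winning label. The abstract does not specify this measure.
    """
    X = np.asarray(examples, dtype=float)
    q = np.asarray(query, dtype=float)
    # Covariance of the stored examples; a small ridge keeps it invertible.
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(X.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = X - q
    # Squared Mahalanobis distance from the query to every stored example.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    nearest = np.argsort(d2)[:k]
    votes = Counter(labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def confident_label(query, examples, labels, k=3, threshold=0.8):
    """Return the predicted label, or None when the agent is not confident.

    Assumption: the 0.8 threshold is arbitrary and chosen for illustration.
    """
    label, confidence = mahalanobis_knn(query, examples, labels, k=k)
    return label if confidence >= threshold else None


# Hypothetical toy features (for example, two color channels) for two words.
examples = [[1.0, 0.1], [1.1, 0.0], [0.9, 0.2],
            [0.0, 1.0], [0.1, 1.1], [0.2, 0.9]]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(mahalanobis_knn([1.0, 0.05], examples, labels, k=3))
print(confident_label([1.0, 0.05], examples, labels, k=6))
```

In this sketch, a query whose neighbors disagree yields `None` and would simply be skipped, which mirrors the idea of learning only from examples the agent is confident about. Estimating the covariance from the stored examples is one possible choice; a per-word or per-attribute covariance would be another.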