I have investigated systems for on-line, cumulative learning of compositional hierarchies embedded within predictive probabilistic models. The hierarchies are learned unsupervised from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds: learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. These systems exemplify a rare combination: unsupervised, on-line structure learning (specifically, structure growth). The system described here embeds a compositional hierarchy within an undirected graphical model based directly on Boltzmann machines, extended to handle categorical variables. A novel on-line chunking rule creates new nodes for frequently occurring patterns that are combinations of existing known patterns. This work can be viewed as a direct (and long overdue) attempt to explain how the hierarchical compositional structure of classic models, such as McClelland and Rumelhart’s Interactive Activation model of context effects in letter perception, can be learned automatically.
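To make the chunking idea concrete, the following is a minimal illustrative sketch, not the rule used in the system described above: it greedily parses each incoming stream into the longest known patterns, counts adjacent pattern pairs on-line, and promotes a pair to a new composite node once its count crosses a threshold. The class name, greedy parse, and fixed count threshold are all assumptions made for illustration.

```python
from collections import Counter

class OnlineChunker:
    """Illustrative on-line chunking sketch (hypothetical; the actual
    system embeds chunks in a Boltzmann-machine-style graphical model).
    New nodes are tuples formed by concatenating two known patterns."""

    def __init__(self, alphabet, threshold=3):
        # Known patterns start as the single symbols of the alphabet.
        self.patterns = {(s,) for s in alphabet}
        self.pair_counts = Counter()
        self.threshold = threshold  # assumed promotion criterion

    def _parse(self, stream):
        """Greedily segment the stream into the longest known patterns."""
        parsed, i = [], 0
        while i < len(stream):
            for n in range(len(stream) - i, 0, -1):
                cand = tuple(stream[i:i + n])
                if cand in self.patterns:
                    parsed.append(cand)
                    i += n
                    break
        return parsed

    def observe(self, stream):
        """Process one unsegmented stream; chunk frequent adjacent pairs."""
        parsed = self._parse(stream)
        new_chunks = []
        for a, b in zip(parsed, parsed[1:]):
            self.pair_counts[(a, b)] += 1
            if self.pair_counts[(a, b)] == self.threshold:
                chunk = a + b  # new node: concatenation of the two patterns
                self.patterns.add(chunk)
                new_chunks.append(chunk)
        return new_chunks
```

Feeding the same short stream repeatedly first promotes the pair `('a', 'b')` to a chunk, after which later parses use that chunk and can promote still larger combinations, giving the cumulative, hierarchical growth described above.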