We describe a method for automatically clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy is used to measure the dissimilarity of those distributions. Clusters are represented by "typical" context distributions averaged from the given words according to their probabilities of cluster membership, and in many cases can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data.