The unsupervised detection of hierarchical structures is a major topic in unsupervised learning and one of the key questions in data analysis and representation. We propose a novel algorithm for the problem of learning decision trees for data clustering and related problems. In contrast to many other methods based on successive tree growing and pruning, we propose an objective function for tree evaluation and we derive a non-greedy technique for tree growing. Applying the principles of maximum entropy and minimum cross entropy, a deterministic annealing algorithm is derived in a meanfield approximation. This technique allows us to canonically superimpose tree structures and to fit parameters to averaged or 'fuzzified' trees.