In many machine learning domains, unlabeled data is plentiful but labeled data is scarce and can be expensive to generate. Consequently, semi-supervised learning, i.e., learning from a combination of labeled and unlabeled data, has become a topic of significant recent interest. Our research focuses on semi-supervised clustering, which uses a small amount of supervision, in the form of class labels or pairwise constraints on some examples, to aid unsupervised clustering. Semi-supervised clustering can be either constraint-based, i.e., the clustering objective is modified to satisfy user-specified labels/constraints, or metric-based, i.e., the clustering distortion measure is trained to satisfy the given labels/constraints. Our main goal in this thesis is to study constraint-based semi-supervised clustering algorithms, integrate them with metric-based approaches, characterize some of their properties, and empirically validate our algorithms on different domains, e.g., text processing and bioinformatics.
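To make the constraint-based idea concrete, the following is a minimal sketch of one simple scheme of this kind: a k-means variant in which a few labeled examples initialise the centroids and are never reassigned. It is an illustration under stated assumptions, not necessarily one of the algorithms studied in this thesis. The names `seeded_kmeans` and `seed_labels`, the squared Euclidean distortion, and the toy data are all hypothetical choices made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, seed_labels, k, iterations=20):
    """Cluster `points` into `k` groups while honouring the given labels.

    points:      list of equal-length numeric tuples
    seed_labels: dict mapping a point index to its known cluster id in range(k)
    Assumption: every cluster has at least one labeled (seed) point.
    """
    # Initialise each centroid from the labeled points of its cluster.
    centroids = []
    for c in range(k):
        seeds = [points[i] for i, lab in seed_labels.items() if lab == c]
        centroids.append(mean(seeds))

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: labeled points keep their given cluster,
        # unlabeled points go to the nearest centroid.
        for i, p in enumerate(points):
            if i in seed_labels:
                assignment[i] = seed_labels[i]
            else:
                assignment[i] = min(
                    range(k), key=lambda c: squared_distance(p, centroids[c])
                )

        # Update step: recompute each centroid from its current members.
        new_centroids = []
        for c in range(k):
            members = [points[i] for i, a in enumerate(assignment) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])

        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids

    return assignment, centroids


# Toy data: two well-separated groups, one labeled example in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seed_labels = {0: 0, 3: 1}
assignment, centroids = seeded_kmeans(points, seed_labels, k=2)
print(assignment)
```

A metric-based approach would instead leave the assignment rule unchanged and learn the distance function (here fixed as `squared_distance`) from the same labels or constraints.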