The kernel function plays a central role in kernel methods. Existing methods typically fix the functional form of the kernel in advance and then only adapt the associated kernel parameters based on empirical data. In this paper, we consider the problem of adapting the kernel so that it becomes more similar to the so-called ideal kernel. We formulate this as a distance metric learning problem that searches for a suitable linear transform (fcature weighting) in the kernel-induced feature space. This formulation is applicable even when the training set can only provide examples of similar and dissimilar pairs, but not explicit class label information. Computationally, this leads to a local-optima-free quadratic programming problem, with the number of variables independent of the number of features. Performance of this method is evaluated on classification and clustering tasks on both toy and real-world data sets.