AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence

Binary Embedding with Additive Homogeneous Kernels
Saehoon Kim, Seungjin Choi

Last modified: 2017-02-13

Abstract


Binary embedding transforms vectors in Euclidean space into vertices of the Hamming space such that the Hamming distance between binary codes reflects a particular distance metric. In machine learning, similarity metrics induced by Mercer kernels are frequently used, which has led to the development of binary embedding with Mercer kernels (BE-MK), where approximate nearest neighbor search is performed in a reproducing kernel Hilbert space (RKHS). Kernelized locality-sensitive hashing (KLSH), one of the representative BE-MK methods, uses kernel PCA to embed data points into a Euclidean space, followed by random hyperplane binary embedding. In general, it works well when the queries and the data points in the database follow the same probability distribution. In a streaming data environment, however, KLSH must continuously update the leading eigenvectors of the Gram matrix, which can be costly or hard to carry out in practice. In this paper, we present a completely randomized binary embedding that works with a family of additive homogeneous kernels, referred to as BE-AHK. The proposed algorithm is built on Vedaldi and Zisserman's explicit feature maps for additive homogeneous kernels and is easy to implement. We show that BE-AHK preserves kernel values by deriving upper and lower bounds on its Hamming distance, which guarantees that approximate nearest neighbor search can be solved efficiently. Numerical experiments demonstrate that BE-AHK yields similarity-preserving binary codes with respect to additive homogeneous kernels and outperforms existing methods when training data and queries are generated from different distributions. Moreover, when a large code size is allowed, BE-AHK performs comparably to KLSH in the general case.
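The pipeline outlined in the abstract (an explicit feature map for an additive homogeneous kernel, followed by random hyperplane binary embedding) can be sketched as below. This is a minimal illustration under assumptions the abstract does not state: the kernel is taken to be the additive χ² kernel k(x, y) = 2xy/(x + y), whose spectrum is sech(πω); the feature map uses Vedaldi and Zisserman's regular sampling of that spectrum with an illustrative step and order; and the codes come from sign random projections. All function names and parameter values are hypothetical, and the paper's BE-AHK, described as completely randomized, may construct its codes differently.

```python
import math
import random


def chi2_spectrum(w):
    # Assumption: chi-squared kernel, whose signature sech(lambda / 2)
    # has spectrum kappa(w) = sech(pi * w).
    return 1.0 / math.cosh(math.pi * w)


def chi2_feature_map(x, order=4, step=0.4):
    """Approximate 1-D feature map with psi(x) . psi(y) ~= 2xy / (x + y).

    Samples the kernel spectrum at w = 0, step, ..., order * step
    (illustrative values), giving 2 * order + 1 features.
    """
    if x <= 0.0:
        # sqrt(x) = 0, so every feature vanishes (and log(0) is avoided).
        return [0.0] * (2 * order + 1)
    log_x = math.log(x)
    feats = [math.sqrt(x * step * chi2_spectrum(0.0))]
    for j in range(1, order + 1):
        w = j * step
        scale = math.sqrt(2.0 * x * step * chi2_spectrum(w))
        feats.append(scale * math.cos(w * log_x))
        feats.append(scale * math.sin(w * log_x))
    return feats


def additive_feature_map(vec, order=4, step=0.4):
    """Concatenate per-coordinate maps: K(x, y) = sum_d k(x_d, y_d)."""
    out = []
    for v in vec:
        out.extend(chi2_feature_map(v, order, step))
    return out


def chi2_kernel(x, y):
    """Exact additive chi-squared kernel, for comparison."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


def make_hyperplanes(num_bits, dim, seed=0):
    """Gaussian random hyperplanes, one per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def binary_code(vec, hyperplanes, order=4, step=0.4):
    """Sign random projections of the explicit feature map."""
    feats = additive_feature_map(vec, order, step)
    return [1 if sum(w * f for w, f in zip(row, feats)) >= 0.0 else 0
            for row in hyperplanes]


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(u != v for u, v in zip(a, b))


if __name__ == "__main__":
    x = [0.4, 0.3, 0.2, 0.1]
    y = [0.35, 0.3, 0.25, 0.1]  # close to x
    z = [0.05, 0.05, 0.1, 0.8]  # far from x
    fx, fz = additive_feature_map(x), additive_feature_map(z)
    approx = sum(a * b for a, b in zip(fx, fz))
    print("exact:", chi2_kernel(x, z), "approx:", approx)
    planes = make_hyperplanes(1024, len(fx), seed=0)
    bx, by, bz = (binary_code(v, planes) for v in (x, y, z))
    print("d_H(x, y):", hamming(bx, by), "d_H(x, z):", hamming(bx, bz))
```

For a random hyperplane, the probability that two vectors fall on different sides equals θ/π, where θ is the angle between them. For L1-normalized histograms the mapped vectors have squared norm close to K(x, x) = 1, so cos θ is close to K(x, y), and a smaller normalized Hamming distance indicates a larger kernel value.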

Keywords


Binary embedding; Locality-sensitive hashing; Randomized algorithm
