Abstract:
This paper presents a direct semantic analysis method for learning the correlation matrix between visual and textual words from socially tagged images. In the literature, to improve the traditional visual bag-of-words (BOW) representation, latent semantic analysis has been studied extensively for learning a compact visual representation, where each visual word may be related to multiple latent topics. However, these latent topics do not convey any true semantic information which can be understood by human. In fact, it remains a challenging problem how to recover the relationships between visual and textual words. Motivated by the recent advances in dealing with socially tagged images, we develop a direct semantic analysis method which can explicitly learn the correlation matrix between visual and textual words for social image classification. To this end, we formulate our direct semantic analysis from a graph-based learning viewpoint. Once the correlation matrix is learnt, we can readily first obtain a semantically refined visual BOW representation and then apply it to social image classification. Experimental results on two benchmark image datasets show the promising performance of the proposed method.
DOI:
10.1609/aaai.v28i1.8899