AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence

Unbiased Multivariate Correlation Analysis
Yisen Wang, Simone Romano, Vinh Nguyen, James Bailey, Xingjun Ma, Shu-Tao Xia

Last modified: 2017-02-13

Abstract


Correlation measures are a key element of statistics and machine learning, and are essential for a wide range of data analysis tasks. Most existing correlation measures are for pairwise relationships, but real-world data can also exhibit complex multivariate correlations involving three or more variables. We argue that multivariate correlation measures should be comparable, interpretable, scalable and unbiased. However, no existing measure satisfies all of these requirements. In this paper, we propose an unbiased multivariate correlation measure, called UMC, which satisfies all of the above criteria. UMC is a non-parametric multivariate correlation measure based on cumulative entropy, which can capture both linear and non-linear correlations for groups of three or more variables. It employs a correction for chance, using a statistical model of independence, to address the issue of bias. UMC is highly interpretable, and we empirically show that it outperforms state-of-the-art multivariate correlation measures in terms of statistical power, as well as in subspace clustering and outlier detection tasks.
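To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of an empirical cumulative entropy estimate and a generic correction for chance. It is not the UMC definition from the paper. The sketch is bivariate rather than multivariate, it conditions by equal-frequency binning, and it estimates the expected score under independence by random permutation instead of a statistical model. All function names and parameters (`bins`, `n_perm`, `seed`) are assumptions made for illustration.

```python
import math
import random


def cumulative_entropy(values):
    """Empirical cumulative entropy: -sum (x[i+1] - x[i]) * (i/n) * log(i/n)."""
    xs = sorted(values)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        p = i / n  # empirical CDF between the i-th and (i+1)-th sorted points
        total -= (xs[i] - xs[i - 1]) * p * math.log(p)
    return total


def conditional_cumulative_entropy(x, y, bins=5):
    """Weighted average of the cumulative entropy of x within equal-frequency bins of y.

    Assumption: binning on y is an illustrative choice, not the paper's method.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: y[i])
    total = 0.0
    for b in range(bins):
        chunk = order[b * n // bins:(b + 1) * n // bins]
        total += len(chunk) / n * cumulative_entropy([x[i] for i in chunk])
    return total


def raw_score(x, y, bins=5):
    """Relative reduction in the cumulative entropy of x after conditioning on y."""
    hx = cumulative_entropy(x)
    if hx == 0.0:
        return 0.0
    return (hx - conditional_cumulative_entropy(x, y, bins)) / hx


def chance_corrected_score(x, y, bins=5, n_perm=100, seed=0):
    """Generic adjustment for chance: (observed - expected) / (max - expected).

    Assumption: the expected score under independence is estimated by
    shuffling y, and the maximum score is taken to be 1.
    """
    rng = random.Random(seed)
    observed = raw_score(x, y, bins)
    shuffled = list(y)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_scores.append(raw_score(x, shuffled, bins))
    expected = sum(null_scores) / n_perm
    return (observed - expected) / (1.0 - expected)
```

Under these assumptions, the corrected score should sit near zero for independent variables and approach one as the dependence becomes stronger, which is the sense in which a chance-corrected measure is comparable across settings.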

Keywords


multivariate correlation measure; bias analysis; statistical model of independence; subspace clustering; outlier detection
