Robustness Can Be Cheap: A Highly Efficient Approach to Discover Outliers under High Outlier Ratios

  • Siqi Wang National University of Defense Technology
  • En Zhu National University of Defense Technology
  • Xiping Hu Chinese Academy of Sciences
  • Xinwang Liu National University of Defense Technology
  • Qiang Liu National University of Defense Technology
  • Jianping Yin National University of Defense Technology
  • Fei Wang Cornell University

Abstract

Efficiently detecting outliers in massive data with a high outlier ratio is challenging, yet it has not been explicitly discussed. In this setting, existing methods either suffer from poor robustness or require expensive computation. This paper proposes a Low-rank based Efficient Outlier Detection (LEOD) framework that achieves favorable robustness against high outlier ratios at a much lower computational cost. Three aspects of LEOD are worth highlighting: (1) The framework exploits the low-rank structure embedded in the similarity matrix and treats inliers and outliers equally based on this structure, which makes it possible to obtain satisfactory robustness at low computational cost; (2) A novel re-weighting algorithm is derived as a new general solution to the constrained eigenvalue problem, which is a major bottleneck of the optimization process. In contrast to the high space and time complexity (O((2n)^2) and O((2n)^3), respectively) required by the classic solution, our algorithm requires O(n) space and optimizes faster in the experiments; (3) A new alternative formulation is proposed to further accelerate the solution process, for which a cheap closed-form solution can be obtained. Experiments show that LEOD achieves strong robustness under outlier ratios from 20% to 60%, while being up to 100 times more memory efficient and 1000 times faster than the previous counterpart that attains comparable performance. The code of LEOD is publicly available at https://github.com/demonzyj56/LEOD.
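
To give a rough sense of the space complexities quoted in the abstract, the following back-of-the-envelope sketch compares the storage of a dense (2n) x (2n) matrix with storage that grows linearly in n. It is not part of LEOD: the function names, the 8-byte (float64) entry size, and the `num_vectors` constant standing in for the hidden factor in O(n) are all assumptions made for illustration.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed to store a dense (2n) x (2n) matrix (assumed float64)."""
    side = 2 * n
    return side * side * bytes_per_entry


def linear_storage_bytes(n, num_vectors=4, bytes_per_entry=8):
    """Bytes needed to store a few length-n vectors.

    num_vectors is a hypothetical constant standing in for the hidden
    factor in O(n); it is not taken from the paper.
    """
    return num_vectors * n * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        dense_gb = dense_matrix_bytes(n) / 1e9
        linear_mb = linear_storage_bytes(n) / 1e6
        print(f"n={n:>7}: dense (2n)^2 ~ {dense_gb:.3f} GB, "
              f"linear ~ {linear_mb:.3f} MB")
```

Under these assumptions, the dense matrix for n = 100,000 already needs about 320 GB, whereas the linear storage stays in the megabyte range, which is why quadratic space becomes the limiting factor on massive data.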

Published
2019-07-17