Evan W. Steeg, Derek A. Robinson, and Ed Willis
We present a novel, fast method for association-mining in high-dimensional datasets. Our Coincidence Detection method, which combines random sampling and Chernoff-Hoeffding bounds with a novel coding/binning scheme, avoids the exhaustive search, prior limits on the order k of discovered associations, and exponentially large parameter space of other methods. Tight theoretical bounds on the complexity of randomized algorithms are impossible without strong input distribution assumptions. However, we observe sublinear time, space and data complexity in tests on constructed artificial datasets and in real application to important problems in bioinformatics and drug discovery. After placing the method in historical and mathematical context, we describe the method, and present theoretical and empirical results on its complexity and error.