Using Adaptive Probing for Real-Time Problem Diagnosis in Distributed Computer Systems

I. Rish, M. Brodie, S. Ma, G. Grabarnik, and N. Odintsova

In this work, we focus on cost-efficient techniques for real-time diagnosis in distributed systems that allow adaptive, on-line selection and execution of appropriate measurements (tests). In particular, one of our applications concerns fault diagnosis in distributed computer systems and networks using test transactions, or probes (e.g., "traceroute" or "ping" commands). The key efficiency issues include both the cost of probing (e.g., the number of probes) and the computational complexity of diagnosis. In our past work, we derived theoretical conditions on the number of probes required for asymptotically error-free diagnosis, and developed efficient search techniques for probe set selection that can greatly reduce the probe set size while maintaining its diagnostic capability. Next, we formulated real-time diagnosis as probabilistic inference in Bayesian networks and investigated simple and efficient local approximation techniques based on variable elimination (the mini-bucket scheme). Our empirical studies show that these approximations "degrade gracefully" with noise and often yield an optimal solution when noise is low enough, and our initial theoretical analysis explains this behavior for the simplest (greedy) approximation. Our future work will focus on adapting more sophisticated approximation techniques, such as Generalized Belief Propagation, to real-time scenarios, and on real-time, incremental learning of Dynamic Bayesian Networks from historical data and feedback from the diagnosis results.
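
To make the idea of probe set selection concrete, the following is a minimal sketch of one possible greedy heuristic. It is not the search technique developed in the work itself, whose details are not given here. It assumes a noise-free setting with exactly one faulty node, and it assumes that each probe fails if and only if it passes through that node. The function names, the node labels, and the probe-to-node mapping are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def signature_partition(
    nodes: List[str], probes: Dict[str, Set[str]], chosen: List[str]
) -> List[Set[str]]:
    """Group nodes that the chosen probes cannot tell apart.

    Two nodes fall in the same group when every chosen probe either
    passes through both of them or through neither of them.
    """
    groups: Dict[tuple, Set[str]] = {}
    for node in nodes:
        sig = tuple(node in probes[p] for p in chosen)
        groups.setdefault(sig, set()).add(node)
    return list(groups.values())


def greedy_probe_selection(
    nodes: List[str], probes: Dict[str, Set[str]]
) -> List[str]:
    """Greedily add the probe that splits the nodes into the most groups.

    Stops when no remaining probe improves the partition. This is a
    heuristic (assumption: single fault, no noise); it does not
    guarantee a minimum-size probe set.
    """
    chosen: List[str] = []
    remaining = set(probes)
    n_groups = len(signature_partition(nodes, probes, chosen))
    while remaining:
        best, best_groups = None, n_groups
        for p in sorted(remaining):  # sorted for deterministic tie-breaking
            g = len(signature_partition(nodes, probes, chosen + [p]))
            if g > best_groups:
                best, best_groups = p, g
        if best is None:
            break  # no probe adds diagnostic power
        chosen.append(best)
        remaining.remove(best)
        n_groups = best_groups
    return chosen


# Hypothetical example: four nodes and four candidate probes.
example_nodes = ["A", "B", "C", "D"]
example_probes = {
    "p1": {"A", "B"},
    "p2": {"B", "C"},
    "p3": {"A", "B", "C", "D"},
    "p4": {"C", "D"},
}
selected = greedy_probe_selection(example_nodes, example_probes)
```

In this toy example, two of the four candidate probes are enough to give every node a distinct pass/fail signature, which illustrates how a reduced probe set can retain full diagnostic capability under the stated assumptions.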
