Statistical and Probabilistic Models
Many models of reality are probabilistic. For example, not everyone orders crisps with their beer, but a certain percentage does. Inferring such probabilistic knowledge from databases is one of the major challenges for data mining. Recently, Agrawal investigated a class of such problems. In this paper a new class of such problems is investigated, viz., inferring risk-profiles. The prototypical example of this class is: "what is the probability that a given policy-holder will file a claim with the insurance company in the next year?" A risk-profile is then a description of a group of insurants that have the same probability of filing a claim. It is shown in this paper that homogeneous descriptions are the most plausible risk-profiles. Moreover, under modest assumptions it is shown that covers of such homogeneous descriptions are essentially unique. A direct consequence of this result is that it suffices to search for the homogeneous description with the highest associated probability. The main result of this paper is thus that the inference problem for risk-profiles reduces to the well-studied problem of maximising a quality function.