Sally McClean, Bryan Scotney and Mary Shapcott
We consider the problem of aggregation for uncertain and imprecise data. Information stored in a database is often subject to uncertainty and imprecision. An extended relational data model has previously been proposed for such data; it allows us to quantify our uncertainty and imprecision about attribute values by representing each value as a probability distribution. On this data model we define aggregation operators and use them to provide information on the properties and patterns of data attributes. The aggregates that we define are based on the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple data values. We are thus able to provide a probability distribution over the domain values of an attribute, or group of attributes, using imperfect data. The provision of such operators is a central requirement in furnishing a database with the capability to perform the operations necessary for Knowledge Discovery in Databases.
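To make the idea of a divergence-based aggregate concrete, the following is a minimal sketch and not the operators defined in this paper. It assumes that every tuple already carries a fully specified probability distribution over the attribute domain, and that the aggregate q is chosen to minimise the sum of divergences D(p_i || q) over the tuples p_i; the direction of the divergence is an assumption made for illustration. Under these assumptions the minimiser is the arithmetic mean of the tuple distributions. Imprecise values, such as sets of possible values, are not handled here. The attribute values and the function names `kl_divergence`, `aggregate` and `total_divergence` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions stored as dicts.

    q must assign positive probability to every value that p supports.
    """
    total = 0.0
    for value, p_x in p.items():
        if p_x > 0.0:  # terms with p(x) = 0 contribute nothing
            total += p_x * math.log(p_x / q[value])
    return total


def aggregate(tuples):
    """Return the distribution q that minimises sum_i D(p_i || q).

    Assumption: each p_i is a complete distribution, in which case the
    minimiser is the arithmetic mean of the p_i.
    """
    domain = sorted({value for p in tuples for value in p})
    n = len(tuples)
    return {value: sum(p.get(value, 0.0) for p in tuples) / n for value in domain}


def total_divergence(tuples, q):
    """Sum of D(p_i || q) over all tuples."""
    return sum(kl_divergence(p, q) for p in tuples)


# Hypothetical attribute with three domain values; each dict is one tuple's
# distribution over that attribute (missing values have probability 0).
tuples = [
    {"employed": 1.0},                       # a certain value
    {"employed": 0.5, "unemployed": 0.5},    # an uncertain value
    {"unemployed": 0.25, "retired": 0.75},   # another uncertain value
]

q = aggregate(tuples)
print(q)
print(total_divergence(tuples, q))
```

For the three tuples above, the sketch assigns probabilities 0.5, 0.25 and 0.25 to employed, retired and unemployed respectively. Any other distribution over the same domain, for example the uniform one, gives a total divergence at least as large.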