Abstract:
We study the problem of automatically generating an integrated schema for multiple XML DTDs that describe similar document types. We describe an algorithm for approximately typing XML DTDs and clustering them, a method for inferring general rules that describe the source DTDs in the same class, and an algorithm for optimizing the learned rules. Introducing a novel view inference approach, we show that the set of views and the source descriptions can be derived automatically.