Kristina Lerman and Steven Minton, University of Southern California
The proliferation of online information sources has accentuated the need for tools that automatically validate and recognize data. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe two Web wrapper maintenance applications that employ this algorithm. The first application detects when a wrapper is not extracting correct data. The second application automatically identifies data on Web pages so that the wrapper may be re-induced when the source format changes.