As an alternative to manually writing extraction rules, we created STALKER, which is a wrapper induction algorithm that learns high-accuracy extraction rules. The major novelty introduced by STALKER is the concept of hierarchical wrapper induction: the extraction of the relevant data is performed in a hierarchical manner based on the embedded catalog tree (ECT), which is a user-provided description of the information to be extracted.
Registration: ISBN 978-0-262-51106-3
Copyright: July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.