Proceedings:
Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity
Volume
Issue:
Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity
Track:
Contents
Abstract:
This paper presents a new method for learning case frames of Japanese polysemous verbs from a roughly parsed corpus, given a semantic hierarchy for nouns (a thesaurus). Japanese verbs usually have several meanings, each of which takes a different case frame. Each case frame contains different types and numbers of case particles (case markers), which in turn select different noun categories. The proposed method employs a bottom-up covering technique to avoid the combinatorial explosion arising from more than ten case particles in Japanese and more than 3000 semantic categories in our thesaurus. First, a sequence of case frame candidates is produced by generalizing training instances using the thesaurus. Then, to select the most plausible frame, we introduce a new compression-based utility criterion that can uniformly compare candidates with different structures. Finally, we remove the instances covered by the selected frame and iterate the procedure until the utility measure falls below a predefined threshold. This produces a set of case frames, each corresponding to a single verb meaning. The proposed method is experimentally evaluated on typical polysemous verbs taken from one year of newspaper articles.
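The covering loop described in the abstract (generalize, select by utility, remove covered instances, repeat) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy thesaurus, the particle and category names, the pairwise generalization step, and in particular the `utility` function, which is a simple placeholder and not the compression-based criterion proposed in the paper.

```python
# Toy thesaurus (child -> parent). All category names are hypothetical.
PARENT = {
    "person": "animate", "animal": "animate",
    "food": "concrete", "tool": "concrete",
    "animate": "thing", "concrete": "thing",
    "thing": None,
}

def ancestors(cat):
    """Return cat followed by all of its ancestors, most specific first."""
    chain = []
    while cat is not None:
        chain.append(cat)
        cat = PARENT[cat]
    return chain

def lca(a, b):
    """Least common ancestor of two categories in the toy thesaurus."""
    seen = set(ancestors(a))
    return next(c for c in ancestors(b) if c in seen)

def covers(frame, inst):
    """A frame covers an instance with the same particles and subsumed nouns."""
    return frame.keys() == inst.keys() and all(
        frame[p] in ancestors(inst[p]) for p in frame)

def generalize(a, b):
    """Minimal generalization of two instances that share the same particles."""
    if a.keys() != b.keys():
        return None
    return {p: lca(a[p], b[p]) for p in a}

def utility(frame, instances):
    """Placeholder score (NOT the paper's compression-based criterion):
    reward covered instances, penalize slots generalized toward the root."""
    covered = sum(covers(frame, i) for i in instances)
    generality = sum(1.0 / len(ancestors(c)) for c in frame.values())
    return covered - generality

def learn_frames(instances, threshold=1.0):
    """Greedy bottom-up covering: pick the best frame, drop what it covers."""
    remaining = list(instances)
    frames = []
    while remaining:
        # Candidates: each instance itself plus every pairwise generalization.
        candidates = [dict(i) for i in remaining]
        for idx, a in enumerate(remaining):
            for b in remaining[idx + 1:]:
                g = generalize(a, b)
                if g is not None:
                    candidates.append(g)
        best = max(candidates, key=lambda f: utility(f, remaining))
        if utility(best, remaining) < threshold:
            break  # stop once the best candidate falls below the threshold
        frames.append(best)
        remaining = [i for i in remaining if not covers(best, i)]
    return frames

# Hypothetical training instances: case particle -> noun category.
INSTANCES = [
    {"ga": "person", "wo": "food"},
    {"ga": "animal", "wo": "food"},
    {"ga": "person", "ni": "tool"},
    {"ga": "person", "ni": "tool"},
]
FRAMES = learn_frames(INSTANCES)
```

On this toy input the loop yields two frames with different particle sets, mirroring the idea that each learned frame corresponds to one verb meaning. Generating candidates only from observed instances, rather than enumerating all particle and category combinations, is what keeps the search bottom-up.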
Spring
Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity