*Koji Fujimoto, Nobuo Inui, Yoshiyuki Kotani*

In recent work on morphological analysis based on statistical models, the conditional probability of observing the i-th word w_i with the i-th tag t_i, given the (i-1)-th tag t_{i-1}, is defined as the product of the observation symbol probability and the state transition probability, i.e. P(w_i | t_i) × P(t_i | t_{i-1}). Improving accuracy under this model raises the following problems: 1) if we build the hidden state levels from finer-grained categories (e.g. the lowest POS class, n-grams beyond 3-grams, or the words themselves), the state transition probability matrix becomes much larger and sparser; 2) if we use coarse categories, the statistical information becomes less reliable for some parts of speech; and 3) the best state level differs across POS categories, so some heuristic knowledge is needed to select the best state structure.
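
The scoring rule and the first problem above can be illustrated with a minimal sketch. It assumes a first-order (bigram) tag model; the tag names, the probability values, and the helper names `sequence_log_prob` and `transition_table_size` are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
EMISSION = {  # P(word | tag): observation symbol probability
    ("the", "DET"): 0.5,
    ("dog", "NOUN"): 0.1,
    ("barks", "VERB"): 0.05,
    ("barks", "NOUN"): 0.01,
}
TRANSITION = {  # P(tag | previous tag): state transition probability
    ("<s>", "DET"): 0.6,
    ("DET", "NOUN"): 0.7,
    ("NOUN", "VERB"): 0.4,
    ("NOUN", "NOUN"): 0.2,
}


def sequence_log_prob(words, tags, start="<s>"):
    """Sum log P(w_i | t_i) + log P(t_i | t_{i-1}) over a tagged sentence."""
    total = 0.0
    prev = start
    for word, tag in zip(words, tags):
        emit = EMISSION.get((word, tag), 0.0)
        trans = TRANSITION.get((prev, tag), 0.0)
        if emit == 0.0 or trans == 0.0:
            # An unseen pair gets zero probability: the sparsity problem.
            return float("-inf")
        total += math.log(emit) + math.log(trans)
        prev = tag
    return total


def transition_table_size(num_states, order=1):
    """Entries in the transition table when each state is conditioned on
    the previous `order` states (order=1 is the bigram case)."""
    return num_states ** (order + 1)


words = ["the", "dog", "barks"]
verb_reading = sequence_log_prob(words, ["DET", "NOUN", "VERB"])
noun_reading = sequence_log_prob(words, ["DET", "NOUN", "NOUN"])
print(verb_reading > noun_reading)
print(transition_table_size(10), transition_table_size(500))
```

With these toy numbers, moving from 10 coarse states to 500 fine-grained states grows the bigram transition table from 100 to 250,000 entries, most of which a training corpus of fixed size would leave unobserved.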
