Abstract:
The use of a hidden Markov model (HMM) for the assignment of part-of-speech (POS) tags to improve the performance of a text recognition algorithm is discussed. Syntactic constraints are described by the transition probabilities between POS tags. The confusion between the feature string for a word and the various tags is also described probabilistically. A modification of the Viterbi algorithm is also presented that finds a fixed number of sequences of tags for a given sentence that have the highest probabilities of occurrence, given the feature strings for the words. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. The use of first- and second-order transition probabilities is explored. Overall, a reduction of between 65 and 80 percent in the average number of words that can match a given image is achieved.
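To make the modified-Viterbi idea concrete, the following is a minimal Python sketch of a k-best Viterbi decoder over POS tags, which keeps the k highest-scoring partial tag sequences at each position rather than only the single best path. The function name `k_best_viterbi`, the dictionary-based probability tables, and the choice of k are illustrative assumptions, not the paper's actual implementation.

```python
def k_best_viterbi(observations, tags, log_init, log_trans, log_emit, k=3):
    """Toy k-best Viterbi sketch for HMM POS tagging.

    observations: list of per-word feature strings (or words) in the sentence.
    tags: list of POS tags.
    log_init[tag], log_trans[prev][tag], log_emit[tag][obs]: log-probabilities.
    Returns the k complete tag sequences with the highest log-probability.
    """
    unseen = -1e9  # crude floor for feature strings absent from the emission table
    first = observations[0]
    # beams[t][tag] is a best-first list of (log_prob, tag_sequence) pairs
    # covering observations[0..t] and ending in `tag`.
    beams = [{
        tag: [(log_init[tag] + log_emit[tag].get(first, unseen), [tag])]
        for tag in tags
    }]
    for obs in observations[1:]:
        step = {}
        for tag in tags:
            candidates = []
            for prev_tag, paths in beams[-1].items():
                for score, seq in paths:
                    candidates.append((
                        score + log_trans[prev_tag][tag]
                        + log_emit[tag].get(obs, unseen),
                        seq + [tag],
                    ))
            # keep only the k best partial sequences ending in this tag
            step[tag] = sorted(candidates, reverse=True)[:k]
        beams.append(step)
    # merge the final beams and return the k best complete tag sequences
    finals = [p for paths in beams[-1].values() for p in paths]
    return sorted(finals, reverse=True)[:k]
```

Returning several high-probability tag sequences, rather than the single Viterbi path, is what lets the tagger rank the competing word guesses produced by the hypothesization algorithm instead of committing to one interpretation per sentence.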