Combining Syntactic Knowledge and Visual Text Recognition: A Hidden Markov Model for Part of Speech Tagging in a Word Recognition Algorithm

Jonathan Hull

The use of a hidden Markov model (HMM) for the assignment of part-of-speech (POS) tags to improve the performance of a text recognition algorithm is discussed. Syntactic constraints are described by the transition probabilities between POS tags. The confusion between the feature string for a word and the various tags is also described probabilistically. A modification of the Viterbi algorithm is presented that finds a fixed number of tag sequences for a given sentence that have the highest probabilities of occurrence, given the feature strings for the words. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. The use of first- and second-order transition probabilities is explored. Overall, a reduction of between 65 and 80 percent in the average number of words that can match a given image is achieved.
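
The idea of finding a fixed number of best tag sequences can be illustrated with a small sketch. The abstract does not give the details of the paper's modification, so the code below is a generic "list Viterbi" under stated assumptions, not the author's algorithm: it uses first-order transitions only, and the function name `top_n_tag_sequences`, the tag set, and every probability value are hypothetical and chosen purely for illustration.

```python
import heapq
import math


def top_n_tag_sequences(obs_probs, init, trans, n):
    """Return up to n most probable tag sequences as (log_prob, tags) pairs.

    obs_probs[t][tag] : P(feature string of word t | tag)  (assumed to be given)
    init[tag]         : P(tag starts the sentence)
    trans[prev][tag]  : P(tag | prev), first-order transition probabilities
    """
    tags = list(init)
    # beams[tag] holds up to n best (log_prob, path) partial sequences ending in tag.
    beams = {}
    for tag in tags:
        p = init[tag] * obs_probs[0].get(tag, 0.0)
        beams[tag] = [(math.log(p), (tag,))] if p > 0 else []

    for obs in obs_probs[1:]:
        new_beams = {}
        for tag in tags:
            emit = obs.get(tag, 0.0)
            candidates = []
            if emit > 0:
                for prev in tags:
                    a = trans[prev].get(tag, 0.0)
                    if a <= 0:
                        continue
                    step = math.log(a) + math.log(emit)
                    # Extend every retained path ending in prev by the current tag.
                    for log_p, path in beams[prev]:
                        candidates.append((log_p + step, path + (tag,)))
            # Ordinary Viterbi keeps one survivor per tag; here n are kept.
            new_beams[tag] = heapq.nlargest(n, candidates)
        beams = new_beams

    finals = [item for tag in tags for item in beams[tag]]
    return heapq.nlargest(n, finals)


# Hypothetical three-tag model and a three-word sentence (illustrative values only).
init = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
obs_probs = [
    {"DET": 0.7, "NOUN": 0.2, "VERB": 0.1},
    {"DET": 0.1, "NOUN": 0.6, "VERB": 0.3},
    {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
]

best = top_n_tag_sequences(obs_probs, init, trans, n=3)
for log_p, path in best:
    print(" ".join(path), math.exp(log_p))
```

Keeping n survivors per tag at each word, instead of one, is sufficient to recover the n best complete sequences, because any prefix of a top-n sequence must itself be among the n best prefixes ending in the same tag. In a recognition setting, one plausible use of the resulting sequences is to discard word candidates whose tags appear in none of them; that filtering step is an assumption here and is not shown.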
