AAAI Publications, Twenty-Fourth International FLAIRS Conference

Co-Occurrence-Based Error Correction Approach to Word Segmentation
Ekawat Chaowicharat, Kanlaya Naruedomkul

Last modified: 2011-03-20

Abstract


Because Thai is written without spaces between words, word segmentation is a difficult problem, and many word segmentation approaches have been proposed over the years. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation using co-occurrence information and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
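The candidate-generation step can be illustrated with a minimal Python sketch. It assumes a toy dictionary and uses a Latin-alphabet string in place of Thai text. Following the classical maximal matching formulation, it enumerates every way to split the input into dictionary words and keeps the segmentations with the fewest words. The selection function is a hypothetical stand-in that ranks candidates by counts of adjacent word pairs. The abstract does not specify CBEC's actual co-occurrence features or its error-correction step. All function names and the `pair_counts` table are illustrative only.

```python
from functools import lru_cache


def all_segmentations(text: str, dictionary: set[str]) -> list[list[str]]:
    """Return every way to split `text` into words from `dictionary`."""
    max_len = max((len(w) for w in dictionary), default=0)

    @lru_cache(maxsize=None)
    def segment_from(start: int) -> tuple[tuple[str, ...], ...]:
        # Reaching the end of the text yields one empty completion.
        if start == len(text):
            return ((),)
        results = []
        # Try every dictionary word that begins at `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in segment_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in segment_from(0)]


def maximal_matching_candidates(text: str, dictionary: set[str]) -> list[list[str]]:
    """Keep only the segmentations with the fewest words (maximal matching criterion)."""
    segs = all_segmentations(text, dictionary)
    if not segs:
        return []
    fewest = min(len(s) for s in segs)
    return [s for s in segs if len(s) == fewest]


def pick_by_cooccurrence(candidates: list[list[str]],
                         pair_counts: dict[tuple[str, str], int]) -> list[str]:
    """Hypothetical selection step, not CBEC's actual scoring.

    Prefers the candidate whose adjacent word pairs were seen most often.
    `candidates` must be non-empty; ties go to the first candidate.
    """
    def score(seg: list[str]) -> int:
        return sum(pair_counts.get((a, b), 0) for a, b in zip(seg, seg[1:]))

    return max(candidates, key=score)
```

With the dictionary `{"the", "theta", "table", "ble", "tab", "le"}`, the string `"thetable"` still has two fewest-word candidates, `the|table` and `theta|ble`. Maximal matching alone cannot choose between them, so a context-based selection step such as CBEC's is needed.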
