Hierarchical Voting Experts: An Unsupervised Algorithm for Segmenting Hierarchically Structured Sequences

Matthew Miller, Alexander Stoytchev

This paper extends the Voting Experts (VE) algorithm to segment hierarchically structured sequences. The original algorithm was tested on text segmentation and relies on two proposed characteristics of chunks: low internal entropy and high boundary entropy. VE looks for these two properties and uses them to segment sequences of tokens. It is surprisingly powerful given its simplicity, which suggests that segmenting on the basis of low internal entropy and high boundary entropy is a promising principle. Real-world data often exhibits an inherently hierarchical structure, and it is well known that humans tend to chunk the world hierarchically. It is therefore interesting to explore the applicability of a modified version of VE to hierarchically structured data. We show that VE can be generalized to work on hierarchical data, and that the higher-order models can be used to improve the accuracy of the segmentation at lower levels.
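
The two cues named in the abstract can be illustrated with a minimal, single-level sketch. This is not the authors' implementation and it does not cover the hierarchical extension. It assumes the sequence is a Python string (or tuple) of tokens, uses raw rather than standardized scores, and the names `ngram_stats`, `boundary_entropy`, `surprisal`, `segment`, `window`, and `threshold` are hypothetical choices made for illustration. One "expert" votes for the split of each sliding window whose two parts are most frequent (low internal entropy), the other votes for the split after which the next token is least predictable (high boundary entropy), and a boundary is placed where votes peak.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_n):
    """Count every n-gram up to length max_n and the tokens that follow it."""
    counts = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_n + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            counts[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return counts, followers


def boundary_entropy(gram, followers):
    """Shannon entropy (bits) of the token that follows `gram`."""
    nxt = followers.get(gram)
    if not nxt:
        return 0.0
    total = sum(nxt.values())
    return -sum(c / total * math.log2(c / total) for c in nxt.values())


def surprisal(gram, counts, seq_len):
    """Internal-entropy proxy (bits): low for frequent, chunk-like n-grams."""
    return -math.log2(counts[gram] / (seq_len - len(gram) + 1))


def segment(seq, window=4, threshold=1):
    """Simplified voting: assumption-laden sketch, not the published algorithm."""
    counts, followers = ngram_stats(seq, window)
    votes = [0] * (len(seq) + 1)
    for i in range(len(seq) - window + 1):
        win = seq[i:i + window]
        splits = range(1, window)
        # Frequency expert: split whose two halves are jointly most frequent.
        freq_k = min(splits, key=lambda k: surprisal(win[:k], counts, len(seq))
                     + surprisal(win[k:], counts, len(seq)))
        # Entropy expert: split after the prefix with the least predictable successor.
        ent_k = max(splits, key=lambda k: boundary_entropy(win[:k], followers))
        votes[i + freq_k] += 1
        votes[i + ent_k] += 1
    # Cut where the vote count exceeds the threshold and is a local peak.
    cuts = [p for p in range(1, len(seq))
            if votes[p] > threshold
            and votes[p] > votes[p - 1] and votes[p] >= votes[p + 1]]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

Calling `segment("thecatthedogthecatthedog")` returns a list of substrings that concatenate back to the input. Which boundaries are found depends on the assumed `window` and `threshold` values, and the quality of the result on short inputs should not be taken as representative of the method.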

Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery

Submitted: Apr 7, 2008

This page is copyrighted by AAAI. All rights reserved.