SegGen: a Genetic Algorithm for Linear Text Segmentation

Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frederic Saubion

This paper describes SegGen, a new algorithm for linear text segmentation on general corpora. It aims to segment texts into thematically homogeneous parts. Several existing methods address this task by creating boundaries sequentially. Here, we propose to consider all boundaries simultaneously by means of a genetic algorithm. SegGen uses two criteria: maximization of the internal cohesion of the formed segments and minimization of the similarity between adjacent segments. First experimental results are promising, and SegGen appears to be very competitive with existing methods.
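To make the idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes details the abstract does not give: sentences are bag-of-words vectors compared by cosine similarity, a candidate segmentation is a bit vector with one bit per gap between sentences, and the two criteria are combined into a single score by simple addition. The abstract names the two criteria but not how they are measured or combined, so the names `cohesion`, `fitness` and `evolve`, the toy sentences, and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def split(vectors, boundaries):
    """Cut after sentence i whenever boundaries[i] == 1."""
    segments, current = [], [vectors[0]]
    for vec, cut in zip(vectors[1:], boundaries):
        if cut:
            segments.append(current)
            current = []
        current.append(vec)
    segments.append(current)
    return segments

def cohesion(segment):
    """Mean pairwise sentence similarity; 0.0 for a one-sentence segment (assumption)."""
    pairs = [(a, b) for i, a in enumerate(segment) for b in segment[i + 1:]]
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def fitness(vectors, boundaries):
    """Assumed scalar score: mean internal cohesion plus mean dissimilarity of neighbours."""
    segments = split(vectors, boundaries)
    internal = sum(cohesion(s) for s in segments) / len(segments)
    merged = [sum(s, Counter()) for s in segments]
    if len(merged) < 2:
        return internal
    external = sum(1 - cosine(a, b) for a, b in zip(merged, merged[1:])) / (len(merged) - 1)
    return internal + external

def evolve(vectors, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Plain generational GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(vectors) - 1  # one candidate boundary per gap between sentences

    def score(individual):
        return fitness(vectors, individual)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        children = [population[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            point = rng.randint(0, n)
            child = p1[:point] + p2[point:]
            children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        population = children
    best = max(population, key=score)
    history.append(score(best))
    return best, history

# Toy input (hypothetical): two topics, three sentences each.
sentences = [
    "the cat chased the mouse",
    "the mouse hid from the cat",
    "a cat sleeps after the chase",
    "stock markets fell on monday",
    "investors sold stock as markets fell",
    "markets may recover investors hope",
]
vectors = [Counter(s.split()) for s in sentences]
best, history = evolve(vectors)
print("boundaries:", best, "score:", round(history[-1], 3))
```

Because each individual encodes every boundary at once, the score always judges a complete segmentation, which is the contrast with sequential boundary creation drawn above. Adding the two criteria together is only one possible design; a multi-objective scheme that keeps them separate would be an equally plausible reading of the abstract.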

Subjects: 13. Natural Language Processing; 1.9 Genetic Algorithms

Submitted: Oct 14, 2006
