Learning Logic Programs for Layout Analysis Correction

Margherita Berardi, Michelangelo Ceci, Floriana Esposito, and Donato Malerba

Layout analysis is the process of extracting a hierarchical structure describing the layout of a page. In the WISDOM++ system, layout analysis is performed in two steps: first, the global analysis determines possible areas containing paragraphs, sections, columns, figures and tables; second, the local analysis groups together blocks that possibly fall within the same area. The result of the local analysis strongly depends on the quality of the results of the global analysis. We investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by automatically generating training examples of action selections from the sequence of user actions, and then learning action selection rules for layout correction. Rules are expressed as a logic program whose induction demands the careful application of inductive logic programming (ILP) techniques. Experimental results on a set of multi-page documents provide evidence of the difficulty of the learning task and pose new problems in learning control rules for adaptive interfaces.

