Natalia V. Loukachevitch
The approach to text categorization described here is based on a thematic representation of a text. The thematic representation consists of nodes of thematically related terms, which model the topics of the text, and the nodes are assigned classes reflecting their importance for the text. The representation is built from a detailed description of the domain. It makes it possible to process different types of texts, to use different systems of categories (in various languages) for text categorization, to adapt the system quickly to other formats and types of texts and/or other systems of categories, and to categorize texts using several systems of categories simultaneously. Most of the algorithm is language-independent.
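As a rough illustration of the idea, and not the algorithm of the paper itself, the following minimal sketch groups the terms of a text into nodes of related concepts, marks the heaviest node as the main one, and maps it to categories. The tiny term list, the relation table, the frequency-based weighting, the two importance classes ("main", "secondary"), and all names such as `build_thematic_representation` and `categorize` are assumptions made only for this example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical domain description: surface terms mapped to concepts,
# and concepts mapped to thematically related concepts.
TERM_TO_CONCEPT = {
    "bank": "BANK", "banks": "BANK",
    "loan": "CREDIT", "loans": "CREDIT", "credit": "CREDIT",
    "tax": "TAX", "taxes": "TAX",
}
RELATED = {"BANK": {"CREDIT"}, "CREDIT": {"BANK"}, "TAX": set()}

# Hypothetical system of categories; another system can be passed instead.
CATEGORY_OF = {"BANK": "Finance", "CREDIT": "Finance", "TAX": "Taxation"}


@dataclass
class ThematicNode:
    concepts: frozenset  # thematically related concepts found in the text
    weight: int          # summed frequency of the node's concepts
    importance: str      # assumed classes: "main" or "secondary"


def build_thematic_representation(text):
    """Group the concepts found in the text into weighted thematic nodes."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(TERM_TO_CONCEPT[w] for w in words if w in TERM_TO_CONCEPT)

    grouped, seen = [], set()
    for concept, _ in counts.most_common():
        if concept in seen:
            continue
        # A node is the concept plus its related concepts present in the text.
        group = ({concept} | {c for c in RELATED[concept] if c in counts}) - seen
        seen |= group
        grouped.append((frozenset(group), sum(counts[c] for c in group)))

    if not grouped:
        return []
    top = max(weight for _, weight in grouped)
    # Assumption: the heaviest node(s) are "main", the rest "secondary".
    return [
        ThematicNode(group, weight, "main" if weight == top else "secondary")
        for group, weight in grouped
    ]


def categorize(text, category_of=CATEGORY_OF):
    """Return the categories of the concepts in the main thematic nodes."""
    return {
        category_of[c]
        for node in build_thematic_representation(text)
        if node.importance == "main"
        for c in node.concepts
        if c in category_of
    }


sample = "The bank raised loan rates. Banks issue credit. Tax rules changed."
print(build_thematic_representation(sample))
print(categorize(sample))
```

Because the categories are looked up only after the thematic nodes have been built, the same representation can be reused with a different `category_of` mapping, which mirrors the claim that several systems of categories can be applied to one text.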