A Method for Identifying Classification Attributes of Electronic Text Documents
Alexei Kebkalo, Anton Mikhailyuk
Specialized Computer Systems Department,
National Technical University of Ukraine “KPI”, 37, Peremohy ave,
Automated classification of text documents is a pressing problem today. One approach to classification takes the structure of a document into account and identifies the parts that contain classification attributes; classification is then performed on the basis of these attributes. Such an approach is needed in information systems that work with documents of specific types. In this case the classification procedure consists of two steps: identifying the parts of the document that contain classification attributes, and the classification itself.
This paper proposes an approach to the first step: identifying the parts of a document for subsequent classification. The approach is based on forming sets of markers for the parts of a document and searching for these markers in the document under examination.
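The marker search can be illustrated with a minimal Python sketch. The abstract does not say how marker sets are formed or matched, so everything here is an assumption made for illustration: the marker phrases in MARKERS, the function name identify_parts, the case-insensitive literal matching, and the rule that a part runs from its marker to the next marker found.

```python
import re

# Hypothetical marker sets: each document part is signalled by one or more phrases.
MARKERS = {
    "title": ["TITLE:"],
    "authors": ["AUTHORS:", "BY:"],
    "abstract": ["ABSTRACT:", "SUMMARY:"],
}

def identify_parts(text, markers):
    """Return {part_name: part_text} for every part whose marker occurs in text."""
    hits = []  # (marker start offset, marker end offset, part name)
    for part, phrases in markers.items():
        # Assumption: markers are literal phrases matched case-insensitively.
        pattern = "|".join(re.escape(p) for p in phrases)
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), match.end(), part))
    hits.sort()  # order the parts by their position in the document

    parts = {}
    for i, (_, end, part) in enumerate(hits):
        # Assumption: a part extends up to the next marker, or to the end of the text.
        next_start = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        parts[part] = text[end:next_start].strip()
    return parts

sample = "Title: Marker search\nBy: A. Author\nSummary: A short example."
parts = identify_parts(sample, MARKERS)
print(parts)
```

The extracted parts could then be passed to the second step, the classification itself, which this sketch does not cover.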
The proposed approach can be used not only for classification but also for summarization or automatic text analysis, for example to identify attributes automatically when a new document is added to a document-oriented system.