Automatic text summarization: overview of algorithms and approaches to quality assessment

Chelyshev E.A., Raskatova M.V., Mishin A.A., Shchegolev P.

Received: 21.10.2023

The paper presents an overview of automatic text summarization. The problem is formulated, and summarization algorithms are classified both by the type of summary they produce and by the approach they take to the problem. Several open problems in the field and the disadvantages of particular classes of algorithms are described. The concepts of summary quality and information completeness are defined. The most popular approaches to assessing the information completeness of a summary are reviewed and classified by the methodology they use. The ROUGE family of metrics is considered as applied to automatic text summarization. Special attention is paid to evaluating information completeness with information proximity measures such as the Kullback-Leibler divergence, the Jensen-Shannon divergence, and the cosine distance (similarity). These measures can be applied to vector representations of the source text and of the summary. Such vector representations can be obtained with methods such as frequency vectorization, TF-IDF, and static vectorizers.
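As an illustration of the measures named above, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, plain frequency vectorization over a shared vocabulary, and a small additive smoothing constant `eps` (a hypothetical choice that keeps the Kullback-Leibler divergence finite when a word is missing from one of the texts). All function names are illustrative. ROUGE-N is shown in its recall form, which compares a candidate summary with a reference summary rather than with the source text.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def frequency_distributions(source_tokens, summary_tokens, eps=1e-9):
    """Smoothed relative-frequency vectors over the shared vocabulary.

    eps is an assumed smoothing constant, not a value from the paper.
    """
    vocab = sorted(set(source_tokens) | set(summary_tokens))
    source_counts = Counter(source_tokens)
    summary_counts = Counter(summary_tokens)

    def normalize(counts):
        total = sum(counts[w] + eps for w in vocab)
        return [(counts[w] + eps) / total for w in vocab]

    return normalize(source_counts), normalize(summary_counts)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), natural logarithm."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rouge_n_recall(reference_tokens, candidate_tokens, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-grams."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    reference, candidate = ngrams(reference_tokens), ngrams(candidate_tokens)
    overlap = sum(min(count, candidate[g]) for g, count in reference.items())
    total = sum(reference.values())
    return overlap / total if total else 0.0


# Usage with toy texts (illustrative data only).
source = tokenize("the cat sat on the mat and the dog slept on the rug")
summary = tokenize("the cat sat and the dog slept")
p, q = frequency_distributions(source, summary)
print("KL(source || summary):", kl_divergence(p, q))
print("JS divergence:", js_divergence(p, q))
print("cosine similarity:", cosine_similarity(p, q))
print("ROUGE-1 recall:", rouge_n_recall(source, summary, n=1))
```

Lower divergence values and higher cosine similarity indicate that the word distribution of the summary is closer to that of the source. The same functions could be applied to TF-IDF weights after normalizing them to sum to one, although that is an assumption of this sketch rather than a procedure stated in the abstract.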

Keywords: automatic summarization, summary, information completeness, ROUGE, vectorization, TF-IDF, static vectorizer, Kullback-Leibler divergence, Jensen-Shannon divergence, cosine distance