from 01.01.2023 to 01.01.2024
Tomsk, Tomsk, Russian Federation
News agencies compete in a digital space where success often depends on how promptly material is published, and automatic headline generation technologies can speed publication up. This study examined how the type of training dataset (individual news categories vs. their combination) affects the quality of automatically generated news headlines. The initial hypothesis was that training the RuGPT-3 model on thematic sets of articles and on their combination would produce different generated headlines. The authors used the RuGPT-3 model and news articles published by Lenta.ru. The research included three datasets: the categories of science and sports (6,900 articles each) and their combination (6,900 articles). The results confirmed the hypothesis: the model trained on the combined dataset generated higher-quality headlines as measured by the formal ROUGE metric, achieving an average F-score of 0.22 (compared to 0.17 for science and 0.2 for sports). The generated headlines looked authentic and conformed to good headline practice: length (≤10 words), predicativity, past tense, active voice, no opening prepositions or figures, no relative time indicators, etc. However, the headlines were not always consistent with the content of the articles.
news, news headline, automatic generation, machine learning, RuGPT-3 model, neural networks, ROUGE metric
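The ROUGE F-score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which ROUGE variant or tokenization the authors used, so the unigram default (ROUGE-1), the lowercase whitespace tokenization, and the function name `rouge_n_f1` below are all assumptions made for illustration only.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """F-score of n-gram overlap between a reference and a generated headline.

    Hypothetical helper: lowercase whitespace tokenization is an assumption,
    not the procedure described in the study.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate n-grams found in the reference
    recall = overlap / sum(ref.values())      # share of reference n-grams recovered by the candidate
    return 2 * precision * recall / (precision + recall)
```

Averaging this score over all reference/generated headline pairs in a test set would give a single figure comparable in kind, though not necessarily in value, to the averages reported above.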