Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media

Bibliographic Details
Date: 2026
Main Authors: Terentiev, Oleksandr; Abroskin, Yurii; Duda, Volodymyr; Prosyankina-Zharova, Tetyana
Format: Article
Language: Ukrainian
Published: Kyiv National University of Construction and Architecture, 2026
Online Access: https://es-journal.in.ua/article/view/358171
Journal Title: Environmental safety and natural resources

Description
Summary: The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources. A comparative analysis was conducted of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks. A comparative analysis of four topic modelling methods applied to real Twitter datasets showed that BERTopic (coherence score 0.62) outperforms LDA (0.45) and Top2Vec (0.56) on short texts; the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89. Theoretically, the study justifies selecting methods that account for the temporal dynamics of topics. Practically, it proposes a five-block pipeline architecture for knowledge extraction from Ukrainian-language social media. The originality of the work lies in adapting the Methontology-based approach to ontology generation for short, unstructured Ukrainian-language texts. Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets.
DOI: 10.32347/2411-4049.2026.1.161-170