Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media

Bibliographic Details
Date:	2026
Main Authors:	Terentiev, Oleksandr, Abroskin, Yurii, Duda, Volodymyr, Prosyankina-Zharova, Tetyana
Format:	Article
Language:	Ukrainian
Published:	Kyiv National University of Construction and Architecture 2026
Subjects:	text analytics data processing Coherence Score F1-score LSA NMF LDA Top2Vec BERTopic OSINT
Online Access:	https://es-journal.in.ua/article/view/358171
Journal Title:	Environmental safety and natural resources

Institution

Environmental safety and natural resources

Description
Summary:	The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources. A comparative analysis was conducted of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks. A comparison of four topic modelling methods on real Twitter datasets showed that, for short texts, BERTopic (coherence score 0.62) outperforms LDA (0.45) and Top2Vec (0.56); the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89. On the theoretical side, the study justifies selecting methods that account for the temporal dynamics of topics. On the practical side, it proposes a five-block pipeline architecture for knowledge extraction from Ukrainian-language social media. The originality of the work lies in adapting the Methontology-based approach to ontology generation for short, unstructured Ukrainian-language texts. Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets.
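The summary reports NER quality as an F1 score. As an illustration only, the sketch below computes exact-match, entity-level precision, recall and F1, where a prediction counts as correct only if both its span and its label match a gold entity. The `(start, end, label)` span representation, the `entity_f1` name and the toy data are hypothetical; the record does not say which matching scheme underlies the reported 0.89.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def entity_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (assumed scheme).

    A predicted entity is a true positive only if its boundaries and its
    label both match a gold entity exactly.
    """
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example, not data from the article:
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG"), (12, 13, "LOC")]
p, r, f = entity_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.67 F1=0.57
```

Here the mislabelled span (5, 6) counts both as a false positive and as a false negative. Token-level or partial-match schemes would score the same predictions differently.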
DOI:	10.32347/2411-4049.2026.1.161-170
