Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media

The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources.A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Datum:2026
Hauptverfasser: Terentiev, Oleksandr, Abroskin, Yurii, Duda, Volodymyr, Prosyankina-Zharova, Tetyana
Format: Artikel
Sprache:Ukrainisch
Veröffentlicht: Kyiv National University of Construction and Architecture 2026
Schlagworte:
Online Zugang:https://es-journal.in.ua/article/view/358171
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Назва журналу:Environmental safety and natural resources

Institution

Environmental safety and natural resources
Beschreibung
Zusammenfassung:The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources.A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks was conducted.Comparative analysis of four topic modelling methods applied to real Twitter datasets demonstrated that BERTopic (coherence score 0.62) outperforms LDA (0.45) and Top2Vec (0.56) for short texts; the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89.Theoretically, the selection of methods that take into account the temporal dynamics of topics is justified. Practically, five-block pipeline architecture for knowledge extraction from Ukrainian-language social media is proposed.The originality of the work lies in the adaptation of the Methontology-based approach to ontology generation for short unstructured Ukrainian-language texts.Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets.
DOI:10.32347/2411-4049.2026.1.161-170