Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media
The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources. A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2...
| Date: | 2026 |
|---|---|
| Main authors: | , , , |
| Format: | Article |
| Language: | Ukrainian |
| Published: | Kyiv National University of Construction and Architecture, 2026 |
| Subjects: | |
| Online access: | https://es-journal.in.ua/article/view/358171 |
| Journal title: | Environmental safety and natural resources |
| Institution: | Environmental safety and natural resources |
| Summary: | The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources. A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks was conducted. A comparative analysis of four topic modelling methods applied to real Twitter datasets demonstrated that BERTopic (coherence score 0.62) outperforms LDA (0.45) and Top2Vec (0.56) for short texts; the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89. Theoretically, the selection of methods that take into account the temporal dynamics of topics is justified. Practically, a five-block pipeline architecture for knowledge extraction from Ukrainian-language social media is proposed. The originality of the work lies in the adaptation of the Methontology-based approach to ontology generation for short unstructured Ukrainian-language texts. Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets. |
| DOI: | 10.32347/2411-4049.2026.1.161-170 |