Semantic Indexing and Cluster Analysis of Cybersecurity Documents

This study examines methods for extracting concepts from textual messages and constructing semantic networks for text data analysis, specifically within the context of cyberthreats. The semantic networks are essential tools for identifying key concepts and their relationships which provide a better...


Bibliographic Details
Date:	2024
Main authors:	Ланде, Д. В., Рибак, О. О.
Format:	Article
Language:	Ukrainian
Published:	Інститут проблем реєстрації інформації НАН України 2024
Subjects:	Semantic Indexing Cluster Analysis Modularity Large Language Models (LLMs) Cybersecurity Text Analysis Semantic Networks
Online access:	http://drsp.ipri.kiev.ua/article/view/316711
Journal title:	Data Recording, Storage & Processing

Institution

Data Recording, Storage & Processing

_version_	1856543224049434624
author	Ланде, Д. В. Рибак, О. О.
author_facet	Ланде, Д. В. Рибак, О. О.
author_sort	Ланде, Д. В.
baseUrl_str
collection	OJS
datestamp_date	2025-08-09T14:38:13Z
description	This study examines methods for extracting concepts from text messages and constructing semantic networks for text data analysis, specifically in the context of cyber threats. Semantic networks are essential tools for identifying key concepts and the relationships between them, and they help uncover critical data such as hacker group names, malicious programs, vulnerabilities, and other threats. The approach can be applied in cybersecurity, where textual information can contain data vital for preventing and responding to cyber threats. The focus is on large language models (LLMs), which enable automated extraction of entities and construction of concept networks. Using LLMs to extract information from text data helps create networks of relationships that can be used to analyze causal links between events and objects, detect interdependencies, and structure information. These networks can then be used for cluster analysis, which automatically groups nodes by similarity and identifies new patterns in the data. The research also addresses the construction of document proximity networks, which assess the degree of similarity between texts based on their semantic structures. This makes it possible to identify thematically related documents that may contain significant information for analysis, and to detect informational chains and key trends in large text datasets. The methods described in the article make it possible to structure and analyze large volumes of cybersecurity text effectively, supporting quicker threat detection and the formulation of prevention strategies. The approach also streamlines many stages of analytical work, thereby improving the efficiency of big data analysis. Fig.: 3. Refs: 11 titles.
first_indexed	2025-07-17T10:59:09Z
format	Article
id	drspiprikievua-article-316711
institution	Data Recording, Storage & Processing
language	Ukrainian
last_indexed	2025-09-17T09:26:42Z
publishDate	2024
publisher	Інститут проблем реєстрації інформації НАН України
record_format	ojs
spelling	drspiprikievua-article-316711 2025-08-09T14:38:13Z Semantic Indexing and Cluster Analysis of Cybersecurity Documents Семантичне індексування та кластерний аналіз документів з кібербезпеки Ланде, Д. В. Рибак, О. О. Semantic Indexing, Cluster Analysis, Modularity, Large Language Models (LLMs), Cybersecurity, Text Analysis, Semantic Networks Methods for extracting concepts from texts and constructing semantic networks for data analysis in the context of cybersecurity are considered. The main focus is on using large language models (LLMs) for automated entity extraction and the construction of concept networks. This makes it possible to identify interdependencies, structure information, and form semantic networks. Such networks can be used for subsequent cluster analysis, which makes it possible to group nodes automatically by similarity and identify new patterns in the data. The construction of document proximity networks, which allows the degree of similarity between texts to be assessed based on their semantic structures, is studied. The proposed approach makes it possible to identify thematically related documents that may contain important information for analysis, and to determine informational chains and key trends in large text datasets, as well as key trends and threats in the field of cybersecurity. Інститут проблем реєстрації інформації НАН України 2024-11-19 Article Article application/pdf http://drsp.ipri.kiev.ua/article/view/316711 10.35681/1560-9189.2024.26.2.316711 Data Recording, Storage & Processing; Vol. 26 No. 2 (2024); 19-32 Регистрация, хранение и обработка данных; Том 26 № 2 (2024); 19-32 Реєстрація, зберігання і обробка даних; Том 26 № 2 (2024); 19-32 1560-9189 uk http://drsp.ipri.kiev.ua/article/view/316711/308964 Copyright (c) 2024 Реєстрація, зберігання і обробка даних
spellingShingle	Semantic Indexing Cluster Analysis Modularity Large Language Models (LLMs) Cybersecurity Text Analysis Semantic Networks Ланде, Д. В. Рибак, О. О. Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title	Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title_alt	Семантичне індексування та кластерний аналіз документів з кібербезпеки
title_full	Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title_fullStr	Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title_full_unstemmed	Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title_short	Semantic Indexing and Cluster Analysis of Cybersecurity Documents
title_sort	semantic indexing and cluster analysis of cybersecurity documents
topic	Semantic Indexing Cluster Analysis Modularity Large Language Models (LLMs) Cybersecurity Text Analysis Semantic Networks
topic_facet	Semantic Indexing Cluster Analysis Modularity Large Language Models (LLMs) Cybersecurity Text Analysis Semantic Networks
url	http://drsp.ipri.kiev.ua/article/view/316711
work_keys_str_mv	AT landedv semanticindexingandclusteranalysisofcybersecuritydocuments AT ribakoo semanticindexingandclusteranalysisofcybersecuritydocuments AT landedv semantičneíndeksuvannâtaklasternijanalízdokumentívzkíberbezpeki AT ribakoo semantičneíndeksuvannâtaklasternijanalízdokumentívzkíberbezpeki
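
The abstract describes building document proximity networks from the semantic structure of texts and then grouping related items. The sketch below illustrates that idea in minimal form and is not the authors' implementation. It assumes that concepts have already been extracted for each document (for example by an LLM), that proximity is measured by Jaccard similarity of concept sets, and that grouping is done by connected components as a simple stand-in for the modularity-based clustering named in the keywords. The document names, concept labels, the 0.3 threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy input: concepts already extracted per document.
doc_concepts = {
    "doc1": {"ransomware", "phishing", "group_a"},
    "doc2": {"ransomware", "group_a", "vulnerability"},
    "doc3": {"ddos", "botnet"},
    "doc4": {"botnet", "ddos", "malware_x"},
}


def jaccard(a, b):
    """Share of concepts common to both sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def proximity_edges(docs, threshold=0.3):
    """Link every document pair whose similarity reaches the threshold."""
    edges = {}
    for d1, d2 in combinations(sorted(docs), 2):
        weight = jaccard(docs[d1], docs[d2])
        if weight >= threshold:
            edges[(d1, d2)] = weight
    return edges


def components(nodes, edges):
    """Group nodes by connected component (stand-in for modularity clustering)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


edges = proximity_edges(doc_concepts)
groups = components(doc_concepts, edges)
print(edges)
print(groups)
```

On this toy input the two ransomware documents and the two botnet documents form separate groups. With real data, a modularity-based community detection method would replace the `components` step, and the threshold would need tuning.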
