O.V. Ivashchenko, S.S. Fedin, 2026
112 ISSN 1681–6048 System Research & Information Technologies, 2026, № 1
METHODS, MODELS, AND TECHNOLOGIES OF ARTIFICIAL INTELLIGENCE IN SYSTEM ANALYSIS AND CONTROL
UDC 004.855.5:519.237.8
DOI: 10.20535/SRIT.2308-8893.2026.1.08
IMPROVING THE SOM ALGORITHM TO ENSURE STABILITY
AND REPRODUCIBILITY OF DATA CLUSTERING RESULTS
O.V. IVASHCHENKO, S.S. FEDIN
Abstract. The article proposes a method to improve the Kohonen Self-Organizing
Map (SOM) learning algorithm to ensure the stability and reproducibility of
clustering results, an urgent task when working with large amounts of data. SOM is
widely used in clustering and visualization tasks, especially in applications that
require analyzing multidimensional data structures, such as telecommunications
billing systems and financial analysis. The standard SOM implementation, which
includes random weight initialization and stochastic sample selection during
training, leads to significant cluster variability even when using the same input data
and identical network training parameters. This makes it difficult to apply this
algorithm in cases where stability and reproducibility of results are required. To
solve this problem, we propose modifying the algorithm to include its own random
number generator and introducing a seed parameter to fix the initial training
conditions. This reduces variability and ensures reproducible clustering results,
thereby increasing the reliability of the analysis and the suitability of the SOM
algorithm for real business tasks. The proposed method has been tested on data from
billing systems, where the reproducibility of clustering results is critical for effective
work with customer segments, the development of targeted marketing strategies, and
the creation of personalized tariff plans.
Keywords: Kohonen self-organizing maps (SOM), data clustering, seed parameter,
reproducibility of results, random number generator.
INTRODUCTION
In today’s environment, telecommunications companies process large amounts of
data on a daily basis that contain valuable information about subscriber behavior
and service usage. This data plays an important role in making strategic decisions,
developing personalized offers, and increasing the competitiveness of companies.
One of the key tasks is to apply clustering to organize and analyze the data, making it possible to draw useful analytical conclusions for further use.
Among modern approaches to clustering and visualization of multidi-
mensional data, Kohonen’s Self-Organizing Maps (SOM) [1] occupy a special
place, preserving the topological structure of the data and allowing an intuitive
understanding of its grouping through visualization. At the same time, algorithmic
features such as random initialization of weights and stochastic sample selection
during training lead to the fact that the results can differ significantly for the same
input data. This creates difficulties in cases where stability and reproducibility of
clustering results are required, which is important for making informed decisions
in business and analytics.
This opens up the possibility of improving the method to increase the
reproducibility and accuracy of clustering, which is important for scientists and
analysts using this tool.
LITERATURE ANALYSIS AND PROBLEM STATEMENT
Kohonen’s Self-Organizing Map is a popular method for data analysis and
clustering that makes it possible to identify similar groups of objects in a data set and
simplify their structure for further use. However, despite the widespread use of
SOM in industries such as telecommunications, finance, and engineering, the
problem of stability and reproducibility of clustering results remains unresolved.
This limits the practical application of the algorithm in tasks requiring accurate
and stable data grouping.
Researchers have already proposed some improvements to the SOM
algorithm to increase the accuracy and reliability of clustering.
For example, Panu Somervuo and Teuvo Kohonen [2] discuss clustering
large protein sequence databases using an extended SOM that allows the creation
of clusters of protein sequences without converting the data into histogram
vectors. In his study, Mark Van Hulle [3] analyzes the basic SOM algorithm, its
properties, and the possibility of extending it to work with categorical data, time
series, and tree structures. Jens Claussen [4] proposes the Winner-Relaxing Self-
Organizing Maps (WRSOM) approach, which ensures the stability of the cluster
location. Despite the success of these approaches, the problem of variability of
results is still relevant.
The study by Melody Kiang, Michael Hu, Dorothy Fisher [5] considers the
use of an extended version of Kohonen’s self-organizing maps to segment the
market of telecommunications companies based on behavioral and demographic
factors, including the frequency of long-distance calls, household structure, and so
on. By using the enhanced SOM algorithm, they were able to achieve better
results than standard clustering methods such as factor analysis and k-means.
However, the authors note that the stability of the resulting clusters could be
improved, which opens up prospects for further improving the efficiency of this
method. Similarly, in a study by Wei Wang, Shiwei Xu, Hong Ouyang, and
Xinyu Zeng [6], where SOM was used to optimize the parameters of the power
systems of unmanned electric drive chassis, improvements were proposed by
combining SOM with an advanced genetic algorithm that uses isolated niches to
improve the accuracy of the results. This increased the convergence rate and
improved the global search capability of the algorithm, providing a more accurate
clustering of the initial populations. This approach demonstrates the potential of
combining SOM with other algorithms to solve complex multitask optimization
problems, which emphasizes the importance of further research to improve the
stability and accuracy of SOM results for various engineering and practical
applications.
The study by V. Dyachenko, O. Lyashenko, B. Ibrahim, O. Michal, and
Y. Koltun [7] proposed a modification of Kohonen’s self-organizing maps with
a parallel learning algorithm that can significantly increase the speed of data
processing in multi-core processor systems. This approach demonstrates effec-
tiveness in the tasks of clustering large amounts of data, ensuring the adaptation
of the algorithm to a dynamic environment. The work of Rodrigo Cavalcanti,
Bruno Pimentel, Carlos Almeida, and Renata Souza [8] presents a new variant of
the Fuzzy Kohonen Clustering Network (FKCN), which uses the fuzzy c-means
membership function instead of a fixed learning coefficient. This approach takes
into account the intraclass and interclass variance, which makes it possible to obtain better
clustering results on real and synthetic data sets.
The works of N.I. Furmanova, O.Y. Farafonov, O.Y. Malyi, Y.O. Sitsilitsyn,
V.O. Dyachenko, O.P. Mikhal, E.A. Egorova, V.G. Ivanov, E.S. Sakalo [9–11]
consider various approaches to improving SOM, which demonstrate the wide
application of this method in optimization, clustering, and data analysis. In
particular, N.I. Furmanova, O.Y. Farafonov, and Y.O. Sitsilitsyn study the
integration of SOM with genetic algorithms to reduce computational costs and
avoid local minima in multidimensional optimization problems. V.O. Dyachenko
and O.P. Mikhal proposed an improvement of SOM for work in distributed
energy-critical sensor networks by parallel selection of several winning neurons,
which reduces power consumption and optimizes computation time. E.A. Egorova,
V.G. Ivanov, and E.S. Sakalo use the Kalman-Mayne filter to adapt SOM,
providing accurate clustering even in the presence of noise in the data.
Despite the progress made, all authors note the importance of further
research aimed at selecting optimal SOM parameters, such as the neighborhood
function, initialization of the weights, and adaptation of the algorithm to dynamic
data.
Given the need to increase the stability of SOM clustering results, the
purpose of this study is to develop an improved approach to its application that
ensures reproducibility of results.
The variability of SOM results is caused by several factors. First of all, the
standard implementation of the algorithm randomly initializes the neural weights,
resulting in different initial conditions even for the same input data. In addition,
the learning process involves stochastic sample selection, which also introduces
randomness at each stage of clustering. As a result, even if the algorithm is run
repeatedly with the same data and parameters, SOM may generate different
clusters.
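The effect described above is easy to demonstrate with a minimal experiment. The following Python sketch is illustrative only (the article's implementation is in C#, and the array shapes are arbitrary assumptions): it compares two runs of random weight initialization with and without a fixed seed.

```python
import numpy as np

# Two runs of random weight initialization without a fixed seed:
# each run starts from different initial conditions for the same data.
w_run1 = np.random.default_rng().random((10, 10, 3))  # 10x10 map, 3 features
w_run2 = np.random.default_rng().random((10, 10, 3))
print(np.array_equal(w_run1, w_run2))  # False: different initial weights

# With a fixed seed, every run starts from identical weights.
w_fix1 = np.random.default_rng(seed=42).random((10, 10, 3))
w_fix2 = np.random.default_rng(seed=42).random((10, 10, 3))
print(np.array_equal(w_fix1, w_fix2))  # True: reproducible initialization
```

Since the trained map depends deterministically on the initial weights and the sample order, fixing both is exactly what makes the final clusters reproducible.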
The non-reproducibility of clustering complicates the practical application of
SOM, especially in cases where stability of results is required. For example, in
billing systems of telecommunication companies, the analysis of customer
behavior data requires accurate and stable segmentation to generate personalized
tariffs, etc. The variability of clustering results makes it difficult to accurately
identify customer groups, which can lead to errors in understanding their needs
and creating appropriate marketing strategies.
Thus, in order to eliminate the variability of SOM clustering results, it is
necessary to develop a method that ensures reproducibility of the output data. In
this study, we propose an approach that involves the introduction of a dedicated random number generator and a seed parameter for fixed initialization of network
weights. This solution will eliminate the stochastic influence of the algorithm on
the clustering results and increase the reliability of data analysis in
telecommunication systems.
PURPOSE AND OBJECTIVES OF THE STUDY
The purpose of the study is to develop an improved algorithm for Kohonen’s self-
organizing maps that ensures stability and reproducibility of clustering results.
This improvement is aimed at eliminating the variability of results arising from
the random initialization of weight coefficients and stochastic selection of
samples during training.
The following tasks were set to achieve this goal:
- to identify the key factors that cause variability in SOM clustering results and assess their impact on the stability of the algorithm;
- to develop a method to improve the SOM algorithm by introducing its own random number generator and a seed parameter to fix the initial training conditions;
- to test the effectiveness of the proposed method by analyzing data from telecommunications companies' billing systems and assessing its impact on the stability and reproducibility of clustering results.
MATERIALS AND METHODS OF RESEARCH
The Kohonen algorithm
Self-organizing maps are based on Kohonen neural networks and are designed to
visualize multidimensional objects on a two-dimensional map, where the
distances between objects correspond to the distances between their vectors in a
multidimensional space, and the feature values themselves are displayed in
different colors and shades [12].
The basic idea behind SOM is to create a two-dimensional mapping structure
in which neighboring nodes on the map reflect the similarity between data. Each
node on the map has weights that represent vectors in the feature space. During
SOM training, these weights are changed to match the structure and distribution
of the data [13; 14].
The network construction is based on competitive learning, where the output
nodes (neurons) compete with each other for “victory”. In the course of the
competition, during the training process, neurons are selectively tuned for
different input examples [15].
Fig. 1. Kohonen’s network model
Input neurons form the input layer of the network, which contains one
neuron for each input field. As in a regular network, input neurons do not
participate in the training process. Their task is to transfer the values of the input
fields of the initial sample to the neurons of the output layer. Each connection
between neurons has a certain weight, which is randomly set in the interval [0;1]
during initialization. The learning process consists in adjusting the weights.
Unlike most neural networks, the Kohonen network has no hidden layers: the data
from the input layer is sent directly to the output layer, whose neurons are
arranged in a one- or two-dimensional grid of rectangular or hexagonal
shape [16].
During SOM training, the following main stages are performed:
1. Competition: each output neuron calculates the distance between its
weight vector and the input vector. The neuron with the smallest distance is
declared the winner.
2. Cooperation: the winning neuron determines a group of neighboring
neurons that also participate in the weight adjustment. This ensures the similarity
of the weight vectors between neighboring neurons.
3. Adaptation: the weights of the winning neuron and its neighbors are
adjusted to get closer to the input vector, promoting network self-organization and
clustering.
The learning process of the Kohonen network involves a gradual decrease in
the learning rate, which depends on the number of iterations. The training is
divided into two phases: coarse tuning (with a larger influence radius and faster
learning speed) and fine tuning (with a smaller radius and slower adaptation).
At the initial stage, if there is no a priori information about the distribution of
data in the sample, the neuronal weights are initialized with random values. At the
same time, the initial values of the learning rate and the learning radius R are set,
which determines the number of neurons that are considered neighbors of the
winning neuron and change their weights along with it. At the beginning of
training, the radius R has a maximum value and gradually decreases with each
iteration, which allows the network to accurately adapt to the data structure.
The Kohonen network training algorithm is based on the principles of
unsupervised learning, i.e., without a teacher, and includes seven stages [15–18]:
1. Setting up the network structure (the number of neurons in the Kohonen
layer).
2. Initialize the weight coefficients with random values according to the formula
$w_{ij} = \mathrm{random}[0;1] \cdot (\max x_{ni} - \min x_{ni}) + \min x_{ni}$,
where $x_{ni}$ is an input vector and $w_{ij}$ is a vector of weight coefficients.
3. Competition. A random training example of the current training iteration is supplied to the network inputs, and the Euclidean distances from the input vector to the centers of all clusters are calculated:
$D_j(W_j, X_n) = \sqrt{\sum_i (w_{ij} - x_{ni})^2}$,
where $x_{ni}$ is an input vector; $w_{ij}$ is a vector of weight coefficients.
The output neuron whose weight vector has the smallest distance to the
object feature vector is declared the winner.
4. Merge. All neurons located within the training radius relative to the
winning neuron are identified.
5. Adjustment. According to the smallest of the values of $R_j$, the winning neuron $j$ is selected, which is closest to the input vector in terms of values. For the selected neuron (and only for it), the weight coefficients are corrected:
$w_{ij}^{\mathrm{new}} = w_{ij}^{\mathrm{current}} + l \cdot (x_{ni} - w_{ij}^{\mathrm{current}})$,
where $x_{ni}$ is an input vector; $w_{ij}^{\mathrm{new}}$ is the new vector of weight coefficients; $w_{ij}^{\mathrm{current}}$ is the current vector of weight coefficients; $l$ is the learning rate coefficient.
Fig. 2. Adjusting the weights of neurons
6. Correction. The learning rate parameter is changed according to the specified law:
$l^{\mathrm{new}} = l \cdot \exp(-i^{\mathrm{current}} / i)$,
where $l^{\mathrm{new}}$ is the adjusted learning rate parameter; $l$ is the initial learning rate parameter; $i^{\mathrm{current}}$ is the current iteration; $i$ is the total number of iterations.
7. The cycle is repeated from stage 3 (competition) until the end condition is
met: stabilization of the neural network outputs or the specified number of
iterations.
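The seven stages above can be condensed into a compact training loop. The following sketch is an illustrative Python rendering (the article's implementation is in C#; the function and variable names and default values are my own assumptions), with weights initialized in each feature's [min, max] range and exponentially decaying learning rate and radius:

```python
import numpy as np

def train_som(data, rows, cols, iterations, l0=0.5, seed=0):
    """Illustrative SOM training loop following stages 1-7 above."""
    rng = np.random.default_rng(seed)            # fixed seed -> reproducible run
    lo, hi = data.min(axis=0), data.max(axis=0)
    # Stage 2: random weights within each feature's [min, max] range
    w = rng.random((rows, cols, data.shape[1])) * (hi - lo) + lo
    r0 = max(rows, cols) / 2.0                   # initial neighborhood radius R
    gy, gx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")

    for i in range(iterations):
        x = data[rng.integers(len(data))]        # stage 3: random training sample
        d = np.linalg.norm(w - x, axis=2)        # Euclidean distances D_j
        by, bx = np.unravel_index(d.argmin(), d.shape)  # winning neuron

        l = l0 * np.exp(-i / iterations)         # stage 6: decaying learning rate
        r = r0 * np.exp(-i / iterations)         # shrinking radius

        # Stages 4-5: adjust the winner and all neurons within radius r
        mask = np.sqrt((gy - by) ** 2 + (gx - bx) ** 2) <= r
        w[mask] += l * (x - w[mask])
    return w

# Same data, same parameters, same seed -> identical maps on every run.
data = np.random.default_rng(1).random((50, 3))
m1 = train_som(data, 6, 6, 200, seed=6548)
m2 = train_som(data, 6, 6, 200, seed=6548)
print(np.array_equal(m1, m2))  # True
```

Note that the sketch updates the winner together with its neighbors within the radius, the usual SOM convention; the stage-5 wording above restricts the update to the winner alone, which the sketch generalizes.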
Technical implementation of the SOM algorithm improvement
In order to ensure reproducibility of clustering results, the standard
implementation of the Kohonen Self-Organizing Maps algorithm was improved in
this study. The main goal of the improvements was to eliminate the variability of
results caused by random initialization of weights and stochastic sample selection during training. The proposed solution includes the implementation of a dedicated random number generator with the ability to fix initial conditions using the seed parameter.
At the stage of improving the SOM algorithm, a dedicated random number generator was implemented that uses the sine function to generate values. This generator is implemented as a RandomGenerator class in C# (Fig. 3). The main feature of this generator is the
ability to fix the initial state using the seed parameter, which reduces the
variability of the initial conditions and, as a result, stabilizes the clustering results.
Implementation features:
- Seed parameter: set when creating an instance of the RandomGenerator class, it determines the initial state of the generator. This ensures the determinism of the sequence of pseudo-random numbers.
- Generation algorithm: a pseudo-random number is calculated as the fractional part of the sine value multiplied by a scaling factor of 10000, which makes it possible to obtain an approximately uniform distribution of values within [0; 1).
- Seed incrementing: after each call to the Next() function, the seed value is incremented, ensuring a consistent change in the output values.
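The article's generator is a C# class (Fig. 3, not reproduced here); the following Python sketch renders the same scheme, under the assumption that each call returns the fractional part of sin(seed) scaled by 10000 and then increments the seed:

```python
import math

class RandomGenerator:
    """Sketch of the seeded sine-based generator described above.
    The article's actual implementation is a C# class (Fig. 3)."""

    def __init__(self, seed):
        self.seed = seed            # fixes the initial state of the generator

    def next(self):
        value = math.sin(self.seed) * 10000.0   # scale the sine value
        self.seed += 1                          # increment the seed per call
        return value - math.floor(value)        # fractional part, in [0; 1)

# Identical seeds produce identical pseudo-random sequences.
a, b = RandomGenerator(6548), RandomGenerator(6548)
print(all(a.next() == b.next() for _ in range(5)))  # True
```

Subtracting the floor keeps the result in [0; 1) even when the sine value is negative, which is what makes the output usable directly as a weight-initialization factor.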
Fig. 3. Implementation of a random number generator using the seed parameter
Fig. 4. The main window of the application
Fig. 5. Neural network training window, entering the seed parameter
The implemented generator is used to initialize the weights of the SOM neurons. Each time the algorithm is run with the same value of the seed parameter, the neuron weights receive the same initial values, eliminating cluster variability caused by random initialization. In addition, the seed parameter controls the order in which training samples are selected during network training, which eliminates the stochastic influence on the clustering process.
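A deterministic sample order can be obtained from the same generator: driving a Fisher-Yates shuffle with the seeded stream means the presentation order of training samples is fully determined by the seed. The Python sketch below is illustrative (the function names are mine, not the article's):

```python
import math

def seeded_stream(seed):
    """Infinite stream of values in [0; 1) using the sine scheme above."""
    while True:
        v = math.sin(seed) * 10000.0
        seed += 1
        yield v - math.floor(v)

def sample_order(n_samples, seed):
    """Fisher-Yates shuffle driven by the seeded stream: the training-sample
    order is a permutation fully determined by the seed value."""
    rnd = seeded_stream(seed)
    order = list(range(n_samples))
    for i in range(n_samples - 1, 0, -1):
        j = int(next(rnd) * (i + 1))  # pick a swap position in [0; i]
        order[i], order[j] = order[j], order[i]
    return order

print(sample_order(10, seed=368) == sample_order(10, seed=368))  # True
```

With both the initial weights and the sample order pinned to the seed, every remaining operation in the training loop is deterministic, so repeated runs produce byte-identical maps.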
To ensure the convenience of working with the advanced algorithm, an
interface based on Windows Presentation Foundation (WPF) was created. The
interface is designed to provide the user with easy access to all key functions. In
particular, the user can upload input data and configure training parameters such as the map dimensions, the number of iterations, and the seed value, which makes it possible to control the initial training conditions. In addition, after training is complete, the user can view
the clustering results in the form of a Kohonen map, which provides visualization
of the results and allows detailed analysis of the cluster distribution.
RESEARCH RESULTS
Using the created application based on the improved Kohonen algorithm, the
customer base of the telecommunications company was clustered with different
values of the seed parameter, the results of which are shown in Figs. 6 and 7. The
data contained information about the demographic characteristics of customers,
their activity and the intensity of service use, which made it possible to create
clear customer segments by behavioral characteristics.
Clustering with different seed values
Fig. 6 shows the clustering results for different values of the seed parameter (85690
and 368). It can be seen that changing this parameter leads to a change in the shape
and location of the clusters. For example, the clusters highlighted in the figure change
their boundaries significantly: the same customer segment can move around the map
and change its shape and size. This makes it difficult to identify stable customer
groups and can lead to difficulties in analyzing them accurately.
Fig. 6. Generated maps for different values of the seed parameter
Clustering with the same seed value
Fig. 7 shows the results of clustering using the same value of the seed parameter
(6548). In this case, all resulting Kohonen maps are identical, regardless of the
number of algorithm runs. The clusters highlighted in the figure have the same shape
and location, confirming the stability of the algorithm. This shows that using the same
value of the seed parameter guarantees not only the stability of individual clusters, but
also the complete reproducibility of the entire Kohonen map.
Fig. 7. Generated maps for the same values of the seed parameter
Therefore, the use of a fixed seed parameter makes it possible to achieve full reproducibility of the results, which is impossible in the case of random initialization of the weights.
The seed values (85690, 368, and 6548) used in this study were chosen arbitrarily. However, any fixed seed value guarantees identical clustering results, ensuring that the shape and location of the clusters remain consistent. This improvement eliminates the variability in results that previously occurred due to the random initialization of weights and sample selection during the algorithm's operation. This emphasizes the reproducibility of results, which is an important
aspect for scientific research and practical use of the SOM algorithm.
DISCUSSION OF THE OBTAINED RESULTS
The results of the study confirmed that the improved Kohonen Self-Organizing
Map algorithm with the implemented seed parameter for fixed initialization of
network weights ensures stability and reproducibility of clustering results. Using
the same seed value eliminated the problem of variability caused by random
initialization of weights and stochastic sample selection during network training.
This is especially important in tasks where accurate group identification is critical
for decision making.
A comparative analysis of the clustering results showed that with different values
of the seed parameter (Fig. 6), there was significant variation in the size, shape, and
location of the clusters. The same customer segment could change its boundaries or
move around the map, making it difficult to interpret and analyze the data
consistently. In contrast, clustering with a fixed seed value (Fig. 7) ensured that the Kohonen maps were completely identical on each run of the algorithm, confirming the stability and reliability of the results.
The practical significance of the results obtained is particularly relevant for
telecommunications companies, where the stability of customer segmentation plays a
key role in the development of personalized tariff plans and marketing strategies. The
improved algorithm provides reliable analysis of customer behavior data, which
enables more accurate marketing budget calculations, minimizes customer churn, and
increases the efficiency of customer base management.
Compared to other approaches to improve SOM, such as combining it with
genetic algorithms or using modifications of WRSOM, the proposed method is
simple to implement and does not require additional computational resources. This
makes it an effective solution for tasks that require stable results at minimal technical
costs.
Some limitations of the proposed approach should also be noted. The value of
the seed parameter needs to be adapted for different data sets, which may require
additional testing to achieve optimal results. In future research, it is advisable to
consider automating the selection of the seed parameter or adapting it to work with
dynamic data. This will increase the flexibility and versatility of the algorithm for a
wider range of tasks.
Thus, the results of the study showed that the introduction of the seed parameter makes it possible to achieve stable and reproducible clustering, which is important for scientific
research and practical use in business intelligence, especially for telecommunication
systems.
CONCLUSIONS
1. The key factors of variability in the results of SOM clustering are identified.
The main reasons for the instability of the results are the random initialization of
neuronal weights and the stochastic selection of training samples during network
training. These factors lead to different locations and shapes of clusters for the same
input data, making stable analysis impossible.
2. A method to improve the SOM algorithm is developed. It is proposed to
introduce its own random number generator with the ability to fix the initial
conditions using the seed parameter. This makes it possible to set the same initial neuron weights and a deterministic sequence of training sample selection, which eliminates variability and ensures the stability of the clustering results.
3. The effectiveness of the proposed approach is tested. The results of
clustering the customer base of a telecommunications company have shown that
using the same value of the seed parameter ensures full reproducibility of Kohonen
maps. This allows for stable identification of customer groups, simplifying data
analysis for the development of targeted marketing strategies and personalized tariff
plans.
REFERENCES
1. T. Kohonen, T. Honkela, "Kohonen Network," Scholarpedia, 2007. Accessed on: 12 June 2024. Available: http://www.scholarpedia.org/article/Self-organizing_feature_map
2. Panu Somervuo, Teuvo Kohonen, “Clustering and Visualization of Large Protein
Sequence Databases by Means of an Extension of the Self-Organizing Map,” Lecture
Notes in Computer Science, vol. 1967, 2000. doi: https://doi.org/10.1007/3-540-
44418-1_7
3. Marc M. Van Hulle, “Self-Organizing Maps,” Handbook of Natural Computing.
Springer, Berlin, Heidelberg, pp. 585–622, 2012.
4. Jens Christian Claussen, “Winner-Relaxing Self-Organizing Maps,” Neural Compu-
tation, 17(5), pp. 996–1009, 2005. doi: https://doi.org/10.1162/0899766053491922
5. Melody Y. Kiang, Michael Y. Hu, Dorothy M. Fisher, “An extended self-organizing
map network for market segmentation — a telecommunication example,” Decision
Support Systems, vol. 42, issue 1, October 2006, pp. 36–47. doi:
https://doi.org/10.1016/j.dss.2004.09.012
6. W. Wang, S. Xu, H. Ouyang, X. Zeng, “Parameter Optimization of the Power and
Energy System of Unmanned Electric Drive Chassis Based on Improved Genetic Al-
gorithms of the KOHONEN Network,” World Electric Vehicle Journal, 14(9), 260,
2023. doi: https://doi.org/10.3390/wevj14090260
7. V. Diachenko, O. Liashenko, B.F. Ibrahim, O. Mikhal, Yu. Koltun, “Kohonen net-
work with parallel training: Operation structure and algorithm,” Int. J. Adv. Trends
Comp. Sci. Eng., vol. 8, no. 1.2, pp. 35–38, 2019. doi: https://doi.org/
10.30534/ijatcse/2019/0681.22019
8. Rodrigo B. de C. Cavalcanti, Bruno Pimentel, Carlos W.D. de Almeida, Renata
M.C.R. de Souza, “A Multivariate Fuzzy Kohonen Clustering Network,” IEEE
Transactions on Neural Networks, vol. 31, no. 4. pp. 75–82, 2020. doi:
https://doi.org/10.1109/IJCNN.2019.8852243
9. N.I. Furmanova, O.Y. Farafonov, O.Y. Malyi, Y.O. Sitsilitsyn, “Improvement of the
method of searching for solutions to solve the optimization problem using a genetic algo-
rithm by preliminary clustering,” (in Ukrainian), Instrumentation Technology, no. 2, pp. 6–
9, 2017. Available: https://elar.tsatu.edu.ua/server/api/core/bitstreams/f045c4ca-7d17-4c9c-
a1cb-7b2df9aafe7e/content
10. V.O. Dyachenko, O.F. Mikhal, "Prospects for the use of the classical Kohonen algorithm in distributed energy-critical sensor networks," (in Ukrainian), Control, Navigation and Communication Systems, no. 4, pp. 75–79, 2023. doi:
https://doi.org/10.26906/SUNZ.2023.4.075
11. E.A. Egorova, V.G. Ivanov, E.S. Sakalo, “Optimization of process of the Kohonen
self-organizing map based on the Kalman-Mayne filter,” (in Russian), Control, Nav-
igation and Communication Systems, issue 4(8), pp. 52–55, 2008. Available:
https://dspace.nlu.edu.ua/bitstream/123456789/6713/1/Ivanov_52-55.pdf
12. Achraf Khazri, “Self-Organizing Maps (Kohonen’s maps),” Medium, [website]. 2019.
Available: https://medium.com/data-science/self-organizing-maps-1b7d2a84e065
13. T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, 78(9), pp. 1464–
1480, 1990.
14. E.O. Kaminsky, Forecasting system for supporting activities using deep learning.
Bachelor’s thesis. NTUU “KPI”, Kyiv, 2023, p. 78.
15. Kohonen Networks, [website]. Accessed on: 13.11.2022. Available: https://ppt-
online.org/46514
16. T. Kohonen, Self-Organizing Maps. Springer-Verlag Berlin Heidelberg, 2001.
Improving the SOM algorithm to ensure stability and reproducibility of data clustering results
Системні дослідження та інформаційні технології, 2026, № 1 123
Received 25.12.2024
INFORMATION ON THE ARTICLE
Oleksandr V. Ivashchenko, ORCID: 0009-0007-5470-9137, National Transport
University, Ukraine, e-mail: alexander.ivashchenkoo@gmail.com
Serhii S. Fedin, ORCID: 0000-0001-9732-632X, National Transport University,
Ukraine, e-mail: sergey.fedin1975@gmail.com
IMPROVING THE SOM ALGORITHM TO ENSURE STABILITY AND
REPRODUCIBILITY OF DATA CLUSTERING RESULTS / O.V. Ivashchenko, S.S. Fedin

Abstract. A method is proposed for improving the learning algorithm of Kohonen Self-Organizing Maps (SOM) to ensure the stability and reproducibility of clustering results, an urgent task when working with large volumes of data. SOM is widely used in clustering and visualization tasks, especially in domains that require analyzing multidimensional data structures, such as the billing systems of telecommunications companies and financial analysis. The standard SOM implementation, which includes random weight initialization and stochastic sample selection during training, leads to significant cluster variability even when the same input data and identical network training parameters are used. This complicates the application of the algorithm in cases where stability and reproducibility of results are required. To solve this problem, a modification of the algorithm is proposed that incorporates its own random number generator and introduces a seed parameter to fix the initial training conditions. This reduces variability and ensures reproducible clustering results, increasing the reliability of the analysis and the suitability of the SOM algorithm for real business tasks. The proposed method has been tested on data from billing systems, where the reproducibility of clustering results is critical for effective work with customer segments, the development of targeted marketing strategies, personalized tariff plans, etc.

Keywords: Kohonen self-organizing maps (SOM), data clustering, seed parameter, reproducibility of results, random number generator.
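The seeding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid size, learning rate, neighborhood width, and toy data below are illustrative assumptions. The point it demonstrates is that when both weight initialization and sample order are drawn from one dedicated generator fixed by a `seed` parameter, two training runs with the same seed yield an identical map.

```python
# Minimal SOM sketch (illustrative, not the authors' code): a dedicated
# np.random.default_rng(seed) drives both weight initialization and the
# per-epoch sample order, so identical seeds give identical maps.
import numpy as np

def train_som(data, grid=(3, 3), epochs=20, lr=0.5, sigma=1.0, seed=42):
    rng = np.random.default_rng(seed)           # own RNG, fixed by `seed`
    n_units = grid[0] * grid[1]
    weights = rng.random((n_units, data.shape[1]))   # reproducible init
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):  # reproducible sample order
            x = data[idx]
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))[:, None]   # neighborhood
            weights += lr * h * (x - weights)
    return weights

data = np.random.default_rng(0).random((50, 4))  # toy data set
w1 = train_som(data, seed=123)
w2 = train_som(data, seed=123)   # same seed  -> identical map
w3 = train_som(data, seed=321)   # other seed -> different map
print(np.array_equal(w1, w2), np.array_equal(w1, w3))
```

In a stock SOM the two `rng` calls would instead use a shared global random state, which is why repeated runs on the same data produce different clusterings.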
|
| id | journaliasakpiua-article-358080 |
| institution | System research and information technologies |
| keywords_txt_mv | Kohonen self-organizing maps (SOM); data clustering; seed parameter; reproducibility of results; random number generator |
| language | English |
| last_indexed | 2026-04-20T01:00:22Z |
| publishDate | 2026 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/f5/280642f31a753543d52085058f22e4f5.pdf |
| spelling | journaliasakpiua-article-358080, 2026-04-19T21:53:19Z. Improving the SOM algorithm to ensure stability and reproducibility of data clustering results / Ivashchenko, Oleksandr; Fedin, Serhii. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2026-03-31. Peer-reviewed article (PDF). doi: 10.20535/SRIT.2308-8893.2026.1.08. System research and information technologies, no. 1 (2026), pp. 112–123. ISSN 2308-8893, 1681-6048. In English. Available: https://journal.iasa.kpi.ua/article/view/358080 ; full text: https://journal.iasa.kpi.ua/article/view/358080/344005 |
| title | Удосконалення алгоритму SOM для забезпечення стабільності та відтворюваності результатів кластеризації даних |
| title_alt | Improving the SOM algorithm to ensure stability and reproducibility of data clustering results |
| topic | самоорганізаційні карти Кохонена (SOM) кластеризація даних параметр seed відтворюваність результатів генератор випадкових чисел |
| topic_facet | самоорганізаційні карти Кохонена (SOM) кластеризація даних параметр seed відтворюваність результатів генератор випадкових чисел Kohonen self-organizing maps (SOM) data clustering seed parameter reproducibility of results random number generator |
| url | https://journal.iasa.kpi.ua/article/view/358080 |