Robust clustering on high-dimensional data with stochastic quantization

This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-means and its variant K-means++, and explores the stochastic quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems....

Bibliographic details
Date:	2025
Authors:	Kozyriev, Anton; Norkin, Vladimir
Format:	Article
Language:	English
Published:	V.M. Glushkov Institute of Cybernetics of NAS of Ukraine 2025
Subjects:	stochastic quantization; clustering algorithms; K-means; stochastic gradient descent; non-convex optimization
Online access:	https://jais.net.ua/index.php/files/article/view/438
Journal title:	Problems of Control and Informatics

Repositories

Problems of Control and Informatics

id	oai:ojs2.jais.net.ua:article-438
record_format	ojs
spelling	oai:ojs2.jais.net.ua:article-438 2025-03-11T14:59:41Z Robust clustering on high-dimensional data with stochastic quantization Kozyriev, Anton Norkin, Vladimir stochastic quantization clustering algorithms K-means stochastic gradient descent non-convex optimization This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-means and its variant K-means++, and explores the stochastic quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems. Some traditional clustering algorithms suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as mini-batch K-means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the SQ algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data. To address the challenge of high dimensionality, we trained a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of the SQ algorithm and traditional quantization algorithms. V.M. Glushkov Institute of Cybernetics of NAS of Ukraine 2025-02-25 Article Article application/pdf https://jais.net.ua/index.php/files/article/view/438 10.34229/1028-0979-2025-1-3 International Scientific Technical Journal "Problems of Control and Informatics"; Vol. 70 No. 1 (2025): International Scientific Technical Journal «Problems of Control and Informatics»; 32–48 2786-6505 2786-6491 en https://jais.net.ua/index.php/files/article/view/438/501 Copyright (c) 2025 Anton Kozyriev, Vladimir Norkin https://creativecommons.org/licenses/by-nc-nd/4.0
institution	Problems of Control and Informatics
baseUrl_str
datestamp_date	2025-03-11T14:59:41Z
collection	OJS
language	English
topic	stochastic quantization clustering algorithms K-means stochastic gradient descent non-convex optimization
spellingShingle	stochastic quantization clustering algorithms K-means stochastic gradient descent non-convex optimization Kozyriev, Anton Norkin, Vladimir Robust clustering on high-dimensional data with stochastic quantization
topic_facet	stochastic quantization clustering algorithms K-means stochastic gradient descent non-convex optimization
format	Article
author	Kozyriev, Anton Norkin, Vladimir
author_facet	Kozyriev, Anton Norkin, Vladimir
author_sort	Kozyriev, Anton
title	Robust clustering on high-dimensional data with stochastic quantization
title_short	Robust clustering on high-dimensional data with stochastic quantization
title_full	Robust clustering on high-dimensional data with stochastic quantization
title_fullStr	Robust clustering on high-dimensional data with stochastic quantization
title_full_unstemmed	Robust clustering on high-dimensional data with stochastic quantization
title_sort	robust clustering on high-dimensional data with stochastic quantization
title_alt	Robust clustering on high-dimensional data with stochastic quantization
description	This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-means and its variant K-means++, and explores the stochastic quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems. Some traditional clustering algorithms suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as mini-batch K-means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the SQ algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data. To address the challenge of high dimensionality, we trained a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of the SQ algorithm and traditional quantization algorithms.
publisher	V.M. Glushkov Institute of Cybernetics of NAS of Ukraine
publishDate	2025
url	https://jais.net.ua/index.php/files/article/view/438
work_keys_str_mv	AT kozyrievanton robustclusteringonhighdimensionaldatawithstochasticquantization AT norkinvladimir robustclusteringonhighdimensionaldatawithstochasticquantization AT kozyrievanton robastnaklasterizacíâvelikihdanihmetodomstohastičnogokvantuvannâ AT norkinvladimir robastnaklasterizacíâvelikihdanihmetodomstohastičnogokvantuvannâ
first_indexed	2025-10-30T02:49:11Z
last_indexed	2025-10-30T02:49:11Z
_version_	1847373386174431232
