Robust Clustering of Big Data by the Stochastic Quantization Method

Bibliographic details
Date: 2025
Authors: Kozyriev, Anton; Norkin, Vladimir
Format: Article
Language: English
Published: V.M. Glushkov Institute of Cybernetics of NAS of Ukraine, 2025
Online access: https://jais.net.ua/index.php/files/article/view/438
Journal title: Problems of Control and Informatics

Repositories

Problems of Control and Informatics
Description
Abstract: This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-means and its variant K-means++, and explores the stochastic quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems. Some traditional clustering algorithms use memory inefficiently: they require all data samples to be loaded into memory, which becomes impractical for large-scale datasets. Variants such as mini-batch K-means partially mitigate this issue by reducing memory usage, but they lack robust theoretical convergence guarantees because clustering problems are non-convex. In contrast, the SQ algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data. To address the challenge of high dimensionality, we train a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of the SQ algorithm and a traditional quantization algorithm.
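
To illustrate the idea summarized in the abstract, here is a minimal sketch of a stochastic-gradient quantization loop. It assumes a squared Euclidean distortion and a diminishing step size. The function names, the step-size schedule, and the initialization from randomly chosen data points are all assumptions made for illustration; they are not taken from the paper and should not be read as the authors' implementation.

```python
import random


def nearest(centers, x):
    """Return the index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(centers[k], x)),
    )


def stochastic_quantization(sample, k, steps=5000, rho0=0.5, seed=0):
    """Illustrative SGD on the expected distortion E[min_k ||y_k - xi||^2].

    Each iteration uses a single observation, so the update itself never
    needs the whole dataset at once.
    """
    rng = random.Random(seed)
    # Assumption: initialize the k centers from k distinct random data points.
    centers = [list(x) for x in rng.sample(sample, k)]
    for t in range(1, steps + 1):
        xi = rng.choice(sample)        # draw one random observation
        j = nearest(centers, xi)       # only the closest center is updated
        rho = rho0 / (1 + 0.01 * t)    # hypothetical diminishing step size
        # The gradient of ||y_j - xi||^2 with respect to y_j is 2 * (y_j - xi).
        centers[j] = [c - rho * 2 * (c - v) for c, v in zip(centers[j], xi)]
    return centers
```

In this sketch the data is held in a Python list only for brevity. Because each step touches one observation and one center, the same loop could in principle read observations from a stream or from disk.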