The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids

A variant of a k-means-based solution to the clustering problem is considered. The k-means algorithm is widely used in many fields of science and technology. Its main drawbacks are the dependence of the clustering results on the choice of the initial configuration of centroids (initia...

Bibliographic Details
Date:	2012
Main authors:	Ткаченко, О. М., Біліченко, Н. О., Грійо Тукало, О. Ф., Дзісь, О. В.
Format:	Article
Language:	Ukrainian
Published:	Інститут проблем реєстрації інформації НАН України 2012
Subjects:	code books clustering k-means centroids kd-trees
Online access:	https://drsp.ipri.kiev.ua/article/view/311801
Journal title:	Data Recording, Storage & Processing

Institution

Data Recording, Storage & Processing

_version_	1866301575096434688
author	Ткаченко, О. М. Біліченко, Н. О. Грійо Тукало, О. Ф. Дзісь, О. В.
author_facet	Ткаченко, О. М. Біліченко, Н. О. Грійо Тукало, О. Ф. Дзісь, О. В.
author_sort	Ткаченко, О. М.
baseUrl_str	http://drsp.ipri.kiev.ua/oai
collection	OJS
datestamp_date	2024-09-22T23:29:39Z
description	A variant of a k-means-based solution to the clustering problem is considered. The k-means algorithm is widely used in many fields of science and technology. Its main drawbacks are the dependence of the clustering results on the choice of the initial configuration of centroids (initialization) and convergence to a local minimum of the objective function. The proposed improved k-means yields a solution close to the global minimum of distortion by running k-means sequentially for 1, 2, ..., k centroids. A significant speed-up is achieved by calculating distances only to the active centroids and by reducing the number of candidate vectors for the initial location of the new centroid. The advantage of this approach grows with larger data sets and higher dimensionality. The proposed algorithm is well suited to speech data clustering problems when creating code books.
doi_str_mv	10.35681/1560-9189.2012.14.1.311801
first_indexed	2025-07-17T10:59:03Z
format	Article
id	drspiprikievua-article-311801
institution	Data Recording, Storage & Processing
keywords_txt_mv	keywords
language	Ukrainian
last_indexed	2025-07-17T10:59:03Z
publishDate	2012
publisher	Інститут проблем реєстрації інформації НАН України
record_format	ojs
spelling	drspiprikievua-article-311801 2024-09-22T23:29:39Z The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids Метод кластеризации на основе последовательного запуска k-средних с вычислением расстояний до активных центроидов Метод кластеризації на основі послідовного запуску k-середніх з обчисленням відстаней до активних центроїдів Ткаченко, О. М. Біліченко, Н. О. Грійо Тукало, О. Ф. Дзісь, О. В. кодовые книги кластеризация k-средних центроиды kd-деревья code books clustering k-means centroids kd-trees кодові книги кластеризація k-середніх центроїди kd-дерева A variant of a k-means-based solution to the clustering problem is considered. The k-means algorithm is widely used in many fields of science and technology. Its main drawbacks are the dependence of the clustering results on the choice of the initial configuration of centroids (initialization) and convergence to a local minimum of the objective function. The proposed improved k-means yields a solution close to the global minimum of distortion by running k-means sequentially for 1, 2, ..., k centroids. A significant speed-up is achieved by calculating distances only to the active centroids and by reducing the number of candidate vectors for the initial location of the new centroid. The advantage of this approach grows with larger data sets and higher dimensionality. The proposed algorithm is well suited to speech data clustering problems when creating code books. Інститут проблем реєстрації інформації НАН України 2012-03-20 Article Article application/pdf https://drsp.ipri.kiev.ua/article/view/311801 10.35681/1560-9189.2012.14.1.311801 Data Recording, Storage & Processing; Vol. 14 No. 1 (2012); 25-34 1560-9189 uk https://drsp.ipri.kiev.ua/article/view/311801/302956
title	The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids
title_alt	Метод кластеризации на основе последовательного запуска k-средних с вычислением расстояний до активных центроидов Метод кластеризації на основі послідовного запуску k-середніх з обчисленням відстаней до активних центроїдів
title_sort	clustering method based on the consequential running of k-means with calculation of the distances to the active centroids
topic	code books clustering k-means centroids kd-trees
topic_facet	кодовые книги кластеризация k-средних центроиды kd-деревья code books clustering k-means centroids kd-trees кодові книги кластеризація k-середніх центроїди kd-дерева
url	https://drsp.ipri.kiev.ua/article/view/311801
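
The idea summarized in the description field (run k-means sequentially for 1, 2, ..., k centroids, and try only a reduced set of candidate vectors for the starting location of each new centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random choice of candidates, and the parameters n_candidates and seed are assumptions made for illustration, and the speed-ups named in the record (distances only to active centroids, kd-trees) are not reproduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centroids):
    """Total squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, max_iter=100):
    """Plain k-means iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assign = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assign == assign:  # no point changed cluster: converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, distortion(points, centroids)


def sequential_kmeans(points, k, n_candidates=10, seed=0):
    """Grow the solution from 1 to k centroids, one centroid at a time.

    Assumption: candidates for the new centroid are a random subset of the
    data; the record does not say how the candidate set is reduced.
    """
    rng = random.Random(seed)
    n = len(points)
    # With one centroid the optimum is the mean of the data.
    centroids = [[sum(col) / n for col in zip(*points)]]
    for _ in range(1, k):
        candidates = rng.sample(points, min(n_candidates, n))
        best = None
        for cand in candidates:
            # Keep the current centroids, add one candidate, refine with k-means.
            trial, d = lloyd(points, centroids + [cand])
            if best is None or d < best[1]:
                best = (trial, d)
        centroids = best[0]
    return centroids, distortion(points, centroids)
```

Because every stage starts from the converged solution of the previous stage, the result does not depend on a random initial configuration of all k centroids at once, which is the drawback of plain k-means that the description names. The cost is one k-means run per candidate per stage, which is why limiting the candidate set matters.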
