The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids

Bibliographic details
Date: 2012
Authors: Ткаченко, О. М., Біліченко, Н. О., Грійо Тукало, О. Ф., Дзісь, О. В.
Format: Article
Language: Ukrainian
Published: Інститут проблем реєстрації інформації НАН України, 2012
Online access: http://drsp.ipri.kiev.ua/article/view/311801
Journal: Data Recording, Storage & Processing

Description
Abstract: A variant of the k-means solution to the clustering problem is considered. The k-means algorithm is widely used in many fields of science and technology. Its main drawbacks are the dependence of the clustering result on the initial configuration of centroids (initialization) and convergence to a local minimum of the objective function. The proposed improved k-means yields a solution close to the global minimum of distortion by running k-means sequentially for 1, 2, ..., k centroids. A significant speed-up is achieved by calculating distances only to the active centroids and by reducing the number of candidate vectors for the initial location of each new centroid. The advantage of this approach grows with the size and dimensionality of the data set. The proposed algorithm is recommended for clustering speech data when creating codebooks.
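
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define "active centroids" or say how the candidate vectors are chosen. The sketch assumes that an active centroid is one that moved in the last update step, and that the reduced candidate set is a random subset of the data. The names `lloyd_active`, `sequential_kmeans`, and `n_candidates` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points, dim):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(dim)]

def lloyd_active(data, centroids, max_iter=100):
    """k-means iterations that skip distances to centroids that did not move.

    Assumption: 'active' means 'moved in the last update step'.
    """
    k, dim = len(centroids), len(data[0])
    labels, dists = [], []
    for p in data:  # initial full search over all centroids
        ds = [sq_dist(p, c) for c in centroids]
        j = min(range(k), key=ds.__getitem__)
        labels.append(j)
        dists.append(ds[j])
    for _ in range(max_iter):
        # Update step: recompute centroids and record which ones moved.
        new_centroids, moved = [], set()
        for j in range(k):
            members = [p for p, l in zip(data, labels) if l == j]
            c = mean(members, dim) if members else centroids[j]
            if c != centroids[j]:
                moved.add(j)
            new_centroids.append(c)
        centroids = new_centroids
        if not moved:
            break  # no centroid is active, so the partition is stable
        # Assignment step: search only the centroids that can have become closer.
        for i, p in enumerate(data):
            cur = labels[i]
            if cur in moved:
                d_cur = sq_dist(p, centroids[cur])
                # Own centroid moved away: a static centroid may now be
                # closer, so fall back to a full search.
                search = range(k) if d_cur > dists[i] else moved
            else:
                d_cur, search = dists[i], moved
            best, best_d = cur, d_cur
            for j in search:
                if j != cur:
                    d = sq_dist(p, centroids[j])
                    if d < best_d:
                        best, best_d = j, d
            labels[i], dists[i] = best, best_d
    return centroids, labels, sum(dists)

def sequential_kmeans(data, k, n_candidates=10, seed=0):
    """Grow the solution from 1 to k centroids, one centroid at a time."""
    rng = random.Random(seed)
    dim = len(data[0])
    centroids = [mean(data, dim)]  # the 1-centroid solution is the data mean
    labels = [0] * len(data)
    distortion = sum(sq_dist(p, centroids[0]) for p in data)
    for _ in range(2, k + 1):
        # Assumption: the reduced candidate set is a random subset of the data.
        candidates = rng.sample(data, min(n_candidates, len(data)))
        best = None
        for cand in candidates:
            result = lloyd_active(data, centroids + [list(cand)])
            if best is None or result[2] < best[2]:
                best = result
        centroids, labels, distortion = best
    return centroids, labels, distortion

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels, distortion = sequential_kmeans(data, 2, n_candidates=6)
```

Trying several candidate positions for each new centroid and keeping the one with the lowest distortion follows the general pattern of incremental (global) k-means. Restricting the search to moved centroids gives the same assignments as a full search while avoiding distance computations to centroids that stayed in place.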