The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids

Bibliographic Details
Date: 2012
Main Authors: Ткаченко, О. М., Біліченко, Н. О., Грійо Тукало, О. Ф., Дзісь, О. В.
Format: Article
Language: Ukrainian
Published: Інститут проблем реєстрації інформації НАН України 2012
Online Access: http://drsp.ipri.kiev.ua/article/view/311801
Journal Title: Data Recording, Storage & Processing

Institution

Data Recording, Storage & Processing
Description
Summary: A variant of the k-means clustering algorithm is considered. k-means is widely used in many fields of science and technology. Its main drawbacks are the dependence of the clustering result on the initial configuration of centroids (initialization) and convergence to a local minimum of the objective function. The proposed improved k-means provides a solution close to the global minimum of distortion by running k-means sequentially for 1, 2, ..., k centroids. A significant speed-up is achieved by calculating distances only to the active centroids and by reducing the number of candidate vectors considered for the initial location of each new centroid. The advantage of this approach becomes more pronounced as the data set grows larger and its dimension higher. The authors recommend the proposed algorithm for speech data clustering when creating codebooks.
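
The summary gives only the outline of the method, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It grows the codebook from 1 to k centroids, and after each addition it refines the solution while computing distances only to "active" centroids, taken here to mean centroids that moved in the previous update. The function names (`sequential_kmeans`, `refine`), the `n_candidates` parameter, and the rule for placing a new centroid (the candidate farthest from its current centroid, drawn from a small random subset) are hypothetical choices; the abstract does not specify them.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points, dim):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(dim)]


def refine(data, centroids, labels, dists, active, max_iter=100):
    """k-means iterations that skip distances to static centroids.

    Assumption: a centroid is 'active' if it moved in the last update.
    A point whose own centroid is static only needs to be compared
    against the active centroids; otherwise a full search is done.
    """
    dim = len(data[0])
    for _ in range(max_iter):
        if not active:
            break
        for i, x in enumerate(data):
            if labels[i] in active:
                # Own centroid moved: stored distance is stale, search all.
                candidates, best, best_d = range(len(centroids)), None, float("inf")
            else:
                # Own centroid is static: only active centroids can win.
                candidates, best, best_d = active, labels[i], dists[i]
            for j in candidates:
                d = sq_dist(x, centroids[j])
                if d < best_d:
                    best, best_d = j, d
            labels[i], dists[i] = best, best_d
        # Update step: recompute means and record which centroids moved.
        active = set()
        for j in range(len(centroids)):
            members = [data[i] for i, lab in enumerate(labels) if lab == j]
            if members:
                new_c = mean(members, dim)
                if new_c != centroids[j]:
                    centroids[j] = new_c
                    active.add(j)
    return centroids, labels, dists, active


def sequential_kmeans(data, k, n_candidates=10, seed=0):
    """Run k-means for 1, 2, ..., k centroids, adding one centroid per stage."""
    rng = random.Random(seed)
    dim = len(data[0])
    centroids = [mean(data, dim)]  # the optimum for a single centroid
    labels = [0] * len(data)
    dists = [sq_dist(x, centroids[0]) for x in data]
    active = set()
    while len(centroids) < k:
        # Assumed heuristic: examine a reduced candidate set and place the
        # new centroid at the candidate farthest from its current centroid.
        pool = rng.sample(range(len(data)), min(n_candidates, len(data)))
        pick = max(pool, key=lambda i: dists[i])
        centroids.append(list(data[pick]))
        active = active | {len(centroids) - 1}
        centroids, labels, dists, active = refine(
            data, centroids, labels, dists, active
        )
    return centroids, labels
```

For example, `sequential_kmeans(points, 2)` on two well-separated groups of 2-D points returns two centroids and a label list that splits the groups. In this sketch the saving comes from the branch in `refine` that compares a point only against the active centroids; as few centroids tend to move in later iterations, most points should avoid a full search.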