The Clustering Method Based on the Consequential Running of k-Means with Calculation of the Distances to the Active Centroids
| Date: | 2012 |
|---|---|
| Main Authors: | , , , |
| Format: | Article |
| Language: | Ukrainian |
| Published: | Інститут проблем реєстрації інформації НАН України (Institute for Information Recording of the NAS of Ukraine), 2012 |
| Subjects: | |
| Online Access: | http://drsp.ipri.kiev.ua/article/view/311801 |
| Journal Title: | Data Recording, Storage & Processing |
| Institution: | Data Recording, Storage & Processing |
| Summary: | A variant of a clustering method based on the k-means algorithm is considered. This algorithm is widely used in many fields of science and technology. The main drawbacks of the k-means algorithm are the dependence of the clustering results on the choice of the initial configuration of centroids (initialization) and convergence to a local minimum of the objective function. The proposed improved k-means provides a solution close to the global minimum of distortion by running k-means sequentially for 1, 2, ..., k centroids. A significant speed-up is achieved by calculating distances only to the active centroids and by reducing the number of candidate vectors for the initial location of each new centroid. The advantage of this approach becomes more appreciable for larger data sets of higher dimension. The proposed algorithm is recommended for speech data clustering problems when creating codebooks. |
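
The scheme outlined in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions, because the record does not specify them: a centroid is treated as "active" if it moved in the previous update step, the candidate set for each new centroid is reduced to the `n_candidates` points with the largest current error, and the names `kmeans_active` and `sequential_kmeans` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_active(points, centroids, max_iter=100):
    """Lloyd iterations with a reduced search (assumed meaning of 'active').

    A point whose own centroid did not move in the last update is compared
    only against centroids that did move; static centroids were already no
    closer at the previous assignment, so the result stays exact.
    """
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    labels = [0] * len(points)
    active = set(range(k))  # before the first pass every centroid counts as moved
    for _ in range(max_iter):
        # Assignment step.
        for i, p in enumerate(points):
            own = labels[i]
            search = range(k) if own in active else active
            best, best_d = own, sq_dist(p, centroids[own])
            for j in search:
                d = sq_dist(p, centroids[j])
                if d < best_d:
                    best, best_d = j, d
            labels[i] = best
        # Update step: recompute means and record which centroids moved.
        moved = set()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # an empty cluster keeps its centroid
            mean = [sum(col) / len(members) for col in zip(*members)]
            if mean != centroids[j]:
                centroids[j] = mean
                moved.add(j)
        active = moved
        if not active:
            break  # no centroid moved: converged
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, distortion


def sequential_kmeans(points, k, n_candidates=3):
    """Run k-means for 1, 2, ..., k centroids, adding one centroid at a time."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    centroids, labels, distortion = kmeans_active(points, [mean])
    for _ in range(2, k + 1):
        # Candidate reduction (an assumption): try only the points that are
        # currently represented worst, instead of every data point.
        order = sorted(
            range(len(points)),
            key=lambda i: sq_dist(points[i], centroids[labels[i]]),
            reverse=True,
        )
        best = None
        for i in order[:n_candidates]:
            result = kmeans_active(points, centroids + [points[i]])
            if best is None or result[2] < best[2]:
                best = result
        centroids, labels, distortion = best
    return centroids, labels, distortion


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    cents, labs, dist = sequential_kmeans(data, k=2)
    print(cents, labs, dist)
```

Because each stage starts from the converged centroids of the previous stage, the result does not depend on a random initialization; the cost is one k-means run per candidate at every stage, which is why shrinking the candidate set and the distance search matters for large, high-dimensional data.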