There are six clustering procedures available in SPSS: four built in and two extensions. On top of those, there are Kirill's macros. It would be interesting to see a comparative analysis of all of these.
--
Original Message:
Sent: 9/18/2024 6:42:00 AM
From: Kirill Orlov
Subject: RE: K-means initial centroids
The SPSS QUICK CLUSTER (k-means) procedure uses, in automatic mode, the farthest-points running-selection algorithm to produce initial cluster centres. This is what Aruna referred to as the maximal distance algorithm. The algorithm is deterministic, although it may be sensitive to the case order in the data, and it is described in the "SPSS Statistics Algorithms" document.
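To make the idea concrete, here is a minimal Python sketch of a farthest-points running selection. It is an illustration under assumptions, not the SPSS implementation: the function name `running_selection_centres`, the use of Euclidean distance, and the exact form of the two replacement tests are my own simplifications, and the authoritative rules are the ones in the "SPSS Statistics Algorithms" document. The sketch takes the first k cases as provisional centres and then lets each later case replace a centre when doing so spreads the centres further apart.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def running_selection_centres(cases, k):
    """Pick k initial centres in one ordered pass (hypothetical helper)."""
    cases = [tuple(c) for c in cases]
    if k < 1 or len(cases) < k:
        raise ValueError("need 1 <= k <= number of cases")
    centres = cases[:k]  # the first k cases are the provisional centres
    if k < 2:
        return centres
    for x in cases[k:]:
        # Closest pair among the current centres.
        i, j = min(combinations(range(k), 2),
                   key=lambda p: dist(centres[p[0]], centres[p[1]]))
        d_pair = dist(centres[i], centres[j])
        d_to = [dist(x, c) for c in centres]
        nearest = min(range(k), key=d_to.__getitem__)
        # Test 1 (assumed form): x is farther from all centres than the two
        # closest centres are from each other, so x replaces the member of
        # that pair which is nearer to x.
        if d_to[nearest] > d_pair:
            centres[i if d_to[i] <= d_to[j] else j] = x
            continue
        # Test 2 (assumed form): x is farther from its second-nearest centre
        # than its nearest centre is from any other centre, so x replaces
        # its nearest centre.
        second = min((m for m in range(k) if m != nearest),
                     key=d_to.__getitem__)
        d_other = min(dist(centres[nearest], centres[m])
                      for m in range(k) if m != nearest)
        if d_to[second] > d_other:
            centres[nearest] = x
    return centres


if __name__ == "__main__":
    print(running_selection_centres([(0,), (1,), (10,), (2,)], 2))
```

No random numbers are involved, so the same data in the same order always yield the same centres. Because the pass is sequential, reordering the cases can change which centres are returned, which is the case-order sensitivity mentioned above.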
My macro !KO_KMINI offers, in addition to this one, six more methods for choosing initial cluster centres for k-means. The macro is on my page "Kirill's SPSS macros", in the collection "Clustering".
------------------------------
Kirill Orlov
------------------------------
Original Message:
Sent: Thu September 12, 2024 12:01 PM
From: Dr. Edward Vieira
Subject: K-means initial centroids
I am writing the second edition of my statistics textbook which features SPSS and is published by Routledge. I have added a cluster analysis chapter and have a question. In SPSS, the k-means algorithm uses mostly random initialization for the selection of initial cluster centroids. Although it is mostly random, it does incorporate some internal methods to ensure that the clustering algorithm is effective. The SPSS documentation and user guides do not specifically mention the use of additional methods to refine or guide the random selection process in the standard implementation. Can you provide me with specific information about the deterministic component of initial centroid selection? Please apprise me of any additional information that you require. Thanks a bunch! Ed
------------------------------
Dr. Edward Vieira
------------------------------