SPSS Statistics

  • 1.  K-means initial centroids

    Posted Mon September 16, 2024 10:05 AM

    I am writing the second edition of my statistics textbook, which features SPSS and is published by Routledge. I have added a cluster analysis chapter and have a question. In SPSS, the k-means algorithm uses mostly random initialization to select the initial cluster centroids. Although the selection is mostly random, it incorporates some internal methods to ensure that the clustering algorithm is effective. The SPSS documentation and user guides do not specifically mention any additional methods that refine or guide the random selection process in the standard implementation. Can you provide specific information about the deterministic component of initial centroid selection? Please let me know if you need any additional information. Thanks a bunch! Ed



    ------------------------------
    Dr. Edward Vieira
    ------------------------------


  • 2.  RE: K-means initial centroids

    Posted Tue September 17, 2024 10:27 AM

    Hi Dr. Edward Vieira,

    SPSS uses the Maximin distance algorithm for centroid initialization, which helps in placing centroids at maximum distances from each other to improve clustering results.



    ------------------------------
    Aruna Saraswathy
    Statistician
    SPSS Statistics
    IBM
    ------------------------------



  • 3.  RE: K-means initial centroids

    Posted Wed September 18, 2024 08:11 AM

    Thanks a bunch, Aruna!



    ------------------------------
    Dr. Edward Vieira
    ------------------------------



  • 4.  RE: K-means initial centroids

    Posted Wed September 18, 2024 06:42 AM
    Edited by Kirill Orlov Wed September 18, 2024 07:11 AM

    The SPSS QUICK CLUSTER (k-means) procedure, in its automatic mode, uses a farthest-points running-selection algorithm to produce the initial cluster centres. This is what Aruna referred to as the Maximin distance algorithm. The algorithm is deterministic, although it may be sensitive to the order of cases in the data, and it is described in the "SPSS Statistics Algorithms" document.
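
To make the running-selection idea concrete, here is a minimal Python sketch of a one-pass farthest-points selection, written from my reading of the rule rather than from any SPSS source. The function name `running_farthest_init` and the plain-tuple data layout are hypothetical, and details such as missing-value handling are left out, so treat it as an illustration under those assumptions and check the "SPSS Statistics Algorithms" document for the authoritative description.

```python
import math
from itertools import combinations


def running_farthest_init(cases, k):
    """One-pass selection of k initial centres (illustrative sketch, not SPSS code).

    Assumes `cases` is a sequence of equal-length numeric tuples and k >= 2.
    """
    if k < 2 or len(cases) < k:
        raise ValueError("need k >= 2 and at least k cases")

    # Start from the first k cases, in file order.
    centres = [tuple(c) for c in cases[:k]]

    for case in cases[k:]:
        x = tuple(case)
        # Distances from the new case to every current centre.
        d = [math.dist(x, c) for c in centres]
        order = sorted(range(k), key=d.__getitem__)
        nearest, second = order[0], order[1]

        # The two current centres that are closest to each other.
        i, j = min(combinations(range(k), 2),
                   key=lambda p: math.dist(centres[p[0]], centres[p[1]]))
        closest_pair_dist = math.dist(centres[i], centres[j])

        if d[nearest] > closest_pair_dist:
            # Test 1: the case is farther from all centres than the two
            # closest centres are from each other, so it replaces whichever
            # of those two centres it is nearer to.
            centres[i if d[i] <= d[j] else j] = x
        else:
            # Test 2: compare the distance to the second-nearest centre with
            # the smallest distance from the nearest centre to any other
            # centre; if larger, the case replaces the nearest centre.
            gap = min(math.dist(centres[nearest], centres[m])
                      for m in range(k) if m != nearest)
            if d[second] > gap:
                centres[nearest] = x

    return centres


centres = running_farthest_init([(0,), (1,), (10,), (5,)], 2)
print(centres)
```

The sketch uses no random numbers: the result depends only on the data and on the order in which the cases are read, which is consistent with the procedure being deterministic yet sensitive to case order.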

    My macro !KO_KMINI offers six more methods, in addition to this one, for choosing the initial cluster centres for k-means. The macro is on my page "Kirill's SPSS macros", in the collection "Clustering".



    ------------------------------
    Kirill Orlov
    ------------------------------



  • 5.  RE: K-means initial centroids

    Posted Wed September 18, 2024 08:10 AM

    Thank you, Kirill.

    Ed



    ------------------------------
    Dr. Edward Vieira
    ------------------------------



  • 6.  RE: K-means initial centroids

    Posted Wed September 18, 2024 09:20 AM
    There are six clustering procedures available in SPSS: four built-in and two extensions. Then there are Kirill's macros. It would be interesting to see a comparative analysis of all of these.






  • 7.  RE: K-means initial centroids

    Posted Thu September 19, 2024 06:18 AM

    Thank you, Jon. I appreciate it. Ed



    ------------------------------
    Dr. Edward Vieira
    ------------------------------