When I create clusters with "K Means" using the same clustering variables on two identical data sets, I get different cluster sizes. My syntax is the same for both runs, yet I am not able to reproduce my results.
K-Means is sensitive to the starting values for the cluster centers and even to the order of the cases. It is often suggested to run it with different starting values to find stable clusters. If you save the cluster centers from a run, you can start with those for a subsequent run or make up your own initial centers in a dataset.
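To make the point concrete, here is a minimal sketch in plain Python (not SPSS syntax) of a batch k-means run on made-up one-dimensional data. The function names `kmeans` and `partition`, the toy data, and the rule "take the first k cases as initial centers" are all assumptions for illustration; this is not a description of how any particular package picks its starting centers.

```python
from math import dist  # Python 3.8+

def kmeans(points, centers, max_iter=100):
    """Plain batch (Lloyd) k-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest center (ties go to the lowest index).
        new_labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each center to the mean of its members; empty clusters stay put.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

def partition(points, labels):
    """Order-independent view of a solution: a set of clusters (sets of cases)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}

# Made-up data: the same seven cases in two different orders.
data = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (10.0,), (11.0,)]
reordered = list(reversed(data))

# Hypothetical starting rule: the first k = 2 cases become the initial centers.
centers_a, labels_a = kmeans(data, data[:2])
centers_b, labels_b = kmeans(reordered, reordered[:2])
sizes_a = sorted(labels_a.count(j) for j in range(2))
sizes_b = sorted(labels_b.count(j) for j in range(2))
print(sizes_a)  # [3, 4]
print(sizes_b)  # [2, 5]

# Starting the reordered data from the saved centers of the first run
# reproduces the first run's clusters.
_, labels_c = kmeans(reordered, centers_a)
same = partition(reordered, labels_c) == partition(data, labels_a)
print(same)  # True
```

In this sketch the case order matters only because it determines the starting centers. Once the starting centers are fixed, the batch update gives the same clusters in any order. An implementation that updates centers case by case as it reads the data could still depend on order even with fixed starting centers.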
Thanks for your response. I reordered my data in various ways and found that the clusters differed, sometimes significantly.
You suggested using different starting values to find stable clusters. What defines a stable cluster?
Cluster analysis is an ad hoc sort of method, so there is no definite rule about what is best in terms of centers and number of clusters, but here are some possibilities.