SPSS Statistics


 Replicate K-Means clusters in Python

wen mart posted Wed May 21, 2025 01:34 PM

Hi everyone,

I'm trying to replicate the results of a K-Means clustering I previously ran in SPSS, using the exact same dataset in Python (with scikit-learn). However, I'm getting different results and cluster assignments, even though the input data hasn't changed.

I understand that initial centroids can be randomly selected, which could cause variation, but I'd like to know, if possible, the specific parameters and initialisation method SPSS uses for K-Means (e.g. initialisation strategy, number of iterations).

My goal is to match SPSS results as closely as possible in Python by adjusting the parameters accordingly.

Any insights or suggestions would be greatly appreciated!


Gunilla Rudander

Interesting question. As far as I know, the initial centroids are randomly selected, and I cannot find anywhere in SPSS Statistics to set the random seed (it's just a number) to a specific value so that the same seed could be used in both SPSS and Python (I think that should be possible in SPSS Modeler). The maximum number of iterations defaults to 10 in SPSS but can be changed. See the syntax:

QUICK CLUSTER fac1_2 fac2_2 fac3_2 fac4_2
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(3) MXITER(10) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /PRINT INITIAL.

Jon Peck IBM Champion

I expect that SET SEED in SPSS would control it, but there is no reason to think that the generated random numbers would be the same with a different generator in Python.
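
One possible way to sidestep the seed problem altogether is to avoid random initialisation on the Python side: the /PRINT INITIAL subcommand in the syntax above prints the initial cluster centers, and scikit-learn's KMeans accepts an explicit array for its init parameter. The sketch below is only an illustration under stated assumptions. The data and the center values are made up (hypothetical), and mapping MXITER(10) and CONVERGE(0) to max_iter=10 and tol=0.0 is an assumed approximation, since the two programs do not necessarily define convergence the same way.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in data: replace with the same cases and the same
# four variables (fac1_2 ... fac4_2) that were passed to QUICK CLUSTER.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(50, 4)),
    rng.normal(loc=0.0, scale=0.5, size=(50, 4)),
    rng.normal(loc=2.0, scale=0.5, size=(50, 4)),
])

# Hypothetical initial centers: in practice, copy the values from the
# SPSS output produced by /PRINT INITIAL (one row per cluster).
initial_centers = np.array([
    [-2.0, -2.0, -2.0, -2.0],
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
])

km = KMeans(
    n_clusters=3,          # CLUSTER(3)
    init=initial_centers,  # fixed starting centers, no random selection
    n_init=1,              # a single run from the supplied centers
    max_iter=10,           # assumed counterpart of MXITER(10)
    tol=0.0,               # assumed counterpart of CONVERGE(0)
)

# scikit-learn labels clusters from 0, so add 1 to compare with SPSS.
labels = km.fit_predict(X) + 1
print(km.cluster_centers_)
```

Because /MISSING=LISTWISE drops incomplete cases, the Python data would also need rows with missing values removed before fitting. Small differences may still remain, so it is worth comparing the final cluster centers as well as the assignments.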