SPSS Statistics

Replicate K means clusters in Python

wen mart posted Wed May 21, 2025 01:34 PM

Hi everyone,

I'm trying to replicate the results of a K-Means clustering I previously ran in SPSS, using the exact same dataset in Python (with scikit-learn). However, I'm getting different results and cluster assignments, even though the input data hasn't changed.

I understand that initial centroids can be selected randomly, which could cause variation, but I’d like to know, if possible, which specific parameters and initialisation method SPSS uses for K-Means (e.g. initialisation strategy, number of iterations).

My goal is to match SPSS results as closely as possible in Python by adjusting the parameters accordingly.

Any insights or suggestions would be greatly appreciated!

Gunilla Rudander posted Thu May 22, 2025 02:32 AM

Interesting question. As far as I know, the initial centroids are selected randomly, and I cannot find anywhere in SPSS Statistics to set the random seed (it's just a number) to a specific value. That should be possible in SPSS Modeler, I think, and you could then use the same random seed in both SPSS and Python. The maximum number of iterations defaults to 10 in SPSS but can be changed. See the syntax:

QUICK CLUSTER fac1_2 fac2_2 fac3_2 fac4_2
/MISSING=LISTWISE
/CRITERIA=CLUSTER(3) MXITER(10) CONVERGE(0)
/METHOD=KMEANS(NOUPDATE)
/PRINT INITIAL.
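
One way to take the starting point out of the equation on the Python side is to hand scikit-learn the initial centers explicitly, for example the ones that /PRINT INITIAL reports, instead of letting it pick its own. The sketch below is only an illustration under assumptions: `fit_like_quick_cluster` is a hypothetical helper name, the data and the starting centers are made up rather than taken from any SPSS output, and `tol=0.0` is only a rough stand-in for CONVERGE(0), since the two programs do not define their convergence criterion the same way.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_like_quick_cluster(X, initial_centers, max_iter=10):
    """Fit KMeans from fixed starting centers (hypothetical helper).

    initial_centers: one row per cluster, one column per variable,
    e.g. typed in from the initial cluster centers table in SPSS.
    """
    centers = np.asarray(initial_centers, dtype=float)
    model = KMeans(
        n_clusters=centers.shape[0],  # CLUSTER(3) in the syntax above
        init=centers,                 # explicit centers, no random start
        n_init=1,                     # a single run from those centers
        max_iter=max_iter,            # MXITER(10) in the syntax above
        tol=0.0,                      # rough analogue of CONVERGE(0)
    )
    return model.fit(np.asarray(X, dtype=float))


# Made-up data: three well separated groups on four variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 4)) for loc in (-3.0, 0.0, 3.0)])

# Made-up starting centers (NOT real SPSS output).
initial_centers = [[-3.0] * 4, [0.0] * 4, [3.0] * 4]

model = fit_like_quick_cluster(X, initial_centers)
labels = model.labels_            # 0-based cluster index per case
centers = model.cluster_centers_  # final centers, same order as the input
```

With explicit centers the cluster numbering follows the order of the rows you pass in, which makes a side-by-side comparison with the SPSS output easier. Listwise deletion of missing cases (/MISSING=LISTWISE) would still need to be done on the data before fitting.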

Jon Peck IBM Champion posted Thu May 22, 2025 12:55 PM

I expect that SET SEED in SPSS would control it, but there is no reason to think that the generated random numbers would be the same with a different generator in Python.
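
Since matching seeds will not give matching random numbers, it may be more practical to compare the two solutions directly. Cluster numbers are arbitrary labels, so "cluster 1" in SPSS can be "cluster 2" in Python even when the groups are identical. The sketch below, which uses only the standard library, looks for the one-to-one relabelling that maximises agreement. The function name `best_agreement` and the two label lists are made up for illustration; in practice the SPSS memberships would come from however you export them.

```python
from collections import Counter
from itertools import permutations


def best_agreement(spss_labels, python_labels):
    """Return (share, mapping) for the best one-to-one relabelling.

    share:   fraction of cases placed in matching clusters
    mapping: dict from SPSS cluster label to Python cluster label
    Brute force over permutations, so only sensible for a small k.
    """
    if len(spss_labels) != len(python_labels):
        raise ValueError("both label lists must cover the same cases")
    s_vals = sorted(set(spss_labels))
    p_vals = sorted(set(python_labels))
    if len(s_vals) != len(p_vals):
        raise ValueError("solutions use a different number of clusters")

    # Cross-tabulation: how many cases fall in each (SPSS, Python) pair.
    table = Counter(zip(spss_labels, python_labels))

    best_hits, best_perm = -1, None
    for perm in permutations(p_vals):
        hits = sum(table[(s, p)] for s, p in zip(s_vals, perm))
        if hits > best_hits:
            best_hits, best_perm = hits, perm

    mapping = dict(zip(s_vals, best_perm))
    return best_hits / len(spss_labels), mapping


# Made-up memberships for seven cases (NOT real output).
spss = [1, 1, 2, 2, 3, 3, 3]
py = [2, 2, 0, 0, 1, 1, 0]

share, mapping = best_agreement(spss, py)
```

A share of 1.0 would mean the two programs produced the same partition up to renaming; anything lower points to cases that genuinely moved between clusters.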
