SPSS Statistics




TWOSTEP CLUSTER (aka DMCLUSTER) returns different results depending on sort

  • 1.  TWOSTEP CLUSTER (aka DMCLUSTER) returns different results depending on sort

    Posted Thu June 15, 2023 10:55 AM
    Edited by Soren V. Raben Fri June 16, 2023 01:57 PM

    I had to segment some data, so about two months ago I ran the TWOSTEP CLUSTER command and saved the results. Today I opened the file again, accidentally sorted it, ran TWOSTEP CLUSTER again, and got very different frequencies for each segment. The syntax was essentially the same (both the selected variables and the fixed number of clusters):

    twostep cluster
    /categorical variables = x1 y10 z7 s8 u9 w1 k0
    /numclusters fixed 5
    /viewmodel display = yes
    /save variable = SEG.

    I only got the same results as before after I closed the dataset without saving, reopened it, and ran the syntax above again. Is this a bug, or am I missing something (e.g., a subcommand or a SET SEED setting)? I am using SPSS 29.0.0.0 (241), and my colleague has the same issue with version 27 or 28 (we tested it on two different machines).

    Any help would be highly appreciated.



  • 2.  RE: TWOSTEP CLUSTER (aka DMCLUSTER) returns different results depending on sort

    Posted Thu June 15, 2023 12:57 PM
    I think TWOSTEP uses the built-in random number generator, so you could set its seed just before running the procedure. Note, however, this comment from the Algorithms manual:

    "the structure of the constructed CF tree may depend on the input order of the cases or records. To minimize the order effect, randomly order the records before building the model."
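    The following minimal Python sketch shows why a sort can change the result. It uses a toy sequential "leader" clustering as a stand-in for inserting cases into a CF tree. This is an assumption for illustration only and is not SPSS's actual TwoStep implementation. The sketch also includes a hypothetical reproducible_order helper that follows the manual's advice without depending on the file's current sort: it first sorts the records by a unique ID and then shuffles them with a fixed seed. The (id, value) records are made up.

```python
import random


def partition(points, threshold):
    """Toy order-dependent clustering (NOT SPSS's TwoStep algorithm).

    Each point joins the first existing group whose running mean is within
    `threshold`; otherwise it starts a new group. Like CF-tree insertion,
    the result depends on the order in which points arrive.
    """
    groups = []
    for x in points:
        for g in groups:
            if abs(x - sum(g) / len(g)) <= threshold:
                g.append(x)
                break
        else:
            groups.append([x])
    # Canonical form so partitions from different runs compare directly.
    return sorted(sorted(g) for g in groups)


def reproducible_order(records, seed):
    """Hypothetical helper: randomize record order independently of the
    incoming sort by sorting on a unique ID (first field), then shuffling
    with a fixed seed."""
    canonical = sorted(records, key=lambda r: r[0])
    rng = random.Random(seed)
    rng.shuffle(canonical)
    return canonical


if __name__ == "__main__":
    data = [0, 1, 2]
    print(partition(data, 1.0))        # [[0, 1], [2]]
    print(partition(data[::-1], 1.0))  # [[0], [1, 2]]

    # Made-up (id, value) records, fed in two different sort orders.
    records = [(101, 0), (102, 1), (103, 2)]
    a = reproducible_order(records, seed=2023)
    b = reproducible_order(records[::-1], seed=2023)
    print(a == b)  # True: same canonical start + same seed -> same order
    same = partition([v for _, v in a], 1.0) == partition([v for _, v in b], 1.0)
    print(same)    # True
```

    The sort by ID matters. If you shuffle two differently sorted copies with the same seed, you still get two different orders, because the same random permutation is applied to different starting sequences. In SPSS terms, this presumably means the following steps before TWOSTEP CLUSTER:
    1. Sort by a unique case ID variable.
    2. Fix the seed with SET SEED.
    3. Compute a random key with RV.UNIFORM.
    4. Sort by that key.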

    You might want to use the INFILE subcommand, which causes TWOSTEP CLUSTER to update a cluster model whose CF tree has been saved as an XML file with the OUTFILE subcommand and STATE keyword. The model will be updated with the data in the active file.
