SPSS Statistics

  • 1.  Raking - Is there a category limit?

    Posted Wed November 16, 2022 08:16 PM
    I am raking across several variables. When I use 35 categories in total, everything runs fine (and I can go up to the limit of ten variables). When I go to 36, the resulting weights are almost exactly the same for every case (even when using only 5 variables). If I take one of the variables and collapse two categories to get the total number down to 35, it works fine. Is there a limit on the number of categories?

    FYI: I am on a Mac with 64GB of memory and a 64-bit-only operating system. Thanks!

    ------------------------------
    David Winston
    ------------------------------

    #SPSSStatistics


  • 2.  RE: Raking - Is there a category limit?

    IBM Champion
    Posted Wed November 16, 2022 09:34 PM
    There is a limit of ten dimensions, but there is no limit on categories.  However, if the product of the category counts across the variables is large, there may be a lot of empty cells or cells with very small counts, so the results may not be useful.  I have had a client who had 1500 categories, though, and got satisfactory results.
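
As a rough illustration of the point about empty cells, here is a minimal Python sketch that counts how many cells the full cross-classification has and how many of them contain no cases. It is not part of SPSSINC RAKE; the function name `cell_summary` and the four sample cases are made up for the example, and only the variable names `jobcat` and `minority` come from this thread.

```python
import math


def cell_summary(rows, variables):
    """Return (total cells, empty cells) for the cross-classification.

    rows: list of dicts, one per case, mapping variable name -> category.
    Category levels are taken from the data itself, so a category with
    no cases at all is not counted here.
    """
    levels = {v: {row[v] for row in rows} for v in variables}
    # Number of cells is the product of the category counts.
    total_cells = math.prod(len(levels[v]) for v in variables)
    # Distinct category combinations that actually occur in the data.
    observed = {tuple(row[v] for v in variables) for row in rows}
    return total_cells, total_cells - len(observed)


# Hypothetical cases: 3 job categories x 2 minority codes = 6 cells.
sample = [
    {"jobcat": 1, "minority": 0},
    {"jobcat": 1, "minority": 1},
    {"jobcat": 2, "minority": 0},
    {"jobcat": 3, "minority": 0},
]
total, empty = cell_summary(sample, ["jobcat", "minority"])
print(f"{total} cells, {empty} empty")
```

Because the cell count is a product, adding one more dimension multiplies it, so the share of empty or near-empty cells can grow quickly even when each variable has only a few categories.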

    With a lot of categories, it is easier to set them up as datasets rather than using the dialog box.  Here is an example of the syntax you might use for that.
    SPSSINC RAKE FINALWEIGHT=WEIGHT /DS1 DS=jobcatds CATVAR=jobcat TOTVAR=totals /DS2 DS=minorityds CATVAR=minority TOTVAR=totals.
    Each dataset would have two columns giving the categories and the counts or proportions.
    The datasets might look like this.

    data list list /jobcat totals.
    begin data
    1 .5
    2 .30
    3 .20
    end data.
    dataset name jobcatds.
    data list list /minority totals.
    begin data
    0 .80
    1 .20
    end data.
    dataset name minorityds.

    I would recommend looking at the histogram of weights to see what the weighting will do.


  • 3.  RE: Raking - Is there a category limit?

    Posted Thu November 17, 2022 10:37 AM
    I have been raking a regularly generated monthly dataset using 5 dimensions with 33 categories in total. I have never had a problem.

    This month I added a sixth dimension with 3 categories, and it generated a weight of .909 for each case (with some variation in the digits after the .909). It ran without crashing and there was no error in the log, but the weight was clearly incorrect. I ran it several times with the same result, the weight always being the same as before (.909).

    I then went back and collapsed two categories in one of the variables, taking the overall number from 36 (33+3) to 35 (32+3). With six dimensions and 35 categories, the weights were generated and everything ran fine.

    Also, when I rake I generate the histogram to review. With the 36 categories, the shape of the distribution was what I would expect, but every value on the x-axis was .91.

    ------------------------------
    David Winston
    ------------------------------



  • 4.  RE: Raking - Is there a category limit?

    IBM Champion
    Posted Thu November 17, 2022 11:09 AM
    If you look at the weighted marginals after raking, you should see that they match the specification, although there are situations where that is impossible.  As an extreme example, imagine that some category actually has zero cases but the rake specification gives it a positive number.  Then the problem has no solution.  There is some information about that in the article on raking that gets installed with the extension.  If instead you have a very sparse but nonzero cell with a big target count, you would probably see a very large weight in the weights histogram.
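
To make the marginals check and the zero-case problem concrete, here is a small Python sketch of raking done as plain iterative proportional fitting. It is only an illustration under stated assumptions and is not how SPSSINC RAKE is implemented: the functions `rake` and `weighted_margin` and the ten sample cases are invented for the example, while the target proportions are the ones from the jobcat and minority example earlier in this thread.

```python
from collections import defaultdict


def rake(rows, targets, iterations=50):
    """Rake case weights toward target proportions, one variable at a time.

    rows: list of dicts, one per case, mapping variable name -> category.
    targets: dict mapping variable name -> {category: target proportion}.
    Returns one weight per case; the weights sum to len(rows).
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(iterations):
        for var, target in targets.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            # A positive target with no cases cannot be met by any weights.
            for cat, share in target.items():
                if share > 0 and totals[cat] == 0:
                    raise ValueError(
                        f"{var}={cat} has a positive target but no cases")
            # Rescale so this variable's weighted margin matches its target.
            new_weights = []
            for row, w in zip(rows, weights):
                cat = row[var]
                new_weights.append(w * target[cat] * n / totals[cat])
            weights = new_weights
    return weights


def weighted_margin(rows, weights, var):
    """Weighted proportion of cases in each category of var."""
    totals = defaultdict(float)
    for row, w in zip(rows, weights):
        totals[row[var]] += w
    grand = sum(weights)
    return {cat: t / grand for cat, t in totals.items()}


# Hypothetical sample of ten cases as (jobcat, minority) pairs.
pairs = [(1, 0)] * 4 + [(1, 1)] * 2 + [(2, 0), (2, 1), (3, 0), (3, 1)]
sample_rows = [{"jobcat": j, "minority": m} for j, m in pairs]
targets = {
    "jobcat": {1: 0.5, 2: 0.3, 3: 0.2},
    "minority": {0: 0.8, 1: 0.2},
}

weights = rake(sample_rows, targets)
print(weighted_margin(sample_rows, weights, "jobcat"))
print(weighted_margin(sample_rows, weights, "minority"))
```

In this sketch, dropping every case with jobcat 3 from the sample while keeping its target of .20 makes `rake` raise an error, which is the impossible situation described above. A target category that is nearly empty would instead come out with a very large weight on its few cases.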

    If you are able to send me the data and specification you used (jkpeck@gmail.com), I can take a look at it.
