SPSS Statistics

  • 1.  How to calculate the same statistical method for numerous variables

    Posted Mon February 07, 2022 09:20 AM
    Hello,

    We, the survey team, need to run the same statistics on more than 200 variables and produce a separate output file for each.
    One tedious way is to edit the syntax and swap in one variable at a time, more than 200 times.

    In the following example, the variable is C8. Can we add lines to the syntax (such as a loop or repeat) and supply the full list of variables (A1 to A20, B1A to B1Z, C1 to C8, etc.), so that SPSS automatically substitutes each variable in place of C8, runs the analysis 200 times, and produces 200 reports in one run?

    The sample syntax for C8 is given below:

    OMS
    /SELECT TABLES
    /IF COMMANDS = ['CSTABULATE']
    INSTANCES = [LAST]
    /EXCEPTIF SUBTYPE = ['NOTES']
    /DESTINATION FORMAT = XLSX
    OUTFILE = "...\C8_cstab_output.xlsx"
    /NOWARN.

    MISSING VALUES C8 (66 thru 999).

    CSTABULATE
    /PLAN FILE='...\weighing2.csaplan'
    /TABLES VARIABLES=C8
    /CELLS TABLEPCT
    /STATISTICS SE CV CIN(95) COUNT
    /MISSING SCOPE=TABLE CLASSMISSING=EXCLUDE.
    EXECUTE.

    OMSEND.

    Thank you in advance.

    ------------------------------
    Myint Tun
    ------------------------------


  • 2.  RE: How to calculate the same statistical method for numerous variables

    Posted Mon February 07, 2022 12:11 PM
    I am not fond of the macro facility.  Here is a Python solution.  It does not require you to know anything about Python, but it does require installing the SPSSINC SELECT VARIABLES and SPSSINC PROGRAM extension commands via the Extensions > Extension Hub menu.
    I have assumed that you have V27 or later, but this can readily be modified if your version is older.

    First, here is the solution, using the employee data.sav file shipped with Statistics as an example, followed by the explanation.

    begin program python3.
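    import spss, sys
    def tabulate():
        # SPSSINC PROGRAM passes the variable names in sys.argv[1:].
        for v in sys.argv[1:]:
            # Raw f-string: backslashes in the Windows paths stay literal,
            # while {v} is still replaced by the variable name.
            cmd = rf"""OMS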
            /SELECT TABLES
            /IF COMMANDS = ['CSTABULATE']
            INSTANCES = [LAST]
            /EXCEPTIF SUBTYPE = ['NOTES']
            /DESTINATION FORMAT = XLSX
            OUTFILE = "...\{v}_cstab_output.xlsx"
            /NOWARN.  
            MISSING VALUES {v} (66 thru 999).
           
            CSTABULATE
            /PLAN FILE='...\weighing2.csaplan'
            /TABLES VARIABLES={v}
            /CELLS TABLEPCT
            /STATISTICS SE CV CIN(95) COUNT
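            /MISSING SCOPE=TABLE CLASSMISSING=EXCLUDE.
            OMSEND."""
            # OMSEND (as in the original syntax) closes the OMS request so each
            # workbook is written before the next variable starts.
            spss.Submit(cmd)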
    end program.

    SPSSINC SELECT VARIABLES MACRONAME="!vars" VARIABLES=bdate to salary prevexp
    /OPTIONS ORDER=FILE PRINT=YES IFNONE=ERROR SEPARATOR=" ".

    SPSSINC PROGRAM tabulate !vars.

    First you run the begin program ... end program block.  You do this just once in a session; it only defines the function.
    When the function is invoked, it substitutes each listed variable wherever you see {v} and then runs the generated syntax.
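
    The substitution step can be previewed outside SPSS.  A minimal sketch, assuming the syntax from the original question as the template; build_syntax and the example variable names are hypothetical, and the elided "..." paths are left exactly as posted.  It only builds and prints the text; nothing is submitted to SPSS.

```python
# Dry run: build the per-variable syntax as text, without calling spss.Submit.
# TEMPLATE mirrors the syntax from the original question; "..." paths are as posted.
TEMPLATE = r"""OMS
/SELECT TABLES
/IF COMMANDS = ['CSTABULATE'] INSTANCES = [LAST]
/EXCEPTIF SUBTYPE = ['NOTES']
/DESTINATION FORMAT = XLSX
OUTFILE = "...\{v}_cstab_output.xlsx"
/NOWARN.
MISSING VALUES {v} (66 thru 999).
CSTABULATE
/PLAN FILE='...\weighing2.csaplan'
/TABLES VARIABLES={v}
/CELLS TABLEPCT
/STATISTICS SE CV CIN(95) COUNT
/MISSING SCOPE=TABLE CLASSMISSING=EXCLUDE.
OMSEND."""


def build_syntax(varname):
    """Return the syntax block with every {v} replaced by varname."""
    return TEMPLATE.format(v=varname)


if __name__ == "__main__":
    for name in ["C7", "C8"]:  # hypothetical variable names
        print(build_syntax(name))
        print()
```

    Reading the printed text before running the real job is a cheap way to confirm that the output file name and the TABLES subcommand change together for each variable.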

    Next you specify the variables of interest in the SPSSINC SELECT VARIABLES command.  It supports the TO convention, which MACRO does not.  Not shown here: it also accepts wildcard (regular expression) patterns for variable names if you can specify a pattern, e.g., all names that start with C followed by one or more digits.  I can help with that if that applies to your case.
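
    To see what such a pattern would pick up, here is a small sketch using Python's own re module.  The variable names are hypothetical, and whether the extension ignores case or anchors the pattern the same way is not shown in this thread, so treat this as an illustration of the regular expression only.

```python
import re

# Hypothetical variable names standing in for a real file's dictionary.
names = ["A1", "B1A", "C1", "C8", "C12", "CASEID", "salary"]

# "C followed by one or more digits", matched against the whole name.
pattern = re.compile(r"C\d+")

selected = [n for n in names if pattern.fullmatch(n)]
print(selected)
```

    Names such as CASEID start with C but are not followed only by digits, so they are left out.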

    SPSSINC SELECT VARIABLES creates a macro, here named !vars, that expands the list of variables to a simple list.  Put your variable selection here.  The output in the Viewer will show the selected variables.

    Finally you run the SPSSINC PROGRAM command.  That invokes the function defined above and passes to it the listed variables.
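
    The hand-off can be pictured with a rough model.  This is a sketch only, not the extension's actual implementation: run_program is a hypothetical stand-in for SPSSINC PROGRAM, and the assumption (consistent with the sys.argv[1:] loop above) is that the listed variables arrive in sys.argv after the function name.  A list stands in for spss.Submit so the sketch runs outside SPSS.

```python
import sys

submitted = []  # stands in for spss.Submit so the sketch runs outside SPSS


def tabulate():
    # sys.argv[0] is assumed to hold the function name; the rest are variables.
    for v in sys.argv[1:]:
        submitted.append(f"* syntax for {v}.")


def run_program(func, args):
    """Hypothetical stand-in for SPSSINC PROGRAM: set sys.argv, then call func."""
    saved = sys.argv
    sys.argv = [func.__name__] + list(args)
    try:
        func()
    finally:
        sys.argv = saved  # restore the real argument list


run_program(tabulate, ["bdate", "educ", "jobcat"])
print(submitted)
```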

    Variations on this can be done if this needs some fine tuning.

    I couldn't test this on your data, of course, but if you run into any problems, send me a sample of the data and your syntax, and I'll figure it out (jkpeck@gmail.com).

    It is possible to do this with a traditional MACRO solution, but I always find MACRO painful.



  • 3.  RE: How to calculate the same statistical method for numerous variables

    Posted Sat February 12, 2022 04:02 PM
    I was having trouble trying to reply to your email, so I am posting this here.

    1.  We are using the CSTABULATE command (Complex Samples module).  In your last syntax line, you recommended SPSSINC PROGRAM tabulate !vars.  I tried using cstabulate instead of tabulate, and it doesn't work.  Are we supposed to use tabulate even though we are running CSTABULATE?
    >>> "tabulate" is the name of the little Python function defined by def tabulate.  It has no connection to the SPSS command name.  CSTABULATE is the SPSS syntax that the function runs; that syntax could be anything.

    2.  I compared the results of the individual analysis with those of the automated analysis and found that the SE and coefficient of variation in the automated one are slightly higher!  Please see the screenshot below.  Maybe this is related to the problem I described above, cstabulate vs tabulate.
    >>> Interesting.  You are running exactly the same SPSS syntax either way, and you can see from the case counts that the same cases are going into both versions.  I suspect that if you increase the display precision of the other statistics to show more decimals, you would see slight differences in those as well.  I don't know whether any of the calculations involve a bit of Monte Carlo simulation, but, if so, there would be slight variations in the results.  I don't think that the variation would have anything to do with the automated vs standard syntax.  Try running the simple syntax twice to see if the results show any differences.

    3.  You also pointed out that the Python approach allows variable names to be given using "TO" and wildcards.  Can we use "TO" for variables stored in the dataset and the wildcard C* for all variables starting with C?  If I would like to analyze C3 to C12, may I use C3 TO C12 in the variable list?
    >>> Your variable specification could be C3 TO C12, referring to variables in file order.  Or you could omit the VARIABLES keyword and list and add
    /PROPERTIES PATTERN = "c.*"
    That means all variable names starting with c and followed by any number of characters.  You can combine that with a variable list, and you can experiment by looking at the displayed list of variables included in the macro definition to see if it is what you want.  You can even add qualifications according to variable type and measurement level.
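
    Because TO follows file order rather than the numbers in the names, it is worth checking what sits between the two endpoints.  A minimal sketch of that expansion, with a hypothetical expand_to helper and a made-up file order:

```python
def expand_to(first, last, file_order):
    """Return the variables from first through last, in file order."""
    lower = [name.lower() for name in file_order]  # SPSS names are case-insensitive
    start, end = lower.index(first.lower()), lower.index(last.lower())
    if start > end:
        raise ValueError(f"{first} comes after {last} in file order")
    return file_order[start:end + 1]


# Hypothetical file order: C5A and B2 sit between C3 and C12.
file_order = ["C3", "C4", "C5", "C5A", "C6", "B2", "C12", "C13"]
print(expand_to("C3", "C12", file_order))
```

    In this made-up file, C3 TO C12 would include C5A and B2 and would not include C7 through C11, since those names are not in the file at all.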
    Good luck.


    ------------------------------
    Jon Peck
    ------------------------------