News from the SPSS Community

By Archive User posted Mon February 14, 2011 05:01 PM

Two utilities for IBM SPSS Statistics have new functionality, and there is one new item for Statistics. All three of these items are in the Utilities Collection in the SPSS Community.

Make Data with Cases

In response to recent discussions on the SPSSX-L listserv, the Make New Dataset with Cases dialog has been enhanced. This dialog box, which appears on the File>New>Data with Cases menu after installation, generates a dataset of random variables. You can now generate data from any of 23 probability distributions (including one not available from the Statistics random number generators). The variables can then optionally be orthogonalized so that all of their correlations are exactly zero. Finally, correlations among the variables can be induced. The available correlation patterns are equal, Toeplitz, factor analytic, arbitrary user-specified, and random. For the random pattern, you specify the minimum and maximum correlations, and the correlation for each variable pair is drawn from a uniform distribution over that range. Of course, inducing correlations will in most cases change the distribution families of the variables.

Can you guess what the distributions above are?

This dialog, which generates a small Python program, should make it easier to run simulations and explore patterns in data. Besides programmability, it uses an SPSS Statistics Input Program and the MATRIX procedure. The Input Program can be displayed, which may help anyone learning this rather neglected corner of SPSS Statistics technology.
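The dialog does its work with a generated Python program plus an Input Program and MATRIX, none of which is reproduced here. As a rough illustration of the orthogonalize-then-correlate idea, here is a pure-Python sketch. The function names (`pattern_matrix`, `orthogonalize`, `induce`) are hypothetical, the Toeplitz case uses r^|i-j| as one assumed example of a Toeplitz pattern, and mixing through a Cholesky factor is a standard technique assumed for illustration, not the dialog's documented method.

```python
import math
import random

def pattern_matrix(k, kind, r=0.5, lo=-0.3, hi=0.3, seed=None):
    """Build a k x k target correlation matrix (illustrative patterns only)."""
    rng = random.Random(seed)
    m = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            if kind == "equal":
                v = r                      # same correlation for every pair
            elif kind == "toeplitz":
                v = r ** (j - i)           # depends only on |i - j| (assumed AR(1)-style)
            elif kind == "random":
                v = rng.uniform(lo, hi)    # uniform draw in [lo, hi] for each pair
            else:
                raise ValueError(f"unknown pattern: {kind}")
            m[i][j] = m[j][i] = v
    return m

def cholesky(a):
    """Return lower-triangular L with L * L^T == a; a must be positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("target matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    u, v = center(u), center(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def orthogonalize(cols):
    """Gram-Schmidt on centered columns, each rescaled to unit sample variance,
    so every pairwise sample correlation is zero (up to rounding)."""
    n = len(cols[0])
    out = []
    for col in cols:
        v = center(col)
        for q in out:
            proj = dot(v, q) / dot(q, q)
            v = [a - proj * b for a, b in zip(v, q)]
        sd = math.sqrt(dot(v, v) / (n - 1))
        out.append([a / sd for a in v])
    return out

def induce(cols, target):
    """Mix orthogonalized columns with the Cholesky factor of the target, so the
    sample correlation matrix of the result equals the target."""
    L = cholesky(target)
    n = len(cols[0])
    return [[sum(L[i][p] * cols[p][t] for p in range(i + 1)) for t in range(n)]
            for i in range(len(cols))]

# Usage: three variables from different distributions, Toeplitz target.
rng = random.Random(1)
n = 200
raw = [[rng.expovariate(1.0) for _ in range(n)],
       [rng.uniform(0, 1) for _ in range(n)],
       [rng.gauss(0, 1) for _ in range(n)]]
target = pattern_matrix(3, "toeplitz", r=0.6)
data = induce(orthogonalize(raw), target)
print(round(corr(data[0], data[2]), 6))  # about 0.36, i.e. 0.6 ** 2
```

Every induced variable after the first is a weighted sum of several orthogonalized variables, so its shape generally no longer matches its original distribution family, which is the effect the paragraph warns about. Note too that a random pattern drawn pair by pair is not guaranteed to be a valid (positive definite) correlation matrix; in this sketch `cholesky` raises an error in that case.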

SPSSINC SPLIT DATASET

The SPSSINC SPLIT DATASET extension command has been enhanced to allow more flexibility in the location and names of the output files. This command generates a set of .sav files by splitting the active dataset according to the values of one or more variables, generalizing the built-in Split Files mechanism. Besides its general data management uses, it can be combined with the SPSSINC PROCESS FILES extension command to run whole jobs against each partition of a dataset. Standard Split Files runs a single procedure against each split, but sometimes you need to run a set of commands or reports that don't fit that pattern.
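The heart of splitting by several variables is grouping cases by the combination of their split-variable values, with one partition for each combination that actually occurs. The sketch below shows only that grouping step in plain Python. `split_cases` and the sample cases are hypothetical; the real extension command works on the active dataset and writes .sav files, which this does not attempt.

```python
from collections import defaultdict

def split_cases(cases, split_vars):
    """Group cases (dicts) by the tuple of their split-variable values.
    Only combinations that occur in the data get a partition."""
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[v] for v in split_vars)
        groups[key].append(case)
    return dict(groups)

cases = [
    {"id": 1, "gender": "m", "jobcat": "clerical"},
    {"id": 2, "gender": "f", "jobcat": "manager"},
    {"id": 3, "gender": "m", "jobcat": "clerical"},
    {"id": 4, "gender": "f", "jobcat": "clerical"},
]
for key, part in sorted(split_cases(cases, ["gender", "jobcat"]).items()):
    print(key, [c["id"] for c in part])
# ('f', 'clerical') [4]
# ('f', 'manager') [2]
# ('m', 'clerical') [1, 3]
```

No partition is created for ("m", "manager") because no such case exists, mirroring the way output is produced only for combinations present in the data.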

The original version of SPLIT DATASET allowed only one split variable and wrote all the .sav files to a single location, using the split variable's values or value labels to generate the file names. The enhanced version allows any number of split variables and lets both the output directories and the file names be derived from the values or labels. It also creates any specified directories that don't already exist. The output specifications use a simple pattern language: each expression of the form ${variable name} in a name is replaced by that variable's value or value label.

For example, if the split variables are gender, with values "m" and "f", and jobcat, with values "clerical" and "manager", the specification

c:/output/gender-${gender}/${jobcat}

would generate output files in directories

c:/output/gender-m/clerical

c:/output/gender-m/manager

c:/output/gender-f/clerical

c:/output/gender-f/manager

assuming that all these combinations exist in the dataset. Value labels can be used instead of values.
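Here is a minimal sketch of how such a ${variable name} pattern could be expanded for each combination, assuming plain textual substitution with an optional lookup from value to value label (falling back to the raw value when no label exists). The function `expand` and the label dictionary are hypothetical; the command's own rules, for example how it treats characters that are not legal in file names, are not covered here.

```python
import re

# Matches ${name}; the name is everything up to the closing brace.
PATTERN = re.compile(r"\$\{([^}]+)\}")

def expand(template, values, labels=None):
    """Replace each ${name} with the case's value for that variable, or with
    its value label when a labels mapping is supplied and has an entry."""
    def substitute(match):
        name = match.group(1).strip()
        value = values[name]
        if labels is not None:
            value = labels.get(name, {}).get(value, value)
        return str(value)
    return PATTERN.sub(substitute, template)

template = "c:/output/gender-${gender}/${jobcat}"
for g, j in [("m", "clerical"), ("m", "manager"), ("f", "clerical"), ("f", "manager")]:
    print(expand(template, {"gender": g, "jobcat": j}))
# c:/output/gender-m/clerical
# c:/output/gender-m/manager
# c:/output/gender-f/clerical
# c:/output/gender-f/manager

print(expand(template, {"gender": "f", "jobcat": "manager"},
             labels={"gender": {"m": "male", "f": "female"}}))
# c:/output/gender-female/manager
```

Creating any missing directories for an expanded path could then be done in Python with `os.makedirs(path, exist_ok=True)`.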

Quotetext

The third utility, quotetext, is a dialog box called Quote Text File Contents. It is meant for situations where complex SQL queries are created outside the Statistics Database Wizard. If you do this, you know how painful it can be to adjust your SQL to conform to the Statistics syntax requirements. The SQL has to be quoted to protect it from the SPSS parser, which then happily unquotes the pieces and joins them before passing the query to the ODBC SQL interface. Quotetext takes plain SQL and does this quoting for you, taking into account any quotes already in the SQL text. You can then insert the result into your Statistics syntax stream. This dialog appears on the Utilities menu after installation.
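The kind of quoting involved can be sketched as follows. This assumes that each SQL line becomes one single-quoted segment, that an embedded single quote is escaped by doubling it (the usual rule for quoted strings in SPSS syntax), and that the parser joins successive segments as the paragraph describes. `quote_sql`, its `max_len` parameter, and the DSN name `mydsn` are hypothetical; this is not the quotetext implementation.

```python
def quote_sql(sql, max_len=200):
    """Turn plain SQL into a list of single-quoted segments for SPSS syntax.

    Embedded single quotes are doubled. A trailing space is kept at the end
    of each source line so words on adjacent lines do not run together when
    the segments are joined. Long lines are cut into chunks of at most
    max_len raw characters (escaping may make a chunk slightly longer).
    """
    segments = []
    for line in sql.splitlines():
        line = line.rstrip()
        if not line:
            continue
        line += " "
        # Chunk the raw text *before* escaping so a doubled quote is never split.
        for start in range(0, len(line), max_len):
            chunk = line[start:start + max_len]
            segments.append("'" + chunk.replace("'", "''") + "'")
    return segments

sql = """SELECT name, salary
FROM employees
WHERE dept = 'Sales'"""
print("GET DATA /TYPE=ODBC /CONNECT='DSN=mydsn'")
print("  /SQL=" + "\n    ".join(quote_sql(sql)) + ".")
# GET DATA /TYPE=ODBC /CONNECT='DSN=mydsn'
#   /SQL='SELECT name, salary '
#     'FROM employees '
#     'WHERE dept = ''Sales'' '.
```

Escaping after chunking is the design choice that keeps this safe: splitting the already-escaped text could separate the two halves of a doubled quote and change the meaning of the SQL.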

All three of these tools can be downloaded from the SPSS Community (www.ibm.com/developerworks/spssdevcentral) and require the Python plugins/Essentials, which are also available from this Community. They work with SPSS Version 17 or later. Plugins for Version 17 and earlier are not available from the Community, but they can still be downloaded from the old SPSS Developer Central site at www.spss.com/devcentral. No programmability knowledge is required to use these tools.

If you haven't explored the wealth of utility, statistical, and graphical tools available from the SPSS Community, I invite you to check these out. There are dozens and dozens of items, and they are all free!