View Only

Sometimes there are calculations provided for in SPSS tables that are necessary to use for other calculations. A frequent one is to grab certain percentiles from a

First we will make a set of random data to work with.

The frequency table we are working with then looks like:

Now to get to the items in this frequency table we just to do a bit of going down a rabbit hole of different python objects.

While the python code in terms of complexity is just about the same as using OMS to grab the frequency table and merge the quantiles back into the original data, this will be much more efficient. I can imagine using this for other projects too, like grabbing coefficients from a regression model and estimating certain marginal effects.

#data-manipulation

#Programmability

#python

#SPSS

#SPSSStatistics

`FREQUENCY`

table (Equal Probability Histograms in SPSS is one example). The typical way to do this is to grab the table using OMS, but where that is overkill is if you need to merge this info. back into the original data for further calculations. I will show a brief example of grabbing the 25th, 50th, and 75th percentiles from a frequency table and using Python to calculate a robust standardized variable using these summary statistics.First we will make a set of random data to work with.

`SET SEED 10.`

MATRIX.

SAVE {UNIFORM(100,1)} /OUTFILE = *.

END MATRIX.

DATASET NAME U.

FREQ COL1 /FORMAT = NOTABLE /PERCENTILES = 25 50 75.

The frequency table we are working with then looks like:

Now to get to the items in this frequency table we just to do a bit of going down a rabbit hole of different python objects.

- The first block grabs the items in the output, which include tables and text.
- The second block then grabs the last table for this specific output. Note that minus 2 from the size of the list is necessary because Python uses zero based indices
*and*there is a log item after the table. So if the size of the list is 10, that means`list[9]`

is the last item in the list. (Using negative indices does not work for extracting from the OutputItemList object.) - The third part then grabs the quantiles from the indices of the table. It ends up being in the first data column (so zero) and in the 3rd, 4th and 5th rows (again, Python uses zero based indices). Using
`GetUnformattedValueAt`

grabs the floating point number. - The final part then uses these quantiles to calculate a robust normalized variable by using
`spss.Submit`

and string substitution. (And then closes the SPSS client at the end.)

`BEGIN PROGRAM Python.`

import SpssClient, spss

#start the client, grab the items in the output

SpssClient.StartClient()

OutputDoc = SpssClient.GetDesignatedOutputDoc()

OutputItemList = OutputDoc.GetOutputItems()

#Grab the last table, 0 based index

lastTab = OutputItemList.Size() - 2

OutputItem = OutputItemList.GetItemAt(lastTab)

PivotTable = OutputItem.GetSpecificType()

SpssDataCells = PivotTable.DataCellArray()

#Grab the specific quantiles

Q25 = float(SpssDataCells.GetUnformattedValueAt(2,0))

Q50 = float(SpssDataCells.GetUnformattedValueAt(3,0))

Q75 = float(SpssDataCells.GetUnformattedValueAt(4,0))

print [Q25,Q50,Q75]

#Use these stats in SPSS commands

spss.Submit("COMPUTE QuantNormX = ( COL1 - %(Q50)f )/( %(Q75)f - %(Q25)f )." % locals())

SpssClient.StopClient()

END PROGRAM.

While the python code in terms of complexity is just about the same as using OMS to grab the frequency table and merge the quantiles back into the original data, this will be much more efficient. I can imagine using this for other projects too, like grabbing coefficients from a regression model and estimating certain marginal effects.

#data-manipulation

#Programmability

#python

#SPSS

#SPSSStatistics

4 comments

4 views

Archive User

Mon July 23, 2018 02:50 PM

I do not remember if there is an easy way to see what table indices are available. If it is out of range it may be the table is not ordered like you would expect it. Also remember python starts at 0 based indices, so in the above example the Valid row would be 0, missing would be 1, 25th percentile is then 2, etc.

-Jon Peck