Sometimes there are calculations provided for in SPSS tables that are necessary to use for other calculations. A frequent one is to grab certain percentiles from a
FREQUENCY
table (
Equal Probability Histograms in SPSS is one example). The typical way to do this is to grab the table using OMS, but where that is overkill is if you need to merge this info. back into the original data for further calculations. I will show a brief example of grabbing the 25th, 50th, and 75th percentiles from a frequency table and using Python to calculate a robust standardized variable using these summary statistics.
First we will make a set of random data to work with.
SET SEED 10.
MATRIX.
SAVE {UNIFORM(100,1)} /OUTFILE = *.
END MATRIX.
DATASET NAME U.
FREQ COL1 /FORMAT = NOTABLE /PERCENTILES = 25 50 75.
The frequency table we are working with then looks like:
Now to get to the items in this frequency table we just to do a bit of going down a rabbit hole of different python objects.
- The first block grabs the items in the output, which include tables and text.
- The second block then grabs the last table for this specific output. Note that minus 2 from the size of the list is necessary because Python uses zero based indices and there is a log item after the table. So if the size of the list is 10, that means
list[9]
is the last item in the list. (Using negative indices does not work for extracting from the OutputItemList object.)
- The third part then grabs the quantiles from the indices of the table. It ends up being in the first data column (so zero) and in the 3rd, 4th and 5th rows (again, Python uses zero based indices). Using
GetUnformattedValueAt
grabs the floating point number.
- The final part then uses these quantiles to calculate a robust normalized variable by using
spss.Submit
and string substitution. (And then closes the SPSS client at the end.)
BEGIN PROGRAM Python.
import SpssClient, spss
#start the client, grab the items in the output
SpssClient.StartClient()
OutputDoc = SpssClient.GetDesignatedOutputDoc()
OutputItemList = OutputDoc.GetOutputItems()
#Grab the last table, 0 based index
lastTab = OutputItemList.Size() - 2
OutputItem = OutputItemList.GetItemAt(lastTab)
PivotTable = OutputItem.GetSpecificType()
SpssDataCells = PivotTable.DataCellArray()
#Grab the specific quantiles
Q25 = float(SpssDataCells.GetUnformattedValueAt(2,0))
Q50 = float(SpssDataCells.GetUnformattedValueAt(3,0))
Q75 = float(SpssDataCells.GetUnformattedValueAt(4,0))
print [Q25,Q50,Q75]
#Use these stats in SPSS commands
spss.Submit("COMPUTE QuantNormX = ( COL1 - %(Q50)f )/( %(Q75)f - %(Q25)f )." % locals())
SpssClient.StopClient()
END PROGRAM.
While the python code in terms of complexity is just about the same as using OMS to grab the frequency table and merge the quantiles back into the original data, this will be much more efficient. I can imagine using this for other projects too, like grabbing coefficients from a regression model and estimating certain marginal effects.
#data-manipulation#Programmability#python#SPSS#SPSSStatistics