Using local Python objects in SPSSINC TRANS – examples with network statistics

By Archive User posted Fri September 04, 2015 12:45 AM

  
SPSSINC TRANS gives you a wider array of functions to compute on cases in SPSS. Within a session, you can define your own Python functions inside a BEGIN PROGRAM and END PROGRAM block. SPSSINC TRANS passes in values from the current dataset, but your functions can also use objects that live in the local Python environment. The example use case below builds a network in the local Python environment from SPSS data and then calculates several network statistics on the nodes. Here is a simple hierarchical network dataset that represents managers and subordinates in an agency.
*Edge list. 
DATA LIST FREE / Man Sub (2F1.0).
BEGIN DATA
1 2
2 3
2 4
3 5
3 6
4 7
4 8
END DATA.
DATASET NAME Boss.

We can then turn this into a NetworkX graph with the code below. Some of my prior SPSS examples using NetworkX had more complicated code that looped over the SPSS dataset to build the network object. But the way SPSS hands the data to Python (as tuples nested within a list) is exactly what the add_edges_from function in NetworkX expects, so no looping is required, and the node list is created automatically from the edge data.
BEGIN PROGRAM Python. 
import networkx as nx
import spss, spssdata

alldata = spssdata.Spssdata().fetchall() #get SPSS data
G = nx.DiGraph() #create empty graph
G.add_edges_from(alldata) #add edges into graph
print(G.nodes()) #parenthesized form works in Python 2 and Python 3
END PROGRAM.
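To see why no loop is needed, here is a minimal plain-Python sketch of what building a directed graph from an edge list amounts to. It does not use SPSS or NetworkX; the `edges` list is typed in by hand to mirror the Boss dataset, the float values reflect the assumption that numeric SPSS variables arrive in Python as floats, and `build_adjacency` is a hypothetical helper name, not a NetworkX function.

```python
# Rows as SPSS would hand them to Python: one tuple per case.
# Assumption: numeric SPSS variables arrive as floats.
edges = [(1.0, 2.0), (2.0, 3.0), (2.0, 4.0), (3.0, 5.0),
         (3.0, 6.0), (4.0, 7.0), (4.0, 8.0)]

def build_adjacency(edge_list):
    """Return {node: [successors]} for a directed edge list (hypothetical helper)."""
    adjacency = {}
    for source, target in edge_list:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])  # leaf nodes get an entry too
    return adjacency

adjacency = build_adjacency(edges)
print(sorted(adjacency))  # every node, derived from the edges alone
print(adjacency[2.0])     # direct subordinates of manager 2
```

The node set falls out of the edge list, which is the same convenience add_edges_from provides.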

Note that the graph object G now exists in the local Python environment for this particular SPSS session. We can then write our own functions that reference G but take other inputs. Here I have examples for the geodesic distance between two nodes, closeness centrality, degree, and the average degree of a node's neighbors.
BEGIN PROGRAM Python.
#path distance
def geo_dist(source,target):
    return nx.shortest_path_length(G,source,target)
#closeness centrality
def close_cent(v):
    return nx.closeness_centrality(G,v)
#degree
def deg(v):
    return G.degree(v)
#average degree of neighbors
def avg_neideg(v):
    return nx.average_neighbor_degree(G,nodes=[v])[v]
END PROGRAM.
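For readers who want to see what two of these statistics measure, here is a rough plain-Python sketch of geodesic distance (via breadth-first search) and total degree on the same edge list. It is an illustration only, not how NetworkX implements them: `bfs_distance` and `total_degree` are hypothetical names, and unlike nx.shortest_path_length, which raises NetworkXNoPath for an unreachable target, this sketch returns None.

```python
from collections import deque

# Same manager -> subordinate edges as the Boss dataset (integers for brevity).
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8)]

def bfs_distance(edge_list, source, target):
    """Number of directed edges on a shortest path, or None if unreachable."""
    successors = {}
    for s, t in edge_list:
        successors.setdefault(s, []).append(t)
    dist = {source: 0}          # nodes reached so far and their distances
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in successors.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def total_degree(edge_list, v):
    """In-degree plus out-degree, which is what degree means on a directed graph."""
    return sum((s == v) + (t == v) for s, t in edge_list)

print(bfs_distance(edges, 1, 8))  # 3: 1 -> 2 -> 4 -> 8
print(total_degree(edges, 2))     # 3: one manager plus two subordinates
```

Because the edges point from manager to subordinate, distances only exist going down the hierarchy; `bfs_distance(edges, 8, 1)` finds no path.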

Here is the node list, in a second SPSS dataset, on which we will calculate these statistics. For large graphs this is nice because you can select a smaller subset of nodes and only run the calculations on that subset. For a crime analysis example, I may be monitoring a particular set of chronic offenders and want to calculate how close every person arrested within the month is to that set.
DATA LIST FREE / Employ (F1.0). 
BEGIN DATA
1
2
3
4
5
6
7
8
END DATA.
DATASET NAME Emp.
DATASET ACTIVATE Emp.

Now we have all the necessary ingredients to calculate our network statistics on these nodes. Here are examples of using SPSSINC TRANS to calculate the network statistics in the local SPSS dataset.
*Geodesic distance from 1.
SPSSINC TRANS RESULT=Dist TYPE=0
/FORMULA "geo_dist(source=1.0,target=Employ)".

*closeness centrality.
SPSSINC TRANS RESULT=Cent TYPE=0
/FORMULA "close_cent(v=Employ)".

*degree.
SPSSINC TRANS RESULT=Deg TYPE=0
/FORMULA "deg(v=Employ)".

*Average neighbor degree.
SPSSINC TRANS RESULT=NeighDeg TYPE=0
/FORMULA "avg_neideg(v=Employ)".
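Conceptually, each SPSSINC TRANS call evaluates its formula once per case and stores the returned value in the RESULT variable. The plain-Python sketch below imitates that for the distance calculation without SPSS or NetworkX. The names `manager_of`, `employ`, and `depth` are hypothetical stand-ins: because this network is a tree rooted at node 1, counting the steps up to the top manager gives the same number as the geodesic distance from node 1.

```python
# Subordinate -> manager lookup, typed in from the Boss edge list.
# Floats are used on the assumption that SPSS numerics arrive as floats.
manager_of = {2.0: 1.0, 3.0: 2.0, 4.0: 2.0,
              5.0: 3.0, 6.0: 3.0, 7.0: 4.0, 8.0: 4.0}

# Stand-in for the Employ variable in the Emp dataset: one value per case.
employ = [float(i) for i in range(1, 9)]

def depth(v):
    """Steps up the hierarchy from v to the top manager (hypothetical helper)."""
    steps = 0
    while v in manager_of:
        v = manager_of[v]
        steps += 1
    return float(steps)

# One function call per case, like a formula "depth(v=Employ)" would make.
dist = [depth(v=value) for value in employ]
print(dist)  # [0.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```

The Dist variable created by the first SPSSINC TRANS command should hold these same values for employees 1 through 8.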







