SPSS Statistics


String substitution in Python

By Archive User posted Fri February 20, 2015 12:09 PM

  
I recently had a set of SPSS syntax that iterated over multiple variables and generated many similar graphs. They were time series graphs, so the time axis stayed the same while the variable on the Y axis changed, and the code also changed the graph titles. Initially I was using % string substitution with a (long) tuple of replacements. Here is a brief, analogous example.
*Creating some fake data.
MATRIX.
SAVE {UNIFORM(100,6)} /OUTFILE = * /VARIABLES = V1 TO V3 X1 TO X3.
END MATRIX.
DATASET NAME x.

*Crazy long string substitution.
BEGIN PROGRAM Python.
import spss

#my variable and label lists
var = ["V1","V2","V3"]
lab = ["Var 1","Var 2","Var 3"]

for v,l in zip(var,lab):
    #every %s needs its own entry, in order, in the tuple at the end
    spss.Submit("""
*Descriptive statistics.
FREQ %s.
CORRELATIONS /VARIABLES=X1 X2 X3 %s.
*Graph 1.
GRAPH /SCATTERPLOT(BIVAR)=X1 WITH %s /TITLE = "%s".
*Graph 2.
GRAPH /SCATTERPLOT(BIVAR)=X2 WITH %s /TITLE = "%s".
*Graph 3.
GRAPH /SCATTERPLOT(BIVAR)=X3 WITH %s /TITLE = "%s".
""" % (v,v,v,l,v,l,v,l))
END PROGRAM.

When you only have to substitute one or two things, "str %s and %s" % (one,two) is no big deal, but here it is quite annoying to keep track of the position of every separate value as the list grows. Also, we are really just reusing the same few objects several times. I thought, this is Python, so there must be an easier way, and sure enough there is! A simple alternative is the format method of string objects. format takes any number of arguments, so the prior example becomes "str {0} and {1}".format(one,two). Instead of %s, you place curly braces containing the index position of the argument (Python indices are zero based, so the first argument is always 0), and the same index can be used as many times as you like.
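To see the two styles side by side outside of SPSS, here is a minimal sketch that only builds the command text instead of calling spss.Submit, so it runs in plain Python. The helper names pct_version and fmt_version are hypothetical, and the syntax is trimmed to two commands.

```python
# Build the same block of SPSS syntax two ways, without submitting it.
# (Inside SPSS the resulting string would be passed to spss.Submit.)

def pct_version(v, l):
    # Positional %s: every occurrence needs its own tuple entry, in order.
    return ("FREQ %s.\n"
            "GRAPH /SCATTERPLOT(BIVAR)=X1 WITH %s /TITLE = \"%s\".\n"
            % (v, v, l))

def fmt_version(v, l):
    # str.format: {0} and {1} can be reused as often as needed.
    return ("FREQ {0}.\n"
            "GRAPH /SCATTERPLOT(BIVAR)=X1 WITH {0} /TITLE = \"{1}\".\n"
            .format(v, l))

print("str {0} and {1}".format("one", "two"))  # str one and two
print(fmt_version("V1", "Var 1"))
```

Both helpers produce identical text; the difference is that the format version passes v only once no matter how many commands use it.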

Here is the SPSS syntax updated to use format for string substitution.
*This is much simpler using ".format" for substitution.
BEGIN PROGRAM Python.
import spss

var = ["V1","V2","V3"]
lab = ["Var 1","Var 2","Var 3"]

for v,l in zip(var,lab):
    #{0} is always v and {1} is always l, however often they appear
    spss.Submit("""
*Descriptive statistics.
FREQ {0}.
CORRELATIONS /VARIABLES=X1 X2 X3 {0}.
*Graph 1.
GRAPH /SCATTERPLOT(BIVAR)=X1 WITH {0} /TITLE = "{1}".
*Graph 2.
GRAPH /SCATTERPLOT(BIVAR)=X2 WITH {0} /TITLE = "{1}".
*Graph 3.
GRAPH /SCATTERPLOT(BIVAR)=X3 WITH {0} /TITLE = "{1}".
""".format(v,l))
END PROGRAM.

Much simpler. You can use a dictionary with % substitution to the same effect, but here the format method is a quite simple solution. Another option I might explore more in the future is string templates (string.Template in the Python standard library), which seem a good candidate for long blocks of SPSS code.
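For comparison, here is a rough sketch of the two alternatives just mentioned, a dictionary with % substitution and string.Template, applied to a trimmed version of the same syntax. The variable/label pair is hard-coded for illustration and nothing is submitted to SPSS.

```python
from string import Template

# Hypothetical variable/label pair, as in the examples above.
v, l = "V1", "Var 1"

# 1) % substitution with a dictionary: named keys can be repeated freely.
by_dict = ("FREQ %(var)s.\n"
           "GRAPH /SCATTERPLOT(BIVAR)=X1 WITH %(var)s /TITLE = \"%(lab)s\".\n"
           % {"var": v, "lab": l})

# 2) string.Template uses $name placeholders filled by substitute().
tmpl = Template("FREQ $var.\n"
                "GRAPH /SCATTERPLOT(BIVAR)=X1 WITH $var /TITLE = \"$lab\".\n")
by_template = tmpl.substitute(var=v, lab=l)

print(by_dict == by_template)  # True
```

One practical point for long SPSS jobs: with str.format any literal braces (as in MATRIX syntax) must be doubled to {{ }}, and with % substitution a literal % must be written %%, while a Template only treats $ specially. Since SPSS system variables such as $CASENUM also begin with $, they would need to be written $$CASENUM in a template.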






#data-manipulation
#Programmability
#python
#SPSS
#SPSSStatistics

Comments

Sun February 22, 2015 09:17 AM

This message was posted by a user wishing to remain anonymous
[…] my prior post, Jon and Jignesh both commented that using locals() in string substitution provides for […]

Fri February 20, 2015 01:24 PM

That is fair (Jignesh apparently shares the same opinion). When staring at a hundred lines of GGRAPH code I like "{?}", as I think it stands out a bit more than "%()s". Spotting the substitutions at all is a bigger problem for me than seeing exactly what is substituted, but it is not a big difference.

Fri February 20, 2015 01:11 PM

I tend to use the locals() variation of this, as it makes it clear what is being substituted into the body of the string. The "s" to the right of the closing parenthesis (in the code below) indicates that the variable should be formatted as a string. An alternative is "03d", which produces a 3-digit integer left-padded with zeros, and Python's other formatting codes are available too.

*Using locals() method.
BEGIN PROGRAM Python.
import spss

var = ["V1","V2","V3"]
lab = ["Var 1","Var 2","Var 3"]

for v,l in zip(var,lab):
    #%(v)s and %(l)s are looked up by name in the dictionary returned by locals()
    spss.Submit("""
*Descriptive statistics.
FREQ %(v)s.
CORRELATIONS /VARIABLES=X1 X2 X3 %(v)s.
*Graph 1.
GRAPH /SCATTERPLOT(BIVAR)=X1 WITH %(v)s /TITLE = "%(l)s".
*Graph 2.
GRAPH /SCATTERPLOT(BIVAR)=X2 WITH %(v)s /TITLE = "%(l)s".
*Graph 3.
GRAPH /SCATTERPLOT(BIVAR)=X3 WITH %(v)s /TITLE = "%(l)s".
""" % locals())
END PROGRAM.
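As a small illustration of the "s" and "03d" codes described in this comment, here is a sketch that numbers each graph title. The helper numbered_title and its graph-number argument are hypothetical additions for the example, and it returns the command text rather than submitting it.

```python
def numbered_title(v, lab, n):
    # locals() is {"v": ..., "lab": ..., "n": ...} at this point.
    # "s" formats as a string; "03d" as an integer zero-padded to 3 digits.
    return ('GRAPH /SCATTERPLOT(BIVAR)=X1 WITH %(v)s '
            '/TITLE = "%(lab)s, graph %(n)03d".' % locals())

print(numbered_title("V1", "Var 1", 7))
# GRAPH /SCATTERPLOT(BIVAR)=X1 WITH V1 /TITLE = "Var 1, graph 007".

# The same specifier works after a colon in str.format fields.
print("{0:03d}".format(7))  # 007
```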

Fri February 20, 2015 12:42 PM

I much prefer to use the named substitution method, e.g.
"""text %(abc)s and %(def)s""" % locals()
While the format mechanism can be better than simple %s, using named substitutions generally gives more readable code IMO.

-Jon Peck (WP seems to have picked my WP account here, which wasn't what I intended.)
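The two preferences in this thread can also be combined: str.format accepts named fields in braces, and unpacking locals() fills them, so the placeholders stay readable like %(v)s while keeping the {…} look. A minimal sketch, with the commands collected in a list instead of being passed to spss.Submit so it runs outside SPSS:

```python
var = ["V1", "V2", "V3"]
lab = ["Var 1", "Var 2", "Var 3"]

commands = []
for v, l in zip(var, lab):
    # Named {v} and {l} fields are filled from the current local names.
    # In SPSS this string would be passed to spss.Submit instead.
    commands.append("""
FREQ {v}.
GRAPH /SCATTERPLOT(BIVAR)=X1 WITH {v} /TITLE = "{l}".
""".format(**locals()))

print(commands[1])
```

Passing the names explicitly, as in .format(v=v, l=l), gives the same result and makes the inputs visible at the call site, at the cost of a little more typing.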