SPSS Statistics

Automating tasks in SPSS using production jobs

By Archive User posted Wed December 03, 2014 10:34 AM

  


For tasks that take a long time, or ones I want done on a regular basis, I typically have the computer do the work while I am away. Example tasks I have set up before include:

  • querying large databases and dumping the results into flat files (querying large tables may take an hour or longer)

  • conducting statistical analyses that take a long time to converge

  • generating automated graphs and statistics


The SPSS facility to accomplish these is production jobs. I will briefly detail how to set up a production job and then run it from the command line, using a simple example. First I will walk through creating a production job for a set of syntax.

First, to set up a production job, go to Utilities -> Production Facility in the menu bar. (Note you can open the screenshots larger in a separate window.)

Next you will be presented with the screen below. To specify the syntax file(s) that the production job will run, click the New button.

After you click the New button, the green plus sign in the Syntax files section will be active, and you can browse to your sps file. Here you can see I selected a file named GenerateChart.sps in a particular directory on my C drive. You can specify the job to run multiple syntax files, but here I only chose one.

Next navigate to the Output section of the window, where you choose where to save the SPSS output. Here I save it in the same directory under the name Output, and I choose plain text as the output format. This ends up being the same output as if you ran the syntax interactively and then used OUTPUT EXPORT.

I could have exported the charts directly in the syntax using OUTPUT EXPORT, but you can have the production facility do that as well. If you click the Options button in the Output section, a new dialog appears that lets you choose to save the charts if you want. Here I save them as png files.

Production jobs also have the capability to accept user input variables directly in the syntax, using the form @VariableName. This is what the Run time variables section deals with. These are nice if you have a set of syntax and want to supply some arbitrary information: when you run the production job, a GUI pops up asking for the input. I don't illustrate that functionality here.
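As a rough illustration of what the @ substitution amounts to (this is not the production facility's actual implementation, and the helper and variable names here are made up), each @Name token in the syntax text is replaced with the value you supply at run time:

```python
# Hypothetical sketch of @-style run time variable expansion.
# The production facility does this internally via the macro facility;
# the function and names below are illustrative only.

def expand_runtime_vars(syntax, values):
    """Replace each @Name token in the syntax with its supplied value.

    Longer names are substituted first so a token like @YearMonth is
    not clobbered by a shorter @Year substitution.
    """
    for name in sorted(values, key=len, reverse=True):
        syntax = syntax.replace("@" + name, values[name])
    return syntax

template = 'FILE HANDLE data /NAME = "@DataDir".'
print(expand_runtime_vars(template, {"DataDir": "C:/Temp"}))
# FILE HANDLE data /NAME = "C:/Temp".
```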

Below is the specific set of syntax that grabs a csv file, swdata.csv, calculates a moving median for each chat room, and makes some time series charts. The csv file is a set of scraped chat data from the Cross Validated and R chat rooms via Scraperwiki (more details here). It aggregates the number of monologue tags (a pretty good indicator of the number of posts in the room) per day, so it is an estimate of the chat activity.
*Where the data is located.
FILE HANDLE data
/NAME = "C:\Users\andrew.wheeler\Dropbox\Documents\BLOG\ProductionJob".
DATA LIST LIST (",") FILE = "data\swdata.csv" SKIP = 1
/Date (SDATE10) Mono (F4.0) Baseroom (A100).
DATASET NAME Chats.

*Calculate moving median.
AUTORECODE VARIABLES = Baseroom /INTO BaseN.
SORT CASES BY BaseN Date.
SPLIT FILE BY BaseN.
CREATE MovMed = RMED(Mono 5).
FORMATS MovMed (F4.0) Date (MOYR6).

*Make charts.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Date Mono MovMed
MISSING=VARIABLEWISE
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Date=col(source(s), name("Date"))
DATA: Mono=col(source(s), name("Mono"))
DATA: MovMed=col(source(s), name("MovMed"))
GUIDE: axis(dim(1))
GUIDE: axis(dim(2), label("Mono Tags"))
ELEMENT: line(position(Date*Mono), color(color.grey),
transparency(transparency."0.5"), size(size."1"))
ELEMENT: line(position(Date*MovMed), color(color.red),
transparency(transparency."0.4"), size(size."1"))
END GPL.
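For readers who want to check the logic outside SPSS, here is a rough Python sketch of the centered moving median that CREATE MovMed = RMED(Mono 5) computes. SPSS may treat the endpoints of the series differently, so this is only an approximation:

```python
from statistics import median

def running_median(values, span=5):
    """Centered running median of odd span, akin to SPSS RMED(x 5).

    Positions without a full window are left as None; SPSS's RMED
    has its own rules at the ends of the series, so treat this as
    a sketch of the core calculation only.
    """
    half = span // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = median(values[i - half:i + half + 1])
    return out

print(running_median([3, 9, 1, 7, 5, 2, 8]))
# [None, None, 5, 5, 5, None, None]
```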

Now I am interested in running this set of syntax automatically. The details will change depending on your operating system, but on my Windows machine the easiest way to do this is to create a bat file that specifies the commands. I named the production job ChatRoom_Dialogue.spj, and to run this job I filled in the bat file with the text:
REM delete old csv file and download new one
del swdata.csv
wget "http://goo.gl/mxyRI7" --no-check-certificate
REM this runs the SPSS syntax
"C:Program FilesIBMSPSSStatistics22stats.exe" "C:Usersandrew.wheelerDropboxDocumentsBLOGProductionJobChatRoom_Dialogue.spj" -production silent

Here I downloaded the wget utility to grab the csv file. (Note REM comments out a line.) The bat file treats wherever it is located as the working directory for the commands, so I first use del to delete the older csv file, and then grab the new csv file from the listed url; it is automatically saved in the folder where the bat file is located. Then I run the SPSS syntax by starting stats.exe and passing it the spj production job file. I use the switches -production silent so I am not prompted for any user input values to insert into the syntax.

If stats.exe were on the Windows system path you wouldn't need to worry about the fully quoted paths, but I typically don't bother (unless the installer adds it automatically). Note that running the fully quoted string for stats.exe changes the command prompt's working directory to that location, so you then need to fully quote the spj file's path as well. You could run the first two commands directly within SPSS using the HOST command, but being able to chain multiple commands together makes running them in the bat file directly a bit more flexible.
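The SPSS step of the bat file could equally be launched from a script; here is a hedged Python sketch. The install and job paths are examples only, not necessarily yours, and passing an argument list to subprocess avoids quoting paths with spaces by hand:

```python
# Sketch: assembling the same production run from Python instead of
# a bat file. Both paths below are examples and will differ per machine.
import subprocess

STATS_EXE = r"C:\Program Files\IBM\SPSS\Statistics\22\stats.exe"  # example install path
SPJ_FILE = r"C:\Jobs\ChatRoom_Dialogue.spj"                       # example job path

def build_command(stats_exe, spj_file):
    """Return the argument list for a silent production run."""
    return [stats_exe, spj_file, "-production", "silent"]

cmd = build_command(STATS_EXE, SPJ_FILE)
# subprocess.run(cmd, check=True)  # uncomment on a machine with SPSS installed
```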

So now you can simply double click the bat file and it will download the new data and create the two graphs. To automate the job you can use the Windows Task Scheduler to run the bat file at a particular time. Here are the bundled files to run on your own (you just need to change the file paths in the sps and bat files to wherever you want to run the script).
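The Task Scheduler also has a command line counterpart, schtasks, so the scheduling step can be scripted too. This sketch only builds the argument list; the task name, time, and bat file path are made up:

```python
# Sketch: building a schtasks /Create command to run a bat file daily.
# Task name, start time, and path below are illustrative only.

def build_schtasks_args(task_name, bat_path, start_time):
    """Return the argument list for schtasks /Create."""
    return ["schtasks", "/Create",
            "/SC", "DAILY",     # schedule: once per day
            "/TN", task_name,   # task name shown in Task Scheduler
            "/TR", bat_path,    # the command the task runs
            "/ST", start_time]  # HH:MM start time

args = build_schtasks_args("ChatRoomJob", r"C:\Jobs\run_job.bat", "06:00")
# import subprocess; subprocess.run(args, check=True)  # on Windows only
```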

Here are the two graphs the production job creates. The first is the Cross Validated chat room, and the second is the R stackoverflow chat room.



I planned on writing a blog post for the CV blog a while ago about these trends, so if I ever get to it these are teasers.



#SPSS
#SPSSStatistics

Comments

Wed August 30, 2017 08:07 AM

Yes, 14 parameters is a lot; that was also my thought when my colleague asked me that question. Some were long UNC paths, so that adds up quickly. "On computers running Microsoft Windows XP or later, the maximum length of the string that you can use at the command prompt is 8191 characters." https://support.microsoft.com/nl-nl/help/830473/command-prompt-cmd--exe-command-line-string-limitation. I would have to check whether this limit was exceeded. The string length might be reduced by using environment variables in the paths. Another option might be to use a config file for part of the parameters.
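As a rough sanity check against that ceiling (a sketch only; cmd.exe's actual quoting and expansion rules are more involved), you could estimate the assembled command length:

```python
# Sketch: rough estimate of the command line length cmd.exe would see,
# assuming every argument is double-quoted and space-separated.

def command_length(args):
    """Approximate the assembled command line length."""
    return len(" ".join('"%s"' % a for a in args))

CMD_MAX = 8191  # cmd.exe limit on XP and later, per Microsoft KB 830473

args = ["stats.exe", "job.spj", "-production", "silent"]
print(command_length(args) < CMD_MAX)
# True
```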
My colleague is not well-versed in Python (but perhaps he should be!). Also, when using SPSS from within Python, SPSS behaves slightly differently. For example, OUTPUT EXPORT is not implemented. And sometimes an Exception is raised that should actually be a warning (I am trying to remember the details of this one...)

Tue August 29, 2017 07:44 PM

In theory, there is no limit on the number of @ symbols. Someone was able to use 17, so what you are seeing might be a side effect of some limit on the command line length. As for getting @ symbols into a Python program, how about making the program into a function and then using SPSSINC PROGRAM to invoke it and pass the parameters? That gives you an argc/argv structure from which you can parse out the values, which will be expanded by Statistics as it parses SPSSINC PROGRAM. See the syntax help for SPSSINC PROGRAM for details and examples.
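As a sketch of just the parsing step (the exact entry point signature is in the SPSSINC PROGRAM syntax help; the key=value token format here is made up for illustration), once the expanded parameters arrive as an argv-style list they can be split into a dict:

```python
# Hypothetical sketch: parsing argv-style tokens, as one might do
# inside a function invoked via SPSSINC PROGRAM. The token format
# "name=value" is an assumption for this example.

def parse_params(args):
    """Split tokens like 'year=2017' into a {name: value} dict."""
    params = {}
    for token in args:
        key, _, value = token.partition("=")
        params[key] = value
    return params

print(parse_params(["year=2017", "room=CV"]))
# {'year': '2017', 'room': 'CV'}
```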

Tue August 29, 2017 07:24 PM

I can't answer any of your questions Albert-Jan - you know all these things better than me. You will have to bug Jon directly or someone else who works on SPSS.

But that being said, if you need more than 14 parameters, wouldn't it be better to call SPSS code from within Python, rather than the other way around?

Tue August 29, 2017 08:32 AM

We use the "-symbol" option extensively, but I am wondering: is there is a maximum number of @symbols? It seems to be 14. If so, is there a work-around?

Also, is there a recommended approach to pass an @symbol to a BEGIN-END PROGRAM block? I used spssaux.getAttributesDict() before (I got it from devWorks), but that requires an open dataset. The following also works, though it requires that symbols are quoted.

file handle year /name = @year.
begin program.
from os.path import basename
import spssaux
* Resolve the file handle defined above back to its full path.
file_handles = spssaux.FileHandles().resolve
year = basename(file_handles("year"))
print "Year in Python", year
end program.

Basically, spss.SetMacroValue exists, but the 'spss.GetMacroValue' counterpart is missing. For simple macros without any !IF, !DO and what not, it should be quite straightforward.

Best wishes,
Albert-Jan

Wed December 03, 2014 12:46 PM

Yes, good things to know, Jon. Also good to know that you can do the @ substitution from the command line, although I have not come across a situation in which I needed to do that.

Wed December 03, 2014 11:03 AM

Two comments:
1) The @ symbol substitution uses the standard Statistics macro facility, so if you want, for example, substitution in a literal such as a title, you need to write the syntax in such a way that it will be expanded.
2) If you have Statistics Server available, the production facility can run the job on that remote server so you can free up your local machine. In that scenario, you need to write the syntax so that file references are with respect to the server file system.