SPSS Statistics


 SPSS Syntax Help

Andrew Denney posted Wed May 14, 2025 08:13 AM

I am a criminal justice researcher. A criminal justice agency has given me a dataset that is not in the best shape.

It has a variable named "CurrentCharges" that lists all of the state criminal codes for each client, like this: "3174120,3917240,3914920". Each seven-digit sequence is a separate criminal code. Some of the cases are blank, and some have 9 or more charges. Is there syntax that can help me split each of these seven-digit sequences into its own variable/column? I am not very good with syntax and have tried to figure it out on my own, but to no avail. My backup plan is to split these cases manually, but I hope to avoid that if possible, as there are 4,400 cases.

UPDATE: Thank you so much, everyone! Jon's syntax worked perfectly. I just had to change it from 7 to 8 (due to my own error). The data was originally given to me in an Excel file, and I converted it to SPSS. I really appreciate everyone's help!

Thank you!

Jon Peck IBM Champion

You could write some SPSS looping code to do this, but here is the easiest way.

First, go to Extensions > Extension Hub and install the SPSSINC TRANS extension command. If you already have it, check the Updates box to make sure you have the latest version.

Then run this code, where charges is the name of the charge variable and charges1 to charges20 are the names of the output variables. The command splits the string at the commas and writes the pieces into up to 20 string variables of width 7 (type=7). Change those settings as needed.

spssinc trans result=charges1 to charges20 type=7
    /formula "re.split(',', charges)".
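The /formula text is a Python expression, so what it does to a single value can be previewed in plain Python outside SPSS. A minimal sketch using the sample value from the question; it only shows what re.split returns and does not reproduce how SPSSINC TRANS writes the pieces into charges1 to charges20.

```python
import re

# Sample CurrentCharges value copied from the question.
value = "3174120,3917240,3914920"

# The same expression used in the /formula subcommand above.
parts = re.split(',', value)
print(parts)                              # ['3174120', '3917240', '3914920']

# Every piece is seven characters, so it fits a type=7 output variable.
print(all(len(p) == 7 for p in parts))    # True

# A blank case splits into one empty string rather than zero pieces.
print(re.split(',', ''))                  # ['']
```

With the plain ',' pattern, any space after a comma stays attached to the following piece, so a value such as " 3917240" would need width 8.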

Art Jack

Either of these will work for parsing. For the second one, you need to install Jon's extension from the Extension Hub. I'd probably pivot the tables after restructuring the data with VARSTOCASES.
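VARSTOCASES turns the wide layout (one row per client, one column per charge) into a long one (one row per client and charge), which is easier to tabulate. A plain-Python sketch of that reshaping on hypothetical rows; dropping the blank slots is a choice made for this sketch, not a statement about VARSTOCASES defaults.

```python
from collections import Counter

# Hypothetical wide rows: client id plus split-out charge columns (blank = unused slot).
wide = [
    (1, ["3174120", "3917240", "3914920", ""]),
    (2, ["3917240", "", "", ""]),
]

# Wide -> long: one (id, charge) pair per non-blank slot.
long_rows = [(cid, c) for cid, charges in wide for c in charges if c]
print(long_rows)

# With one row per charge, counting how often each code appears is a simple tally.
print(Counter(c for _, c in long_rows))
```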

1.

dataset close all.
DATA LIST LIST 
 / id (f3) variable (A35).
BEGIN DATA.
1, "122, 2, 3, 4"
2, "2, 34, 2, 4"
3, "5, 25, 1, 4"
4, "3, 2, 13, 2224"
5, "61, 254, 11, 9745"
END DATA.
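string v1 to v4 (a5).
vector temp=v1 to v4.
* Peel one comma-space delimited piece off the front of variable per pass.
* Note: this overwrites variable, so run it on a copy if you need the original values.
loop #i=1 to 4.
compute #a=char.index(variable,', ').
do if #a>0.
compute temp(#i)=char.substr(variable,1,#a-1).
compute variable=char.substr(variable,#a+2).
else.
* No delimiter left: what remains is the last piece, so stop here.
compute temp(#i)=variable.
break.
end if.
end loop.
execute.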

2. 
dataset close all.
DATA LIST LIST 
 / id (f3) variable (A35).
BEGIN DATA.
1, "122* 2, 3; 4"
2, "2, 34, 2, 4"
3, "5; 25, 1* 4"
4, "3* 2, 13; 2224"
5, "61, 254; 11, 9745"
END DATA.
spssinc trans result = v1 to v5 type=20
    /formula "re.split(r'[;,*\s]\s*', variable)".
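Both approaches can be tried in plain Python before running them on real data. A minimal sketch: split_like_loop is a hypothetical helper that follows example 1's idea (cut at the first ', ', keep the front, repeat on the rest), and the regular expression is copied from example 2's /formula. It previews the splitting logic only; it is not how SPSS or SPSSINC TRANS runs internally.

```python
import re

def split_like_loop(text, n=4):
    """Follow example 1's idea: cut at the first ', ', keep the part
    before it, and repeat on the rest, for at most n pieces."""
    pieces = []
    while len(pieces) < n:
        pos = text.find(', ')        # -1 when absent (CHAR.INDEX returns 0)
        if pos < 0:
            pieces.append(text)      # no delimiter left: remainder is the last piece
            break
        pieces.append(text[:pos])
        text = text[pos + 2:]        # skip the comma and the space
    return pieces

# Rows from example 2, which mixes ';', ',' and '*' as delimiters.
rows = ["122* 2, 3; 4", "2, 34, 2, 4", "5; 25, 1* 4",
        "3* 2, 13; 2224", "61, 254; 11, 9745"]

# Same pattern as example 2's /formula: one delimiter character
# (; , * or whitespace) followed by any extra whitespace.
pattern = r'[;,*\s]\s*'

print(split_like_loop("122, 2, 3, 4"))   # ['122', '2', '3', '4']
for row in rows:
    print(re.split(pattern, row))
```

The loop version only understands one delimiter, while the regular expression handles all of them in a single pass, which is why the second example can cope with the messier input.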

Andrew Denney

Thank you so much! I just ran Jon's syntax, and it worked great. The only issue is that it stopped at the 825th case, and there are 4552. Is there something I can change so that it processes all 4552 cases? Thank you again!

Jon Peck IBM Champion

The procedure should run through all the cases, but it respects any active SELECT or FILTER you have.

It will stop if there is a serious error, but in that case you would see an error message. One common source of errors is too many output values: if a case contains more charges than the output specification allows for, the command stops with a message along the lines of "function returned too many values."
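One way to avoid that stop is to find the largest number of pieces any case produces and declare at least that many output variables. A plain-Python sketch of the count; n_charges is a hypothetical helper, the sample list stands in for the real CurrentCharges column, and none of this is part of SPSSINC TRANS itself.

```python
import re

def n_charges(value):
    """How many pieces re.split(',', value) yields; a blank value counts as 0."""
    if value.strip() == '':
        return 0
    return len(re.split(',', value))

# Hypothetical values standing in for the CurrentCharges column.
values = ["3174120,3917240,3914920", "", "3174120"]

# Declare at least this many output variables (e.g. charges1 to chargesN).
needed = max(n_charges(v) for v in values)
print(needed)   # 3
```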

Bruce Weaver

Hello @Andrew Denney.  Is the file you were given a .SAV file, or some other format that you had to import?  If the latter, what format?  Thanks for clarifying. 

Chris Keran

Hi Jon,

When I ran this syntax...

spssinc trans result=S1 to S10 type = 40
/formula 're.split(";", Ethnicity)'.

...it correctly split out each instance. However, how do I adjust it so that each unique value gets its own column/variable? See my attachment for the current result.

Jon Peck IBM Champion

Chris, do you mean that you want a 0/1 variable for each distinct value in the set of variables generated by SPSSINC TRANS? TRANS can't easily do that, but you could do this:

1. Run the SPSSINC TRANS command.
2. Define a multiple category set from the output variables.
3. Use the STATS MCSET CONVERT command (Tables > Convert Multiple Category Set) to create a multiple dichotomy set.
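For a picture of where those steps end up, here is a plain-Python sketch of a multiple dichotomy layout: one 0/1 indicator per distinct value. The Ethnicity strings are hypothetical, the ';' separator follows Chris's formula, and this is not how STATS MCSET CONVERT is implemented.

```python
import re

# Hypothetical Ethnicity values; ';' separates multiple responses.
records = ["White;Hispanic", "Black", "Asian;White"]

# Analogue of the SPSSINC TRANS step: split each value into its pieces.
split_rows = [re.split(";", r) for r in records]

# Distinct values across all rows, ignoring empty pieces.
categories = sorted({c for row in split_rows for c in row if c})

# One 0/1 indicator per distinct value (multiple dichotomy layout).
dummies = [{c: int(c in row) for c in categories} for row in split_rows]
for d in dummies:
    print(d)
```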

Chris Keran

Thank you, Jon. That solved it for me!!!