I had a feeling you were going to point out those vulnerabilities, Jon. I think the following approach (which also uses native SPSS commands & functions) is more robust. I believe it can tolerate typos, so long as the the first letters in hours and minutes (h and m) are correct. And of course, the numeric values have to be correct.
Original Message:
Sent: Mon March 31, 2025 04:57 PM
From: Jon Peck
Subject: Syntax for substring coding into new variable
Bruce's solution is clever, but it is vulnerable to irregularities in the dataset (which is pretty irregular in this case). There might be other abbreviations or typos in the data, and it takes some study to see that it is correct. The pattern matching solution is more robust and easier to validate (if you understand regular expressions) compared to the procedural solution.
I wish SPSS had a regular expression engine in the transformation system, but SPSSINC TRANS does fill that gap.
------------------------------
Jon Peck
Data Scientist
JKP Associates
Santa Fe
Original Message:
Sent: Mon March 31, 2025 02:01 PM
From: Wesley Covalt
Subject: Syntax for substring coding into new variable
Nice! I was mulling over substitution of some sort but never would have thought of the do loop.
------------------------------
Wesley Covalt
Original Message:
Sent: Mon March 31, 2025 11:09 AM
From: Bruce Weaver
Subject: Syntax for substring coding into new variable
I did not have time to tinker with this question when it first appeared, but I found a little time this morning. I have never used the SPSSINC TRANS extension command Jon used, and I expect it may be the most elegant solution. But in the event that (for some reason) someone may be unable to install extension commands (e.g., on lab computers at a university), I thought it would be interesting to tackle this using only native SPSS commands and functions. Here's what I came up with (using the same kinds of tools Wesley & Art were thinking about). I added a few more input strings to ensure that the code works properly if there is only a singe digit number for minutes.
NOTE: Almost immediately after posting the first version of this code, I realized I did not need two DO-REPEAT loops. This is an edited version of the code with just one DO-REPEAT.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / timestr (A20).
BEGIN DATA
"1h 2m"
"1h 21m"
"1hr 2min"
"1hr 21min"
"44 hours 2minutes"
"44 hours 12minutes"
END DATA.
STRING Time (A20).
COMPUTE Time = LOWER(timestr).
* Replace various forms of "hours" with a colon,
* and various forms of "minutes with a null string.
DO REPEAT
h = "hours" "hr" "h" /
m = "minutes" "min" "m".
COMPUTE Time = REPLACE(Time,h,":").
COMPUTE Time = REPLACE(Time,m,"").
END REPEAT.
* Change Time to a TIME5 variable.
ALTER TYPE Time (TIME5).
VARIABLE LABELS Time 'Time (HH:MM)'.
VARIABLE WIDTH Time(8).
LIST.
Output from the LIST command:
timestr Time
1h 2m 1:02
1h 21m 1:21
1hr 2min 1:02
1hr 21min 1:21
44 hours 2minutes 44:02
44 hours 12minutes 44:12
Number of cases read: 6 Number of cases listed: 6
I hope this helps.
Cheers,
Bruce
------------------------------
Bruce Weaver
Original Message:
Sent: Fri March 28, 2025 11:14 AM
From: Sara Moore
Subject: Syntax for substring coding into new variable
I went back and deleted variables that were created while determining the syntax, one was "hours". So, I ran the code again and it worked for both hours and minutes. Thank you!
------------------------------
Sara Moore
Original Message:
Sent: Thu March 27, 2025 09:17 PM
From: Jon Peck
Subject: Syntax for substring coding into new variable
The code I posted would create two variables - one for hours and the second one for minutes.
--
Original Message:
Sent: 3/27/2025 8:15:00 PM
From: Sara Moore
Subject: RE: Syntax for substring coding into new variable
Thank you for the guidance on using the extension. I was not aware of this extension, but will be learning more about it for future use. When I ran the syntax, it extracted the minutes. Thank you!
------------------------------
Sara Moore
Original Message:
Sent: Thu March 27, 2025 01:52 PM
From: Jon Peck
Subject: Syntax for substring coding into new variable
There is an easy solution for this that accounts for the variations in the works.
First, use Extensions > Extension Hub to install the SPSSINC TRANS extension command. If you already have that,
you might see an update available if it isn't the latest version.
Here is the syntax, assuming that the variable is named times.
spssinc trans result=hours minutes
/formula "re.findall(r'\d+', times)".
That finds all the strings of digits and extracts them into numeric variables named hours and minutes.
If there might be other numeric strings later in the input, add some other variable names to the result keyword in the command.
Original Message:
Sent: 3/26/2025 1:21:00 PM
From: Sara Moore
Subject: Syntax for substring coding into new variable
I am seeking guidance on writing syntax in SPSS 29 for creating a new numeric variable from a string variable. The string variable has times in the following formats:
1h 21m
1hr 21min
44 hours 12minutes
I need to create a new variable with just the total hours. I recognize this is a multi-step coding process where I may need to extract the hours into a new variable, then the minutes, before combining the two in a time scale variable.
I have been able to run the following syntax, but with issues. It will provide only one digit versus two if it is a two digit number (1 for 1 hour, but only 4 for 44 hours). How can I revise the syntax to include every number before the "h" and then ask it to create a new variable with every number after the "h" but before the "m"?
*New variable for Mobile_Device.
COMPUTE recodeMD = NUMBER(Char.substr(Mobile_Device,1,Char.Index(Mobile_Device, "h")),F).
Execute.
It also gives me errors, primarily the following: The string to be converted via the NUMBER function is of zero length. The result has been set to the system-missing value.
Thank you!
------------------------------
SM
------------------------------