SPSS Statistics

SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers! 

 View Only
Expand all | Collapse all

Syntax for substring coding into new variable

  • 1.  Syntax for substring coding into new variable

    Posted Thu March 27, 2025 12:37 PM

    I am seeking guidance on writing syntax in SPSS 29 for creating a new numeric variable from a string variable. The string variable has times in the following formats:

    1h 21m

    1hr 21min

    44 hours 12minutes

    I need to create a new variable with just the total hours. I recognize this is a multi-step coding process where I may need to extract the hours into a new variable, then the minutes, before combining the two in a time scale variable. 

    I have been able to run the following syntax, but with issues. It will provide only one digit versus two if it is a two digit number (1 for 1 hour, but only 4 for 44 hours). How can I revise the syntax to include every number before the "h" and then ask it to create a new variable with every number after the "h" but before the "m"?

    *New variable for Mobile_Device.
    COMPUTE recodeMD = NUMBER(Char.substr(Mobile_Device,1,Char.Index(Mobile_Device, "h")),F).
    Execute.

    It also gives me errors, primarily the following: The string to be converted via the NUMBER function is of zero length.  The result has been set to the system-missing value. 

    Thank you!



    ------------------------------
    SM
    ------------------------------


  • 2.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 01:28 PM

    This modification will pull out the hours for you:

    COMPUTE recodeMD = NUMBER(Char.substr(Mobile_Device, 1, Char.Index(Mobile_Device, "h")-1), "F2.0").

    You have to subtract 1 from the index of h and put the format in quotes.  Use "F3.0" if you expect 100s.

    I am still thinking about how to extract the minutes.



    ------------------------------
    Wesley Covalt
    ------------------------------



  • 3.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 03:39 PM
    The starting point needs to start at H+1 for minutes or you can do something similar M-1.  





  • 4.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 08:18 PM

    Thank you!



    ------------------------------
    Sara Moore
    ------------------------------



  • 5.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 08:17 PM

    Thank you for the guidance on the syntax revisions for the hours. I was able to use it for extracting the entire hours noted in the variable. After using the extension below for the minutes, I converted the hours to minutes then added them together for a total time variable. Thank you!



    ------------------------------
    Sara Moore
    ------------------------------



  • 6.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 01:53 PM
    There is an easy solution for this that accounts for the variations in the works.
    First, use Extensions > Extension Hub to install the SPSSINC TRANS extension command.   If you already have that,
    you might see an update available if it isn't the latest version.

    Here is the syntax, assuming that the variable is named times.
    spssinc trans result=hours minutes
    /formula "re.findall(r'\d+', times)".

    That finds all the strings of digits and extracts them into numeric variables named hours and minutes.
    If there might be other numeric strings later in the input, add some other variable names to the result keyword in the command.





  • 7.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 08:15 PM

    Thank you for the guidance on using the extension. I was not aware of this extension, but will be learning more about it for future use. When I ran the syntax, it extracted the minutes. Thank you!



    ------------------------------
    Sara Moore
    ------------------------------



  • 8.  RE: Syntax for substring coding into new variable

    Posted Thu March 27, 2025 09:18 PM
    The code I posted would create two variables - one for hours and the second one for minutes.


    --





  • 9.  RE: Syntax for substring coding into new variable

    Posted Fri March 28, 2025 11:15 AM

    I went back and deleted variables that were created while determining the syntax, one was "hours". So, I ran the code again and it worked for both hours and minutes. Thank you! 



    ------------------------------
    Sara Moore
    ------------------------------



  • 10.  RE: Syntax for substring coding into new variable

    Posted Mon March 31, 2025 11:09 AM
    Edited by Bruce Weaver Mon March 31, 2025 12:12 PM

    I did not have time to tinker with this question when it first appeared, but I found a little time this morning.  I have never used the SPSSINC TRANS extension command Jon used, and I expect it may be the most elegant solution.  But in the event that (for some reason) someone may be unable to install extension commands (e.g., on lab computers at a university), I thought it would be interesting to tackle this using only native SPSS commands and functions.  Here's what I came up with (using the same kinds of tools Wesley & Art were thinking about).  I added a few more input strings to ensure that the code works properly if there is only a singe digit number for minutes.

    NOTE:  Almost immediately after posting the first version of this code, I realized I did not need two DO-REPEAT loops.  This is an edited version of the code with just one DO-REPEAT. 

    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / timestr (A20).
    BEGIN DATA
    "1h 2m"
    "1h 21m"
    "1hr 2min"
    "1hr 21min"
    "44 hours 2minutes"
    "44 hours 12minutes"
    END DATA.

    STRING Time (A20).
    COMPUTE Time = LOWER(timestr).
    * Replace various forms of "hours" with a colon,
    * and various forms of "minutes with a null string.
    DO REPEAT
      h = "hours" "hr" "h" / 
      m = "minutes" "min" "m".
     COMPUTE Time = REPLACE(Time,h,":").
     COMPUTE Time = REPLACE(Time,m,"").
    END REPEAT.
    * Change Time to a TIME5 variable.
    ALTER TYPE Time (TIME5).
    VARIABLE LABELS Time 'Time (HH:MM)'.
    VARIABLE WIDTH Time(8).
    LIST.

    Output from the LIST command:

     
    timestr               Time 
     
    1h 2m                 1:02 
    1h 21m                1:21 
    1hr 2min              1:02 
    1hr 21min             1:21 
    44 hours 2minutes    44:02 
    44 hours 12minutes   44:12 
     
    Number of cases read:  6    Number of cases listed:  6

    I hope this helps.

    Cheers,
    Bruce



    ------------------------------
    Bruce Weaver
    ------------------------------



  • 11.  RE: Syntax for substring coding into new variable

    Posted Mon March 31, 2025 02:02 PM

    Nice!  I was mulling over substitution of some sort but never would have thought of the do loop.



    ------------------------------
    Wesley Covalt
    ------------------------------



  • 12.  RE: Syntax for substring coding into new variable

    Posted Mon March 31, 2025 04:57 PM

    Bruce's solution is clever, but it is vulnerable to irregularities in the dataset (which is pretty irregular in this case).  There might be other abbreviations or typos in the data, and it takes some study to see that it is correct.  The pattern matching solution is more robust and easier to validate (if you understand regular expressions) compared to the procedural solution.

    I wish SPSS had a regular expression engine in the transformation system, but SPSSINC TRANS does fill that gap.



    ------------------------------
    Jon Peck
    Data Scientist
    JKP Associates
    Santa Fe
    ------------------------------



  • 13.  RE: Syntax for substring coding into new variable

    Posted Mon March 31, 2025 05:55 PM

    I had a feeling you were going to point out those vulnerabilities, Jon.  I think the following approach (which also uses native SPSS commands & functions) is more robust.  I believe it can tolerate typos, so long as the the first letters in hours and minutes (h and m) are correct.  And of course, the numeric values have to be correct.

    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / timestr (A20).
    BEGIN DATA
    "1h 2m"
    "1h 21m"
    "1hr 2min"
    "1hr 21min"
    "1hr 21mni"
    "44 hours 2minutes"
    "44 hours 12minutes"
    "44 huors 15minates"
    END DATA.

    * Notice that the input strings have some typos 
    * (e.g., mni, minates, huors, etc.). 
    * I think the following code will work so long 
    * as the first letters of hours and minutes are correct.

    STRING #Time (A30).
    COMPUTE #Time = LOWER(timestr).
    * Insert a blank before "h".
    COMPUTE #Time = REPLACE(#Time,"h"," h"). 
    * Insert a blank before "m".
    COMPUTE #TIME = REPLACE(#Time,"m"," m").
    * Get the location of the first blank.
    COMPUTE #blank1 = CHAR.INDEX(#Time," ").
    * Get the location of the last blank.
    COMPUTE #blank2 = CHAR.RINDEX(#Time," ").
    * Extract the hours & minutes.
    COMPUTE #hh = NUMBER(CHAR.SUBSTR(#Time,1,#blank1),F2).
    COMPUTE #mm = NUMBER(CHAR.SUBSTR(#Time,#blank2-2,2),F2).
    * Combine the hours & minutes into a TIME variable.
    COMPUTE Time = TIME.HMS(#hh,#mm,0).
    FORMATS Time (TIME5).
    LIST.

    Output from LIST:

    timestr               Time 
     
    1h 2m                 1:02 
    1h 21m                1:21 
    1hr 2min              1:02 
    1hr 21min             1:21 
    1hr 21mni             1:21 
    44 hours 2minutes    44:02 
    44 hours 12minutes   44:12 
    44 huors 15minates   44:15 
     
    Number of cases read:  8    Number of cases listed:  8



    ------------------------------
    Bruce Weaver
    ------------------------------