SPSS Statistics

 View Only
  • 1.  Extracting variables from a field created from combining variables

    Posted Fri May 13, 2022 09:20 AM
    Hello!

    I am somewhat new to SPSS, but I am trying to find a solution to deal with a data set where the folks who compiled the data combined entries that were in multiple fields and I need to pull them back apart.  So, the data relates to police stops and the original data set included 0/1 in various fields for possible actions taken.  Each of the original fields were items like "pointed firearm" "used baton" "handcuffed", etc.  So there are lots of combinations of those possible outcomes within the data.  I don't know what program or process they used, but the data is now all in a single field--although there are 2 different fields they recoded this data into: one has the numeric value of the data however they coded (e.g. pointed firearm=8, baton=11, handcuffed=3, asked consent to search=17, etc.), but how it looks varies. Here are some examples of how the data might show up:

    • 5,NA,"17,Y","18,NA","19,Y","20,NA"
    • 24,NA
    • 1,NA,"17,N"
    • 1,NA,"5,NA","6,NA","18,NA"
    As you can see, some have quotes, some don't.  They vary in length.  While the actions are all the numbers, some entries also include the outcome of the action (NA/Y/N) and others do not have a separate outcome (e.g. if one of the actions was "searched" it might have a Y or N as an outcome to indicate whether any contraband was found). The presence or absence of quotes also complicates things, AND it appears even for some numbers there are quotes for some and not others (e.g. both 5,NA and "5,NA" in the list above).  

    There is also a labeled version of the same actions, which lists the actual actions taken.  Here are some examples of how the data looks:

    • Person removed from vehicle by order,"Handcuffed or flex cuffed","Patrol car detention","Search of person was conducted"
    • Curbside detention,"Handcuffed or flex cuffed","Patrol car detention","Asked for consent to search person, Yes","Search of person was conducted","Asked for consent to search property, Yes","Search of property was conducted"
    • Patrol car detention,"Search of person was conducted","Property was seized"
    Same issues as the above.  And while the text seems to be consistent, again there is the issue of sometimes including quotes and sometimes not (e.g. Patrol car detention and "Patrol car detention").

    I have no idea how to pull apart this data to get these options back into their individual fields.  At minimum, I would at least like to be able to create a variable where cases that included actions taken #8-14 (which are all types of uses of force) would be coded 1 and 0 for all else.  I attempted a few different things, but nothing was successful (I can't even recall the various procedures I tried at the moment, but I know that one recurrent issue is that it would pull records where there was a double-digit number that included any of the digits I was looking for (i.e. I wanted to pull cases where there was an 8, but it would also pull cases where there was an 18).

    If folks can think of any solution to this, it would be MUCH appreciated.  This data is a mess, and I am trying to do the best I can with it as I am simultaneously trying to learn how to (re-)learn how to use this program!​

    ------------------------------
    Melanie Ochoa
    ------------------------------

    #SPSSStatistics


  • 2.  RE: Extracting variables from a field created from combining variables

    IBM Champion
    Posted Fri May 13, 2022 09:55 AM
    There are a few things that would be easy to do using the SPSSINC TRANS extension command.  For example, creating variables holding all the numeric codes, ignoring everything else, and/or splitting all the fields at commas.

    Here's an example that gets all the numeric codes, where codes is the variable holding the input string.

    spssinc trans result=a1 to a10
    /formula "re.findall(r'\d+', codes)".

    This would extract all the quoted strings into variables q1 to q10.
    begin program python3.
    import re
    def quotes(s):
      return re.findall(r'".*?"',s)
    end program.

    spssinc trans result=q1 to q10 type=20
     /formula "quotes(codes)".

    More can be done, but this might get you started.

    SPSSINC TRANS can be installed via the Extensions > Extension Hub menu if you don't already have it.




    --





  • 3.  RE: Extracting variables from a field created from combining variables

    Posted Fri May 13, 2022 05:57 PM
    Thank you for this!  The first suggestion you have--pulling out the numeric codes and ignoring everything else (quotes, letters) would be ideal. Could still use a little help to execute properly, though. 

    I did add the extension--although I am not sure where it is located in the system, I did get a notice that it was successfully installed.  I restarted the program.  I pasted the following into the syntax, subbing in AllForce1 for "codes" since that was the name of the variable:

    spssinc trans result=a1 to a10
    /formula "re.findall(r'\d+', AllForce1)".

    I ran the sytnax and got the following error:

    >Error # 1. Command name: spssinc
    >The first word in the line is not recognized as an SPSS Statistics command.
    >Execution of this command stops.

    I tried running it as python script, and got the following:

    Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license()" for more information.
    >>> spssinc trans result=a1 to a10
    /formula "re.findall(r'\d+', AllForce1)".
    SyntaxError: invalid syntax
    >>>

    I am basically a newbie, so if you can walk me through how to actually run the code, that would be tremendously helpful!  
    (Also I am using the free trial currently, although I do not think that should impact the functionality.)

    Thank you!





    ------------------------------
    Melanie Ochoa
    ------------------------------