SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers! 

Using a Data List variable how to?

  • 1.  Using a Data List variable how to?

    Posted Tue October 10, 2023 09:57 AM

    Hello,

    I am unable to figure out how to get the syntax below to work. This is just a sample of a larger list of codes that is evaluated; if a code matches, the result is coded as 1.

    I started by hard-coding the target codes, which works well; I have done this with up to 25 conditions joined by OR.

    I now have a larger list of codes and would like to simplify the work.

    The LIST command returns the values properly. Suggestions?

    *CM_MALIGNANCY
    DATA LIST /ABC (A3).
    BEGIN DATA
    C00
    C01
    C02
    C03
    C04
    C05
    C06
    END DATA.
    LIST.
    DO REPEAT X = I10_DX1 TO I10_DX40.
       DO IF CHAR.SUBSTR (X,1,3) = 'ABC'.
          COMPUTE CM_MALIGNANCY = 1.
       END IF.
    END REPEAT.
    EXECUTE.
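
    In plain terms, the intended rule is: for each case, take the first three characters of every diagnosis variable and set the flag if any of them is in the code list. Note that the quoted 'ABC' in the DO IF above is compared as the literal text ABC, not as the values of the ABC variable. A minimal Python sketch of the intended check, assuming a case's I10_DX1 to I10_DX40 values are available as a list of strings (flag_case and the sample diagnoses are hypothetical):

```python
# Codes from the DATA LIST in the question.
MALIGNANCY_CODES = {"C00", "C01", "C02", "C03", "C04", "C05", "C06"}


def flag_case(diagnoses, codes=MALIGNANCY_CODES, width=3):
    """Return 1 if the first `width` characters of any diagnosis are in `codes`.

    This mirrors DO REPEAT X = I10_DX1 TO I10_DX40 with CHAR.SUBSTR(X,1,3),
    but tests membership in a list of codes instead of equality with 'ABC'.
    """
    for dx in diagnoses:
        if dx[:width] in codes:
            return 1
    # In SPSS the target would stay system-missing here rather than 0.
    return 0


print(flag_case(["I10", "C021", ""]))  # 1: C02 in the second slot
print(flag_case(["I10", "E119", ""]))  # 0: no malignancy code
```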


    ------------------------------
    Paul Boserup
    ------------------------------


  • 2.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 11:16 AM
    You can use a list with the ANY function.  If there are a lot of codes, there are other solutions, but they are more complicated.





  • 3.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 01:07 PM

    Thank you, Jon. Sorry, I'm new to the forum.

    This is an example of how I am currently doing the query:

    *CM_DIABETES WITH CHRONIC COMPLICATIONS
    DO REPEAT X = I10_DX1 TO I10_DX40.
       DO IF CHAR.SUBSTR (X,1,4) = 'E102' OR CHAR.SUBSTR (X,1,4) = 'E103' OR CHAR.SUBSTR (X,1,4) = 'E104' OR CHAR.SUBSTR (X,1,4) = 'E105' OR CHAR.SUBSTR (X,1,4) = 'E107' 
    OR CHAR.SUBSTR (X,1,4) = 'E112' OR CHAR.SUBSTR (X,1,4) = 'E113' OR CHAR.SUBSTR (X,1,4) = 'E114' OR CHAR.SUBSTR (X,1,4) = 'E115' OR CHAR.SUBSTR (X,1,4) = 'E117' OR CHAR.SUBSTR (X,1,4) = 'E122' 
    OR CHAR.SUBSTR (X,1,4) = 'E123' OR CHAR.SUBSTR (X,1,4) = 'E124' OR CHAR.SUBSTR (X,1,4) = 'E125' OR CHAR.SUBSTR (X,1,4) = 'E127' OR CHAR.SUBSTR (X,1,4) = 'E132' 
    OR CHAR.SUBSTR (X,1,4) = 'E133' OR CHAR.SUBSTR (X,1,4) = 'E134' OR CHAR.SUBSTR (X,1,4) = 'E135' OR CHAR.SUBSTR (X,1,4) = 'E137' OR CHAR.SUBSTR (X,1,4) = 'E142'
    OR CHAR.SUBSTR (X,1,4) = 'E143' OR CHAR.SUBSTR (X,1,4) = 'E144' OR CHAR.SUBSTR (X,1,4) = 'E145' OR CHAR.SUBSTR (X,1,4) = 'E147'.
          COMPUTE CM_DIABETES_W_CHRONIC_COMLICATIONS = 1.
       END IF.
    END REPEAT.
    EXECUTE.
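
    The codes in this OR chain follow a regular pattern (categories E10 through E14, each with fourth characters 2, 3, 4, 5 and 7), so the list does not have to be typed by hand. A small Python sketch that regenerates the 25 codes in the same order, plus a quoted, comma-separated version for pasting into syntax (the names DIABETES_CC_CODES and quoted are illustrative):

```python
# Categories E10-E14, each with fourth characters 2, 3, 4, 5 and 7,
# reproduce the 25 codes in the OR chain above, in the same order.
DIABETES_CC_CODES = [f"E1{cat}{sub}" for cat in "01234" for sub in "23457"]

# Quoted, comma-separated form for pasting into syntax such as ANY(...).
quoted = ",".join(f"'{code}'" for code in DIABETES_CC_CODES)

print(len(DIABETES_CC_CODES))  # 25
print(quoted[:20])             # 'E102','E103','E104'
```

    Keep in mind that the DO IF version compares only the first four characters of each diagnosis (CHAR.SUBSTR(X,1,4)), whereas ANY compares whole values unless its test argument is itself a CHAR.SUBSTR expression.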

    Regards -pb



    ------------------------------
    Paul Boserup
    ------------------------------



  • 4.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 02:35 PM
    You can do this much more compactly with ANY.






  • 5.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 11:21 AM

    Which codes are you trying to create a flag for? Do you need a separate flag for each code, or just one flag if any of them meets the criteria?



    ------------------------------
    Art Jack
    ------------------------------



  • 6.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 12:26 PM

    Hi Art, thanks for the reply.

    I'm simply looking to COMPUTE CM_MALIGNANCY = 1 if any of the codes in the DATA LIST match any of the values in I10_DX1 through I10_DX40.

    I will research the ANY function.

    Regards -pb



    ------------------------------
    Paul Boserup
    ------------------------------



  • 7.  RE: Using a Data List variable how to?

    Posted Tue October 10, 2023 12:36 PM

    Art,

    Just to clarify, in the following: DO IF CHAR.SUBSTR (X,1,3) = 'ABC'.

    Is using a variable (in this case ABC) not supported in CHAR.SUBSTR as I have it, or is my syntax incorrect?

    -pb



    ------------------------------
    Paul Boserup
    ------------------------------



  • 8.  RE: Using a Data List variable how to?

    Posted Wed October 11, 2023 08:48 AM
    Edited by Art Jack Wed October 11, 2023 10:34 AM

    Jon's suggestion of ANY is best. It would look something like this. You can put it within a DO REPEAT / DO IF block if there are other conditions.

    if ANY(I10_DX1 to I10_DX40, 'E102', 'E103') flag=1.



    ------------------------------
    Art Jack
    ------------------------------



  • 9.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 12:15 PM
    Edited by Paul Boserup Fri October 13, 2023 12:17 PM

    Thanks for your follow-up.

    No matter what I try, I cannot get the ANY function to work with a DATA LIST variable.

    The LIST command does print out the codes.

    Here is a small example of the general syntax, and it stops with error # 4285.

    *CM_ANEMIA
    DATA LIST /IDC10_4(A3).
    BEGIN DATA
    C00
    C01
    C02
    C03
    C04
    C05
    C06
    END DATA.
    LIST.

    IF ANY(I10_DX10 to I10_DX40, 'IDC10_4') CM_ANEMIA=1.
    EXECUTE.

    >Error # 4285 in column 8.  Text: I10_DX10 
    >Incorrect variable name: either the name is more than 64 characters, or it is 
    >not defined by a previous command. 
    >Execution of this command stops.



    ------------------------------
    Paul Boserup
    ------------------------------



  • 10.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 12:30 PM
    In your example, DATA LIST makes a new active dataset that contains only the code variable, which is why I10_DX10 is reported as not defined. Using ANY, you would list the desired codes in the ANY syntax itself:
    compute flag = any(somevariable, 'C00', 'C01', ...).
    where somevariable holds the code.

    If that doesn't fit your data structure, please elaborate.
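
    A quoted name such as 'IDC10_4' or 'ABC' is a string literal rather than a reference to a variable, so the codes themselves have to appear in the syntax. With a long code list, one option is to generate that syntax from a list. The Python sketch below (any_syntax and its parameters are hypothetical) writes a DO REPEAT loop that applies IF ANY(X, codes...) to each diagnosis variable in turn, optionally comparing only the first width characters as the CHAR.SUBSTR versions above do, and wraps the long command at spaces to keep lines short. Inside a BEGIN PROGRAM PYTHON3 block the resulting string could be passed to spss.Submit, which this sketch does not do.

```python
import textwrap


def any_syntax(target, first_var, last_var, codes, width=None, line_len=70):
    """Return SPSS syntax that sets `target` to 1 when any variable in
    first_var TO last_var matches one of `codes`.

    With `width`, only the first `width` characters of each variable are
    compared (via CHAR.SUBSTR), like the CHAR.SUBSTR comparisons above.
    """
    test = f"CHAR.SUBSTR(X,1,{width})" if width else "X"
    quoted = ", ".join(f"'{code}'" for code in codes)
    command = f"IF ANY({test}, {quoted}) {target} = 1."
    # Break only at spaces, never inside a quoted code, so each line
    # stays under line_len and only the last line ends the command.
    lines = textwrap.wrap(
        command,
        width=line_len,
        initial_indent="   ",
        subsequent_indent="      ",
        break_long_words=False,
        break_on_hyphens=False,
    )
    return "\n".join(
        [f"DO REPEAT X = {first_var} TO {last_var}.", *lines, "END REPEAT.", "EXECUTE."]
    )


print(any_syntax("CM_MALIGNANCY", "I10_DX1", "I10_DX40",
                 ["C00", "C01", "C02", "C03", "C04", "C05", "C06"], width=3))
```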

    --





  • 11.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 01:43 PM

    Thanks Gents, I appreciate the assist!

    I have it working. For the record, here is what I ended up with.

    *CM_ANEMIA

    IF ANY(I10_DX1 to I10_DX40,'D62','D460','D461','D464','D500','D500','D501',
    'D508','D509','D510','D511','D512','D513','D518','D519','D520','D521','D528',
    'D529','D530','D531','D532','D538','D539','D550','D551','D553','D558','D559',
    'D588','D589','D590','D592','D593','D594','D595','D596','D598','D599','D600',
    'D601','D608','D609','D611','D612','D613','D619','D630','D631','D638','D640',
    'D641','D642','D643','D644','D649','D4620','D4621','D4622','D5521','D5529',
    'D5910','D5911','D5912','D5913','D5919','D6101','D6109','D6182','D6189',
    'D6481','D6489','D61810','D61811','D61818')  CM_ANEMIA=1.
    EXECUTE.



    ------------------------------
    Paul Boserup
    ------------------------------



  • 12.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 02:04 PM
    I am not sure that code will work. The first variable, I10_DX1, would be the test variable, and ANY would return true if any of the subsequent variables or constants match it. If only one of the variables in the TO block holds a code, you are okay, but if, say, I10_DX1 is "ABC" and I10_DX2 is also "ABC", it would return true even though "ABC" is not one of the listed codes.

    ANY. ANY(test,value[,value,...]). Logical. Returns 1 or true if the value of test matches any of the subsequent values; returns 0 or false otherwise. This function requires two or more arguments. For example, ANY(var1, 1, 3, 5) returns 1 if the value of var1 is 1, 3, or 5 and 0 for other values. ANY can also be used to scan a list of variables or expressions for a value. For example, ANY(1, var1, var2, var3) returns 1 if any of the three specified variables has a value of 1 and 0 if all three variables have values other than 1.
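
    To make the pitfall concrete, here is a simplified Python emulation of the documented ANY semantics (it ignores details such as missing values; spss_any and the sample case are hypothetical), comparing the expanded ANY(I10_DX1 to I10_DX40, ...) call with a per-variable test inside DO REPEAT:

```python
def spss_any(test, *values):
    """Simplified emulation of SPSS ANY(test, value, ...): true if test equals any later value."""
    return test in values


codes = ("D62", "D460", "D500")  # a few of the anemia codes, for illustration

# Hypothetical case: the anemia code is the 5th diagnosis, not the 1st.
dx = ["I10", "E119", "N189", "Z794", "D62"]

# ANY(I10_DX1 to I10_DX40, codes...): the first variable becomes the test value
# and the other diagnoses become values to compare it with.
as_posted = spss_any(dx[0], *dx[1:], *codes)

# DO REPEAT X = I10_DX1 TO I10_DX40 / IF ANY(X, codes...): each diagnosis is tested.
per_variable = any(spss_any(x, *codes) for x in dx)

print(as_posted, per_variable)  # False True
```

    The same helper also shows the false positive described above: spss_any("K590", "K590", *codes) is true because the second diagnosis repeats the first, even though K590 is not one of the codes.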

    --





  • 13.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 02:37 PM
    If that doesn't work, I have another solution for you that is pretty elegant.

    --





  • 14.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 03:17 PM

    Jon,

    You are correct: the ANY approach returned only 52 cases, versus 5724 cases using the laborious approach:

    DO IF CHAR.SUBSTR (X,1,4) = code1 OR  CHAR.SUBSTR (X,1,4) = code2  [repeated 78 times].

    I would enjoy learning another solution.



    ------------------------------
    Paul Boserup
    ------------------------------



  • 15.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 04:33 PM
    OK. Here is a very neat solution. It uses the SPSSINC TRANS extension command. You might already have this installed, but this usage needs the newest version of the command, which has not yet made its way onto the Extension Hub.

    So download the SPSSINC_TRANS.spe file from here.
    Install it via Extensions > Install local extension bundle.
    I assume you are testing the whole string, but if not, use ALTER TYPE to shorten all the variables in a single command.

    The command would be as below, where the list of codes is the codes to match. The whole formula has to be in quotes, so you would need to use the standard SPSS quoting rules, splitting it into quoted pieces joined with + across lines, to avoid exceeding the line length limit in Statistics, like this.

    It's hard to get this properly copied into the email, so if it messes up, I can send you a file instead if you send me a direct email (jkpeck@gmail.com).

    spssinc trans result= CM_ANEMIA
     /variables I10_DX1 to I10_DX40
    /formula "len(set([<>]) & set([ 'D62','D460','D461','D464','D500','D500','D501',"+
     "'D508','D509','D510','D511','D512','D513','D518','D519','D520','D521','D528',"+
     "'D529','D530','D531','D532','D538','D539','D550','D551','D553','D558','D559',"+
     "'D588','D589','D590','D592','D593','D594','D595','D596','D598','D599','D600',"+
     "'D601','D608','D609','D611','D612','D613','D619','D630','D631','D638','D640',"+
     "'D641','D642','D643','D644','D649','D4620','D4621','D4622','D5521','D5529',"+
     "'D5910','D5911','D5912','D5913','D5919','D6101','D6109','D6182','D6189',"+
     "'D6481','D6489','D61810','D61811','D61818'])) > 0".

    To explain this a little, your variable list goes in the variables subcommand, which supports the use of TO. In the formula, <> is replaced with the expanded variable list.
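
    The <> substitution can be pictured as plain text replacement. A rough Python sketch, assuming the expanded names are joined with commas so that set([<>]) becomes a valid Python list (the exact spacing the extension uses is an assumption, and expand_formula is a hypothetical helper):

```python
def expand_formula(formula, variables):
    """Replace <> in a formula with a comma-separated list of variable names."""
    return formula.replace("<>", ", ".join(variables))


# I10_DX1 to I10_DX40, assuming the 40 variables are contiguous in the file.
variables = [f"I10_DX{i}" for i in range(1, 41)]
expanded = expand_formula("has(<>)", variables)
print(expanded[:30], "...", expanded[-12:])
```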

    Then for each case the formula constructs a set containing the values of the variables
    and another set containing the codes to match, and it takes the intersection
    of the two sets.  If the resulting size is greater than zero, the code is found.
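
    Outside Statistics, the same intersection test can be written as a small Python function (has_code and the sample values are illustrative; inside SPSSINC TRANS the values would come from the case):

```python
def has_code(values, codes):
    """The formula's test: true if the set of case values and the code set overlap."""
    return len(set(values) & set(codes)) > 0


codes = ["D62", "D460", "D461", "D464"]

print(has_code(["I10", "D461", ""], codes))  # True: D461 is in both sets
print(has_code(["I10", "E119", ""], codes))  # False: no overlap
```

    An equivalent spelling is not set(values).isdisjoint(codes), which avoids building the intersection.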

    I see, though, that some of the codes are longer than four characters. You can use ALTER TYPE to truncate all the strings to four characters in a single command and shorten the codes in the list accordingly, assuming that four bytes are sufficient.
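
    Shortening the code list to match four-character data amounts to truncating each code and dropping the duplicates that truncation creates. A small Python sketch using a few codes from the list above:

```python
codes = ["D62", "D460", "D4620", "D4621", "D4622", "D61810", "D61811", "D61818"]

# Truncate each code to four characters, as the truncated data would be,
# then drop the duplicates that truncation creates.
short_codes = sorted({code[:4] for code in codes})

print(short_codes)  # ['D460', 'D462', 'D618', 'D62']
```

    Truncation also broadens the match: D618 would then match any diagnosis beginning with D618, not only the D618x codes in the list.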




    --





  • 16.  RE: Using a Data List variable how to?

    Posted Fri October 13, 2023 05:59 PM
    For some reason, that solution is using way too much memory.  Use this instead.
    You run the begin program block just once.
    The spssinc trans command could be run multiple times with different variables as needed.

    begin program python3.
    codes = ['D62','D460','D461','D464','D500','D500','D501',
     'D508','D509','D510','D511','D512','D513','D518','D519','D520','D521','D528',
     'D529','D530','D531','D532','D538','D539','D550','D551','D553','D558','D559',
     'D588','D589','D590','D592','D593','D594','D595','D596','D598','D599','D600',
     'D601','D608','D609','D611','D612','D613','D619','D630','D631','D638','D640',
     'D641','D642','D643','D644','D649','D4620','D4621','D4622','D5521','D5529',
     'D5910','D5911','D5912','D5913','D5919','D6101','D6109','D6182','D6189',
     'D6481','D6489','D61810','D61811','D61818']
     
    def has(*variables):
        return any((v in codes for v in variables))
    end program.

    spssinc trans result=CM_ANEMIA
        /variables I10_DX1 to I10_DX40
        /formula "has(<>)".


    --





  • 17.  RE: Using a Data List variable how to?

    Posted Sun October 15, 2023 10:52 AM

    After some investigation, I found that the solution I posted using sets doesn't work due to a bug in the programmability interface with long strings.

    Here, though, is a solution that avoids that problem. Assuming that the test variables are just blank after the codes, it doesn't require modifying the input variables. I noticed that a lot of the codes are more than four characters.

    * Encoding: UTF-8.
    begin program python3.
    codes = set(['D62','D460','D461','D464','D500','D500','D501',
     'D508','D509','D510','D511','D512','D513','D518','D519','D520','D521','D528',
     'D529','D530','D531','D532','D538','D539','D550','D551','D553','D558','D559',
     'D588','D589','D590','D592','D593','D594','D595','D596','D598','D599','D600',
     'D601','D608','D609','D611','D612','D613','D619','D630','D631','D638','D640',
     'D641','D642','D643','D644','D649','D4620','D4621','D4622','D5521','D5529',
     'D5910','D5911','D5912','D5913','D5919','D6101','D6109','D6182','D6189',
     'D6481','D6489','D61810','D61811','D61818'])
     
    def has(*variables):
        return any((v.rstrip() in codes for v in variables))
    end program.

    spssinc trans result=CM_ANEMIA
        /variables I10_DX1 to I10_DX40
        /formula "has(<>)".
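
    Calling a function of the same shape outside Statistics shows why the rstrip matters when values are blank-padded. A minimal sketch with a few of the codes and made-up padded values (the width of 8 is an assumption):

```python
# A few of the anemia codes; duplicates such as 'D500' collapse in a set.
codes = set(["D62", "D460", "D500", "D500", "D61810"])


def has(*variables):
    """True if any value, with trailing blanks removed, is one of the codes."""
    return any(v.rstrip() in codes for v in variables)


# Hypothetical case: string values padded with blanks to a width of 8.
case = ["I10     ", "D62     ", "        "]

print(has(*case))           # True
print("D62     " in codes)  # False: the padded value needs rstrip
```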



    ------------------------------
    Jon Peck
    ------------------------------