SPSS Statistics

  • 1.  Cross-product between 2 datasets

    Posted Wed August 23, 2023 06:24 AM

    Hello, dear community!

     

    I have a tricky question. One dataset holds a list of clients, one unique client per row. The other dataset holds a date variable covering August (e.g. 01-Aug-2023 to 31-Aug-2023), one date per row.

     

    Consider the following syntax:

     

    DATASET DECLARE cust.
    DATA LIST FREE / cust_name (A30).
    BEGIN DATA
        "AROMA"
        "SHUPERSAL"
        "RAMI LEVI"
        "SHEKER"
    END DATA.
    DATASET NAME CUST.

    DATASET DECLARE DATES.
    INPUT PROGRAM.
    LOOP id=1 TO 31.
    -  END CASE.
    END LOOP.
    END FILE.
    END INPUT PROGRAM.
    LIST.

    DATASET NAME DATES.
    DATASET ACTIVATE DATES.

    COMPUTE test_date=DATE.MDY(8, 1, 2023).
    EXECUTE.

    IF (id > lag(id)) test_date = lag(test_date) + 86400.
    EXECUTE.

    ALTER TYPE test_date (DATE11).

     

    I would like to merge these two datasets into one cross-product dataset in which every client is paired with every date in August.

     

    Set 1

    Client_var
    Client1
    Client2

    Set 2

    Date_var
    1.8.23
    2.8.23
    .
    .
    .
    31.8.23

    Into:

    Cross product set

    Client_var  Date_var
    Client1     1.8.23
    Client1     2.8.23
    .
    .
    .
    Client1     31.8.23
    Client2     1.8.23
    Client2     2.8.23
    .
    .
    .
    Client2     31.8.23
    Client3     1.8.23
    .
    .

     

    It seems that MATCH FILES does not have cross-product capabilities.

     

    Any idea how to cross those 2 sets?

     

    Thanks!

     

     

     

     

             

    Meni Berger

    Data Scientist and Head of Tech  Support



     

     



  • 2.  RE: Cross-product between 2 datasets
    Best Answer

    IBM Champion
    Posted Wed August 23, 2023 04:48 PM
    I haven't thought this all through, but maybe the STATS CARTPROD extension command available via the Extension Hub might help.

    From the help...

    Compute the Cartesian Product of Two Sets of Variables

    The Cartesian product of two variables, say, a and b, is the collection of all combinations of the values of a with the values of b. If a has values 1, 1, 2 and b has values 'x', 'y', the Cartesian product is (1,'x'), (1,'y'), (1, 'x'), (1, 'y'), (2,'x'), (2,'y'). In SQL, this is known as the CROSS JOIN.

    This command takes two sets of input variables in the same or different datasets and produces a new data file containing the Cartesian product of those variables.

    • Each case in either dataset contributes to the result. Duplicate values are not eliminated.

    The command either operates on the active dataset plus one other dataset or takes all variables from the active dataset. The first dataset is referred to as the left dataset, and the second as the right dataset.

    To run this procedure, from the menus choose:

     Data > Cartesian Product

    From that you could select down to the desired cases.  If these data sets are large, though, this might not be practical.  With 1000 cases in each dataset, the result would have one million cases.  That's not a big number for SPSS, but with 10000 cases in each file, you are looking at one hundred million cases.  Still possible, but  ...
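
    To make the help text's example and the size estimates above concrete, here is a minimal Python sketch. It is only an illustration of what a Cartesian product is, not how STATS CARTPROD is implemented; `cross_size` is a hypothetical helper name introduced for this example.

```python
from itertools import product

# Values from the help text's example; the duplicate 1 is kept on purpose.
a = [1, 1, 2]
b = ["x", "y"]

# Every case on the left is paired with every case on the right.
pairs = list(product(a, b))
print(pairs)


def cross_size(n_left, n_right):
    """Number of cases in the cross-product of two datasets (hypothetical helper)."""
    return n_left * n_right


print(cross_size(4, 31))            # 4 clients x 31 August dates
print(cross_size(1_000, 1_000))     # one million cases
print(cross_size(10_000, 10_000))   # one hundred million cases
```

    Because duplicates are not eliminated, the result always has exactly (left cases) x (right cases) rows, which is why the size grows so quickly.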






  • 3.  RE: Cross-product between 2 datasets

    Posted Thu August 24, 2023 03:17 AM
    Edited by Meni Berger Thu August 24, 2023 03:16 AM

    Thank you, Jon. This extension works like magic! Someone should add this function as a native Statistics capability.



    ------------------------------
    Meni Berger
    ------------------------------



  • 4.  RE: Cross-product between 2 datasets

    IBM Champion
    Posted Thu August 24, 2023 09:06 AM
    It's actually implemented using regular SPSS commands - LOOP, XSAVE, MATCH FILES wrapped in some Python logic.






  • 5.  RE: Cross-product between 2 datasets

    Posted Thu August 24, 2023 04:48 PM

    I think something like this should work:

     

    data list
     / name (a8).
    begin data.
    Name A
    Name B
    Name C
    end data.
    list var = all.

    loop day = 1 to 31.
    xsave outfile = "c:\temp\test.sav".
    end loop.
    execute.
    get file = "c:\temp\test.sav".
    list var = all.

     

    You then do a table merge (MATCH FILES with a /TABLE subcommand) between this file and the one with the dates, using the loop index as the key. There are variations on this depending on whether you know the exact dates in advance, and whether you are merging on the day number.
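
    As a rough illustration of why replicating the cases and then doing a table merge gives the full cross-product, here is a minimal Python sketch of the same logic. It is not SPSS syntax; the names `replicated`, `dates_by_id`, and `crossed` are made up for this example, and the August 2023 dates are taken from the original question.

```python
from datetime import date, timedelta

clients = ["Name A", "Name B", "Name C"]

# Step 1 (the LOOP / XSAVE part): write each client once per day index 1..31.
replicated = [(name, day) for name in clients for day in range(1, 32)]

# Step 2: the dates file acts as a lookup table keyed by the same index,
# like the DATES dataset in the original post (id 1..31).
dates_by_id = {i: date(2023, 8, 1) + timedelta(days=i - 1) for i in range(1, 32)}

# Step 3 (the table merge): each replicated row picks up the date for its key.
crossed = [(name, dates_by_id[day]) for name, day in replicated]

print(len(crossed))
print(crossed[0], crossed[30], crossed[31])
```

    The key point is that the loop index is already present on every replicated row, so a plain keyed lookup is enough; no many-to-many join is needed.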

     

    I hope this helps.










  • 6.  RE: Cross-product between 2 datasets

    Posted Sun August 27, 2023 08:10 AM

    A table merge between these files will not produce a cross-product of the two datasets.



    ------------------------------
    Meni Berger
    ------------------------------



  • 7.  RE: Cross-product between 2 datasets

    Posted Mon August 28, 2023 07:33 AM

    OK! I get it. You loop a count index directly into "name" and then force the merge using the loop count as the merge key! That's a nice one. Very compact. Thanks for the idea!



    ------------------------------
    Meni Berger
    ------------------------------