SPSS Statistics

SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers! 

 View Only
  • 1.  Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 09:09 AM
    Edited by System Admin Fri January 20, 2023 04:19 PM
    I have a dataset of 6300 cases of ED visits. Each patient can be in the dataset with multiple ED visits. I would like to delete duplicate records if a patient has visited the ED multiple times and the date difference is less then 30 days (so within 30 days visits again, deleting the latter). 

    I have a table with primary cases and duplicate cases based on patient ID. For each case I have a date for the ED visit. I want to write the syntax so that if a patient has a second visit within 30 days the case is removed. If there is no next date or the next date is >30 days, keep. 

    So for example: 

    Patient   Date
    123        01/11/18

    123        15/11/18

    123        25/11/18

    123        05/12/18

    123        10/01/19

    456        20/11/18

     

    Want outcome:

    Patient   Date

    123      05/12/18

    123     10/01/19

    456     20/11/18

    I'm on version 26. 

    Thank you!! 



    ------------------------------
    S B
    ------------------------------
    #SPSSStatistics


  • 2.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 10:05 AM
    Check the DATEDIFF function. From the syntax manual (which you can find online):



    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 3.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 11:24 AM
    Hi Rick, thanks for your reply.

    The problem is that I need to calculate the date difference between variables across different rows, not different columns. So for example, if a patient has come to the ED 5 times, they will have 5 rows and for each row in the column date there will be a ED visit date. I would like to calculate between these rows how many days are in between the visits. Is this possible?

    Thanks!

    ------------------------------
    S B
    ------------------------------



  • 4.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 11:36 AM
    The CREATE command with LEAD and/or LAG would be useful for cross row comparison.
    The following may not be the most concise way and it may not match your use case in every detail, but you might get the idea.


    SORT CASES BY date (A).
    SORT CASES BY Patient (A).
    * Flag last row of every patient.
    CREATE next_patient = LEAD(Patient, 1).
    IF next_patient NE Patient is_last_visit = 1.
    EXECUTE.
    * Next visit of same patient, if any.
    CREATE next_date = LEAD (Date, 1).
    EXECUTE.
    DO IF is_last_visit EQ 1.
    COMPUTE next_date = -1.
    END IF.
    EXECUTE.
    * Days until next visit .
    DO IF next_date GT 0.
    COMPUTE days_til_next_date = DATEDIFF(next_date,Date, "days").
    END IF.

    * flag cases.
    DO IF (is_last_visit | days_til_next_date GT 30).
    COMPUTE keep = 1.
    END IF.
    EXECUTE.




  • 5.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Wed December 14, 2022 08:11 AM
    Thank you for your advice! I have used the lag function which has worked very well and I've managed to delete any cases of patients visiting multiple times <30 days. 

    *Identify first row for each patient with multiple visits, create column counter which has a value of 1 for each patients first visit.
    IF $casenum = 1 OR Patient_ID NE lag(Patient_ID) counter = 1.
    EXECUTE.

    *Calculate days between multiple visits by the same patient.
    IF Patient_ID = lag(Patient_ID) Days_between_visits = datediff(Date_visit_ED,lag(Date_visit_ED),'days').
    EXECUTE.

    *Delete any rows with cases <30 days, by selecting for any visits where datediff is >30 days or value is missing (as missing indicates patient only has one visit).
    SELECT IF ((Days_between_visits>30) OR (MISSING(Days_between_visits))).
    EXECUTE.

    To double check that the lag function selected the correct cases, I identified duplicates, defining matching cases based on patient ID and sorting within matching groups by date of ED visit. 



    Rick and Jon, I tried Casestovars earlier but gave too many variables per ID, but thank you for the suggestion. ​

    ------------------------------
    S B
    ------------------------------



  • 6.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 11:52 AM
    It is possible, perhaps with CASESTOVARS or AGGREGATE. I'm traveling currently but will take a look at this when I am somewhere I can do that.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 7.  RE: Remove duplicate records if date difference between duplicate records is less than 30 days

    Posted Fri December 09, 2022 11:58 AM
    You might also consider CASESTOVARS to group all the cases for a given ID into one record, but that  might produce an unmanageable number of variables..

    --