SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers!

View Only

Back to discussions

Expand all | Collapse all

Remove duplicate records if date difference between duplicate records is less than 30 days

1. Remove duplicate records if date difference between duplicate records is less than 30 days

Like
S B
Posted Fri December 09, 2022 09:09 AM
Edited by System Admin Fri January 20, 2023 04:19 PM

Reply
I have a dataset of 6300 cases of ED visits. Each patient can be in the dataset with multiple ED visits. I would like to delete duplicate records if a patient has visited the ED multiple times and the date difference is less then 30 days (so within 30 days visits again, deleting the latter).

I have a table with primary cases and duplicate cases based on patient ID. For each case I have a date for the ED visit. I want to write the syntax so that if a patient has a second visit within 30 days the case is removed. If there is no next date or the next date is >30 days, keep.

So for example:

Patient Date
123 01/11/18

123 15/11/18

123 25/11/18

123 05/12/18

123 10/01/19

456 20/11/18

Want outcome:

Patient Date

123 05/12/18

123 10/01/19

456 20/11/18

I'm on version 26.

Thank you!!

------------------------------
S B
------------------------------
#SPSSStatistics
2. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
Rick Marcantonio
Posted Fri December 09, 2022 10:05 AM

Reply
Check the DATEDIFF function. From the syntax manual (which you can find online):

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
3. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
S B
Posted Fri December 09, 2022 11:24 AM

Reply
Hi Rick, thanks for your reply.

The problem is that I need to calculate the date difference between variables across different rows, not different columns. So for example, if a patient has come to the ED 5 times, they will have 5 rows and for each row in the column date there will be a ED visit date. I would like to calculate between these rows how many days are in between the visits. Is this possible?

Thanks!

------------------------------
S B
------------------------------

Original Message
4. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
Frank Watzl
Posted Fri December 09, 2022 11:36 AM

Reply
The CREATE command with LEAD and/or LAG would be useful for cross row comparison.
The following may not be the most concise way and it may not match your use case in every detail, but you might get the idea.

SORT CASES BY date (A).
SORT CASES BY Patient (A).
* Flag last row of every patient.
CREATE next_patient = LEAD(Patient, 1).
IF next_patient NE Patient is_last_visit = 1.
EXECUTE.
* Next visit of same patient, if any.
CREATE next_date = LEAD (Date, 1).
EXECUTE.
DO IF is_last_visit EQ 1.
COMPUTE next_date = -1.
END IF.
EXECUTE.
* Days until next visit .
DO IF next_date GT 0.
COMPUTE days_til_next_date = DATEDIFF(next_date,Date, "days").
END IF.

* flag cases.
DO IF (is_last_visit | days_til_next_date GT 30).
COMPUTE keep = 1.
END IF.
EXECUTE.

Original Message
5. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
S B
Posted Wed December 14, 2022 08:11 AM

Reply
Thank you for your advice! I have used the lag function which has worked very well and I've managed to delete any cases of patients visiting multiple times <30 days.

*Identify first row for each patient with multiple visits, create column counter which has a value of 1 for each patients first visit.
IF $casenum = 1 OR Patient_ID NE lag(Patient_ID) counter = 1.
EXECUTE.

*Calculate days between multiple visits by the same patient.
IF Patient_ID = lag(Patient_ID) Days_between_visits = datediff(Date_visit_ED,lag(Date_visit_ED),'days').
EXECUTE.

*Delete any rows with cases <30 days, by selecting for any visits where datediff is >30 days or value is missing (as missing indicates patient only has one visit).
SELECT IF ((Days_between_visits>30) OR (MISSING(Days_between_visits))).
EXECUTE.

To double check that the lag function selected the correct cases, I identified duplicates, defining matching cases based on patient ID and sorting within matching groups by date of ED visit.

Rick and Jon, I tried Casestovars earlier but gave too many variables per ID, but thank you for the suggestion.

------------------------------
S B
------------------------------

Original Message
6. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
Rick Marcantonio
Posted Fri December 09, 2022 11:52 AM

Reply
It is possible, perhaps with CASESTOVARS or AGGREGATE. I'm traveling currently but will take a look at this when I am somewhere I can do that.

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
7. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Like
Jon Peck

IBM Champion
Posted Fri December 09, 2022 11:58 AM

Reply
You might also consider CASESTOVARS to group all the cases for a given ID into one record, but that might produce an unmanageable number of variables..

--
Jon K Peck
jkpeck@gmail.com

Original Message

SPSS Statistics

SPSS Statistics

Remove duplicate records if date difference between duplicate records is less than 30 days

S BFri December 09, 2022 09:09 AM

Rick MarcantonioFri December 09, 2022 10:05 AM

S BFri December 09, 2022 11:24 AM

Frank WatzlFri December 09, 2022 11:36 AM

S BWed December 14, 2022 08:11 AM

Rick MarcantonioFri December 09, 2022 11:52 AM

Jon PeckFri December 09, 2022 11:58 AM

1. Remove duplicate records if date difference between duplicate records is less than 30 days

2. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

3. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

4. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

5. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

6. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

7. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Additional
Resources

Office

Quick Links

SPSS Statistics

SPSS Statistics

Remove duplicate records if date difference between duplicate records is less than 30 days

S BFri December 09, 2022 09:09 AM

Rick MarcantonioFri December 09, 2022 10:05 AM

S BFri December 09, 2022 11:24 AM

Frank WatzlFri December 09, 2022 11:36 AM

S BWed December 14, 2022 08:11 AM

Rick MarcantonioFri December 09, 2022 11:52 AM

Jon PeckFri December 09, 2022 11:58 AM

1. Remove duplicate records if date difference between duplicate records is less than 30 days

2. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

3. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

4. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

5. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

6. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

7. RE: Remove duplicate records if date difference between duplicate records is less than 30 days

Related Content

casestovars usage

Using sequential case processing for data management in SPSS

What's New in SPSS Statistics 29

FAQs for SPSS Statistics Subscription

SPSS Statistics SDK (Version 30)

Additional Resources

Office

Quick Links

Additional
Resources