I don't understand the logic here.
- The EXECUTE command is of no use. It just forces a data pass that would have taken place anyway, which wastes time.
- Why the SORT commands? They don't affect the SELECTs, and since nothing else in the code changes the sort order, there is no reason to keep re-sorting by the same variable.
- There is no need for the RECODE. You could just use ~ MISSING(GazepointX) etc.
- All those SELECT commands could be combined into a single SELECT IF. Without the SORTs in between, they would all be handled in just one data pass.
- The logic seems to be to select cases that are complete on all the variables, so you could just do
SELECT IF NMISSING(GazepointX, GazepointY, ...) EQ 0.
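Putting those points together (a sketch, spelling out the six variables from your syntax; substitute your own output path for the '...' in SAVE):

SELECT IF NMISSING(GazepointX, GazepointY, GazepointleftX, GazepointleftY,
    GazepointrightX, GazepointrightY) EQ 0.
SORT CASES BY Eyetrackertimestamp(A).
SAVE OUTFILE='...'.

One SELECT IF, one SORT (if you need the sort at all), no RECODE, no EXECUTE; the deletion happens on the data pass that SAVE triggers.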
--
Original Message:
Sent: 1/10/2023 9:32:00 PM
From: becky allred
Subject: RE: delete thousands of rows from very large dataset
Thank you sooo much. You've saved me months of work.
This was the final syntax I used:
RECODE GazepointX GazepointY GazepointleftX GazepointleftY GazepointrightX GazepointrightY
(SYSMIS=9999).
EXECUTE. *I suppose I could have avoided this by using "." instead of 9999 below.
SELECT IF (GazepointX) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
SELECT IF (GazepointY) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
SELECT IF (GazepointleftX) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
SELECT IF (GazepointleftY) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
SELECT IF (GazepointrightX) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
SELECT IF (GazepointrightY) ~= 9999.
SORT CASES BY Eyetrackertimestamp(A).
Original Message:
Sent: 1/10/2023 7:10:00 PM
From: Jon Peck
Subject: RE: delete thousands of rows from very large dataset
The rows will be deleted if you use SELECT IF, but this won't happen until the next data pass, which could be triggered by a SAVE command or a procedure.
Don't confuse SELECT IF with FILTER: the latter just, well, filters, so that procedures don't see the filtered-out cases, but those cases remain in the file.
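To illustrate the difference (a sketch using two of this thread's variables; the keepcase flag name is just an example):

* FILTER only hides cases from procedures - every row stays in the file.
COMPUTE keepcase = NMISSING(GazepointX, GazepointY) EQ 0.
FILTER BY keepcase.
* SELECT IF actually deletes the unselected cases at the next data pass,
* for example the one triggered by SAVE.
SELECT IF NMISSING(GazepointX, GazepointY) EQ 0.
SAVE OUTFILE='...'.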
--
Original Message:
Sent: 1/10/2023 6:29:00 PM
From: becky allred
Subject: RE: delete thousands of rows from very large dataset
Thank you for responding, Jon.
I've been thinking that this wouldn't work because I tried it, and it appears that the rows that are not selected remain in the data file. They are not deleted. I really need to delete the rows so that the files are smaller and then the 296 files can be merged.
I've considered a split file command, but that apparently only works with categorical variables. I may be overlooking something, though. I welcome more ideas!
Original Message:
Sent: 1/10/2023 5:13:00 PM
From: Jon Peck
Subject: RE: delete thousands of rows from very large dataset
This sounds like a typical SELECT IF command. Something like
SELECT IF NMISSING(x1 to x6) < 6.
where x1 to x6 would be replaced by an explicit list of your six variables if they are not consecutive in the file. (NMISSING less than 6 keeps every row with at least one value present.)
--
Original Message:
Sent: 1/10/2023 4:11:00 PM
From: becky allred
Subject: delete thousands of rows from very large dataset
I am using v28 of SPSS. I have 296 data files that I am trying to clean before merging into one dataset. Each file has about 300,000 rows of data, but I only need the rows that have values in at least one of 6 different variables (about 20% of the rows). I am trying to determine if any members of this community have ever had to *delete* (not simply select out) that many rows of data in this many data files. If so, menu steps or sample code would be much appreciated. Thanks!
------------------------------
becky allred
------------------------------
#SPSSStatistics