SPSS Statistics


Time-saving tip when analyzing data with many unused variables

By Rick Marcantonio posted Thu October 29, 2020 06:22 PM


Question: I have to do an analysis using BOOTSTRAP and I’ve tried but it’s taking forever! Is there any way to speed this process up?

Answer: Yes, possibly. Keep in mind that when most procedures run, no matter how many variables are being used, they have to pass ALL the data: every case and every variable. The more rows and columns in the dataset, the more data there is to process and the longer it takes. Many times, we need all the cases; our analysis requires them. However, chances are that we don’t need all the variables, perhaps just a subset of them. If we can reduce the number of columns to go through, any given analysis will run that much quicker. Sometimes, the difference is dramatic.

To see this on your own computer, try a simple experiment. Download this syntax file, open SPSS Statistics, and run it: boot_test.sps

You can compare the times by selecting the Notes table for each REGRESSION in the Tree View of the Output Window (the left-most pane), double-clicking it, and finding the “Elapsed Time” entry. On my computer, the first analysis ran in 45 seconds; the second one (the reduced dataset) ran in 6 seconds. If we change the BOOTSTRAP samples back to the default (1000), the times are higher by roughly a factor of 10. Again, on my machine the “full dataset” REGRESSION with BOOTSTRAP(1000) took 7 minutes and 35 seconds. The reduced analysis, identical except for the elimination of unused variables, took only 51 seconds: a reduction in run time of almost 90%, or nearly 9 times faster!
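For readers who want a sense of what such a run looks like in syntax, here is a minimal sketch of a bootstrapped regression. It is not the contents of boot_test.sps; the variable names (y, x1, x2) are hypothetical, and it assumes the Bootstrapping option is installed. NSAMPLES is the setting that controls the number of bootstrap samples.

```spss
* Hypothetical variable names; replace y, x1 and x2 with your own.
* BOOTSTRAP must come immediately before the procedure it applies to.
BOOTSTRAP
  /SAMPLING METHOD=SIMPLE
  /VARIABLES TARGET=y INPUT=x1 x2
  /CRITERIA CILEVEL=95 CITYPE=PERCENTILE NSAMPLES=100
  /MISSING USERMISSING=EXCLUDE.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 x2.
```

Changing NSAMPLES=100 to NSAMPLES=1000 gives the default number of samples, at roughly ten times the run time.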

This is not just an academic exercise. We have recently worked with users who needed to bootstrap a linear regression on a huge dataset and got so frustrated that they nearly gave up. Their dataset was larger than our experimental one above and the bootstrap was taking so long that they simply assumed the software had stopped working and nothing could be done. Once we showed them how to reduce the dataset to just the variables they were interested in, the bootstrap completed in about a minute. 

This is what we recommended to them:

First, open the original dataset.

Next, begin saving it to a new file. In the menu system, select File > Save As…

In that dialog, select Variables…

Select Drop All.
Now, find and select just the variables that are required and press Continue.

Now, complete the save under a new file name (perhaps in a temporary directory where you can delete the file later if you wish).

Finally, do the analysis on the reduced dataset.
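If you prefer syntax to the menus, the same steps can be sketched roughly as follows. The file paths and variable names here are hypothetical placeholders, so treat this as an illustration under those assumptions rather than a ready-made script.

```spss
* Hypothetical file paths and variable names; replace with your own.
GET FILE='C:\data\full_dataset.sav'.

* Write a copy that keeps only the variables the analysis needs.
SAVE OUTFILE='C:\temp\reduced_dataset.sav'
  /KEEP=y x1 x2.

* SAVE leaves the full dataset active, so open the reduced file before analyzing.
GET FILE='C:\temp\reduced_dataset.sav'.
```

The /KEEP subcommand lists the variables to retain; if it is easier to name the few variables you do not need, /DROP works the other way around.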

Of course, some analyses are just going to take time; eliminating variables is not a universal remedy. Nevertheless, for cases like this and probably many others, time can be saved by creating a temporary dataset that contains only the cases and variables that you want to study. I encourage you to give it a try.