SPSS Statistics data files can be smaller and faster if saved as uncompressed

By Archive User posted Wed September 30, 2015 08:37 PM

  
When you save a data file in SPSS Statistics as a .SAV file using File->Save or File->Save As, the file is saved in compressed form. In a compressed .SAV file, small integers (from −99 to 151) and system-missing values are stored in one byte instead of the eight bytes used in an uncompressed file. However, this comes at a cost. Any number outside that range now requires nine bytes instead of eight (one byte to flag that it isn’t a small integer, plus eight bytes to store the number), a 12.5% increase in size. Reading and writing such numbers also takes about 12.5% longer.

If your data consist primarily of numeric values outside this small-integer range, saving the file uncompressed could reduce both file size and read/write time by roughly 11% (nine bytes per value down to eight). You will probably not notice the speed improvement unless the dataset is very large (millions of rows and/or hundreds of variables), but the difference in file size will be noticeable.
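To get a feel for when compression helps and when it hurts, the byte counts above can be turned into a rough estimate. The following is only a sketch in Python, not anything SPSS provides: the function name `compressed_ratio` and its parameter are made up for illustration, and the model ignores the file header, the dictionary, and string variables.

```python
# Byte costs per numeric value, taken from the description in this post.
UNCOMPRESSED_BYTES = 8   # every numeric value in an uncompressed file
SMALL_VALUE_BYTES = 1    # small integer or system-missing, compressed file
LARGE_VALUE_BYTES = 9    # 1 flag byte + 8 data bytes, compressed file


def compressed_ratio(small_fraction: float) -> float:
    """Estimated compressed size divided by uncompressed size.

    small_fraction is the share of numeric values (0.0 to 1.0) that fit
    in one byte. This is a hypothetical helper for illustration only.
    """
    if not 0.0 <= small_fraction <= 1.0:
        raise ValueError("small_fraction must be between 0 and 1")
    average_bytes = (small_fraction * SMALL_VALUE_BYTES
                     + (1.0 - small_fraction) * LARGE_VALUE_BYTES)
    return average_bytes / UNCOMPRESSED_BYTES


# A ratio above 1.0 means the compressed file is the larger one.
for share in (0.0, 0.125, 0.5, 1.0):
    print(f"{share:.3f} small values -> ratio {compressed_ratio(share):.3f}")
```

Under these assumptions the break-even point is one small value in every eight: below that share the compressed file comes out larger, and above it compression starts to pay off.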

This can’t be done from the GUI, but if you paste the syntax from the Save File dialog, you will get something like this:

SAVE OUTFILE=''
/COMPRESSED.

Simply change “/COMPRESSED” to “/UNCOMPRESSED” to save the file uncompressed:

SAVE OUTFILE=''
/UNCOMPRESSED.

There is no difference in the appearance of the .SAV file – it will still be stored with a .sav extension, and all versions of Statistics can read both compressed and uncompressed files without trouble.

Note that string data are largely unaffected by the compression unless your data contain many strings consisting only of blanks. In that case, saving compressed will save a significant amount of space and time.



#DataManagement
#SPSSStatistics

Comments

Mon October 05, 2015 03:10 PM

An alternative is to use the zsav format. This may increase the compression significantly, but it may or may not improve processing speed.