SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers!

View Only

Back to discussions

Expand all | Collapse all

Possible Error with SPSS's Excel Writer

1. Possible Error with SPSS's Excel Writer

Like
Dominic Ricottone
Posted Fri January 05, 2024 03:26 PM

Reply
A string value containing "_x0001_", when exported into an Excel file from SPSS, is replaced with the control character "start of header" (a.k.a. SOH).

Actually this happens for any hexadecimal value (e.g. "_x0009_" becomes a tab, "_x0030_" becomes a zero). It's just comparatively more inconvenient with control characters, since they can register as corruption in some downstream consumers of Excel data files.

I investigated how Excel itself wants data like this to be stored. I created an Excel file directly with Excel and included "_x0001_" in a cell. I unzipped the resultant file and found that the value was actually stored as "_x005F_x0001_". In other words, the leading underscore was encoded as a hexadecimal value to prevent "_x0001_" from being read as a hexadecimal value. The file that SPSS wrote contains "_x0001_" exactly.

It seems like SPSS's Excel writer is not properly encoding data before writing. Would this qualify as a defect? Or if this is considered a feature, is this behavior documented anywhere?

Here's some syntax for reproducing the behavior. It creates a range of test data, exports to Excel, then re-importing it into SPSS. You'll also see the issue if you manually open the created Excel file.

* Create the test data. DATA LIST /testvar 1-50 (A). BEGIN DATA _x0000_ _x0001_ _x0002_ _x0003_ _x0004_ _x0005_ _x0006_ _x0007_ _x0008_ _x0009_ _x000A_ _x000B_ _x000C_ _x000D_ _x000E_ _x000F_ _x0030_ also works with leading _x0030_ or trailing data END DATA. DATASET NAME DataSet1 WINDOW=FRONT. * Export to Excel and re-import. DATASET ACTIVATE DataSet1 WINDOW=FRONT. SAVE TRANSLATE OUTFILE='exported.xlsx' /TYPE=XLS /VERSION=12 /FIELDNAMES /CELLS=VALUES /REPLACE. GET DATA /TYPE=XLSX /FILE='exported.xlsx' /CELLRANGE=FULL /READNAMES=ON. EXECUTE. DATASET NAME DataSet2 WINDOW=FRONT. * See the Data View and how values were mutated. (Note that most control characters display as an empty box.) * Alternatively, export to a text file for easier inspection of the byte data. DATASET ACTIVATE DataSet2 WINDOW=FRONT. SAVE TRANSLATE OUTFILE='exported.csv' /TYPE=CSV /MAP /REPLACE /FIELDNAMES /CELLS=VALUES.

I've reproduced this behavior in SPSS Statistics 27.0.1.0 and SPSS Statistics 20.0.0. Obviously the latter is out of support, but it seems like this is a long-standing issue with the writer.

And incidentally, why is "SPSS" not an option for the languages in "Insert/edit code sample"?

------------------------------
Dominic Ricottone
------------------------------
2. RE: Possible Error with SPSS's Excel Writer

Like
Jon Peck

IBM Champion
Posted Sat January 06, 2024 03:41 PM

Reply
SPSS does not have a specific hex datatype, so I would expect it to export such values literally. Excel does have hex to decimal and decimal to hex conversion functions, though.

Note that 0x0005F is just the ascii code for "_".

I am not familiar with all the nuances of the Excel sheet schema, but if I look at the xml for a value like _x0001 entered directly in Excel, it appears to just have the value 1..

<c r="A3" t="s">
<v>1</v>
</c>

--
Jon K Peck
jkpeck@gmail.com

Original Message
3. RE: Possible Error with SPSS's Excel Writer

Like
Dominic Ricottone
Posted Mon January 08, 2024 11:33 AM

Reply
Hi Jon,

I expect that, if we use different versions of Excel, we will get slightly different results from a manual file creation. I have some more notes on this below. But I am interested in knowing: Were you able to make SPSS produce an Excel file with un-mutated data?

Admittedly a bit off topic for an SPSS forum, but here's my (limited) understanding of how modern Excel works:

I believe that the "1" you see is an index in the shared strings table. You'll need to look in "xl/sharedStrings.xml" instead of "xl/worksheets/sheet1.xml" for the actual data. Any cell (c tag) in a spreadsheet that has the t="s" attribute actually stores an index to an external shared strings table. This is a space saving strategy, because duplicate cells can reference to the same actual data on the shared strings table.

To be perfectly explicit, if I manually create an Excel file as described above, I get a "xl/worksheets/sheet1.xml" that looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}"> <dimension ref="A1:A4"/> <sheetViews> <sheetView tabSelected="1" workbookViewId="0"> <selection activeCell="A5" sqref="A5"/> </sheetView> </sheetViews> <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/> <sheetData> <row r="1" spans="1:1" x14ac:dyDescent="0.25"> <c r="A1" t="s"> <v>0</v> </c> </row> </sheetData> <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/> </worksheet>

(I ran the raw XML through https://codebeautify.org/xml-pretty-print to make it easier to read.) (Not a sponsorship, other prettify-ers are available.)

The "xl/sharedStrings.xml" file then contains:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"> <si> <t>_x005F_x0001_</t> </si> </sst>

If I repeat this process but instead write "leading data leading data_x0001_ trailing data trailing data" into a cell, the only change is that the "xl/sharedStrings.xml" file then contains:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"> <si> <t>leading data leading data_x005F_x0001_ trailing data trailing data</t> </si> </sst>

------------------------------
Dominic Ricottone
------------------------------

Original Message
4. RE: Possible Error with SPSS's Excel Writer

Like
Jon Peck

IBM Champion
Posted Mon January 08, 2024 01:07 PM

Reply
I see that with SPSS Statistics v29, the file still exports wrong. I don't know how that code works, though. If I export using the Database Wizard, it does export correctly in either xls or xlsx format. The DBW uses ODBC, so it would use Microsoft's own driver rather than the direct Excel export which is based on what was known about the Excel file format. I suspect that this problem didn't occur with xls format.

--
Jon K Peck
jkpeck@gmail.com

Original Message
5. RE: Possible Error with SPSS's Excel Writer

Like
Dominic Ricottone
Posted Mon January 08, 2024 12:13 PM
| view attached

Reply
I'm attaching an Excel file here. This is what SPSS 27.0.1.0 wrote on my machine with the syntax quoted above. If you open it directly or read it into SPSS, I expect that you will not see the original string data.

------------------------------
Dominic Ricottone
------------------------------

Attachment(s)

exported.xlsx 5 KB 1 version

Original Message

SPSS Statistics

SPSS Statistics

Possible Error with SPSS's Excel Writer

Dominic RicottoneFri January 05, 2024 03:26 PM

Jon PeckSat January 06, 2024 03:41 PM

Dominic RicottoneMon January 08, 2024 11:33 AM

Jon PeckMon January 08, 2024 01:07 PM

Dominic RicottoneMon January 08, 2024 12:13 PM

1. Possible Error with SPSS's Excel Writer

2. RE: Possible Error with SPSS's Excel Writer

3. RE: Possible Error with SPSS's Excel Writer

4. RE: Possible Error with SPSS's Excel Writer

5. RE: Possible Error with SPSS's Excel Writer

Additional
Resources

Office

Quick Links

SPSS Statistics

SPSS Statistics

Possible Error with SPSS's Excel Writer

Dominic RicottoneFri January 05, 2024 03:26 PM

Jon PeckSat January 06, 2024 03:41 PM

Dominic RicottoneMon January 08, 2024 11:33 AM

Jon PeckMon January 08, 2024 01:07 PM

Dominic RicottoneMon January 08, 2024 12:13 PM

1. Possible Error with SPSS's Excel Writer

2. RE: Possible Error with SPSS's Excel Writer

3. RE: Possible Error with SPSS's Excel Writer

4. RE: Possible Error with SPSS's Excel Writer

5. RE: Possible Error with SPSS's Excel Writer

Additional Resources

Office

Quick Links

Additional
Resources