Hi Elizabeth,
Here are the details of my system:
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): IBM Z (s390x arch), Ubuntu 18.04.04
- TensorFlow installed from (source or binary): TensorFlow C library built from source
- TensorFlow version (use command below): 1.15.3
- Bazel version (if compiling from source): 0.26.1
The current behavior
The build instructions available at https://github.com/linux-on-ibm-z/scripts/blob/master/Tensorflow/1.15.0/build_tensorflow.sh only cover installing TensorFlow for Python. I wanted to install only the TensorFlow C library on zLinux and run inference with it, so I compiled the TensorFlow C library from source using Bazel. Before building TensorFlow, I also applied the Bazel patch mentioned here [Build Instruction for Tensorflow].
After the installation succeeded, I tried to run inference using a pre-trained model, and that is when I encountered the following error. Note that I'm loading the pre-trained model graph from a protobuf file.
INFO[0000]/root/gopros/src/github.com/rai-project/tensorflow/vendor/github.com/rai-project/dlframework/framework/register.go:76 github.com/rai-project/tensorflow/vendor/github.com/rai-project/dlframework/framework.Register() skipping regitration of hidden model pkg=dlframework/framework name=SSD_MobileNet_v2_Quantized_300x300_COCO
INFO[0000] running predict urls model=MobileNet_v1_1.0_224 pkg=dlframework/framework/cmd/server
Error: unable to create tensorflow model graph: Invalid value in tensor used for shape: -385679360
Standalone code to reproduce the issue
I suspected that the error occurs because the protobuf file for the model graph was originally created on a little-endian machine, while I'm trying to load it on a big-endian machine. To confirm that the error comes from loading the model graph from a protobuf file in TensorFlow, I tried out [these instructions]. Using them, I created a TensorFlow model and saved it in protobuf format on an x86 system. Loading the graph file on an x86 system worked, but loading the same file on the s390x system produced the following error:
Loading graph
Read GraphDef of 27083 bytes
ERROR: Dimension 0 in both shapes must be equal, but are 1 and 16777216. Shapes are [1,1] and [16777216,16777216]. for 'dense/kernel/Assign' (op: 'Assign') with input shapes: [1,1], [16777216,16777216].
I think this shows that the error is caused by the difference in endianness between the architecture on which the protobuf file was created and the one on which it was loaded.
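The numbers in the error are consistent with this reading: 16777216 is 0x01000000, which is what the four bytes of a 32-bit 1 look like when they are written in little-endian order and read back as big-endian. The following is only a minimal sketch of that arithmetic in Python; the helper name `reinterpret_int32` is hypothetical, and the assumption that the shape is stored as raw 32-bit integers is mine, not something the log confirms.

```python
import struct

def reinterpret_int32(value, written_as="<", read_as=">"):
    """Pack a signed 32-bit int in one byte order and read it back in another.

    "<" is little-endian (x86), ">" is big-endian (s390x).
    Hypothetical helper for illustration only.
    """
    raw = struct.pack(written_as + "i", value)
    return struct.unpack(read_as + "i", raw)[0]

# Shape as it was written on the x86 machine (assumed to be raw int32 values).
shape_on_x86 = [1, 1]

# The same bytes interpreted on a big-endian machine without any swapping.
shape_on_s390x = [reinterpret_int32(d) for d in shape_on_x86]

print(shape_on_s390x)  # [16777216, 16777216]
```

This matches the shapes [1,1] and [16777216,16777216] reported above, which supports the endianness explanation but does not by itself show where in TensorFlow the unswapped read happens.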
Other info / logs
The following tests pass on the s390x machine:
//tensorflow/c:c_api_test
//tensorflow/c:c_api_function_test
//tensorflow/c:c_test
//tensorflow/c:ops_test
//tensorflow/c:env_test
//tensorflow/c:c_test_util
//tensorflow/cc/saved_model:reader_test
//tensorflow/cc/saved_model:loader_test
------------------------------
Priyanshu Khandelwal
------------------------------
Original Message:
Sent: Thu July 23, 2020 11:02 AM
From: Elizabeth K. Joseph
Subject: Endian issue in Tensorflow protobuf graphs
Hi Priyanshu,
What distribution and build instructions are you using for TensorFlow? If you're using the build instructions for TensorFlow on Ubuntu (the only distro in the verified software list), I can follow up internally with the team that manages them to see if they've run into this issue.
In the meantime I suggest visiting these older build instructions which cover 1.15 in case there was anything you missed:
https://github.com/linux-on-ibm-z/docs/wiki/Building-TensorFlow/0baa4f424a6013386d6de2b3bf8a03506c1d9925
------------------------------
Elizabeth K. Joseph
Original Message:
Sent: Wed July 22, 2020 03:25 PM
From: Priyanshu Khandelwal
Subject: Endian issue in Tensorflow protobuf graphs
I'm trying to run inference on IBM Z using TensorFlow 1.15.3. I have installed TensorFlow on the system successfully, but when I try to load any graph from a protobuf file, it fails with the following error:
Error: unable to create tensorflow model graph: Invalid value in tensor used for shape: -385679360
I think the error occurs because the graph is loaded from a protobuf file that was created on a little-endian system and is now being read on a big-endian system.
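One quick way to sanity-check this suspicion is to reverse the four bytes of the reported value and see whether a plausible dimension comes out. The snippet below is a rough sketch in Python; `byteswap_int32` is a hypothetical helper, and treating the value as a signed 32-bit integer is an assumption.

```python
def byteswap_int32(value):
    """Reverse the four bytes of a signed 32-bit integer.

    Hypothetical helper for illustration; assumes the value fits in int32.
    """
    raw = value.to_bytes(4, byteorder="big", signed=True)
    return int.from_bytes(raw, byteorder="little", signed=True)

# Value reported in "Invalid value in tensor used for shape".
print(byteswap_int32(-385679360))  # 1001
```

The swapped value, 1001, is a small positive number of the kind one would expect in a shape (possibly the number of output classes of the model, although that interpretation is a guess), whereas -385679360 is not a valid dimension at all.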
Please note that I cannot use the latest version of TensorFlow because my project is built on TensorFlow 1.14 and the API for 2.x has some breaking changes. Is there any fix for this? Thanks.
------------------------------
Priyanshu Khandelwal
------------------------------