IBM Z

  • 1.  Endian issue in Tensorflow protobuf graphs

    Posted Thu July 23, 2020 09:52 AM

    I'm trying to run inference on IBM Z using TensorFlow 1.15.3. I installed TensorFlow on the system successfully, but when I try to load any graph from a protobuf file, it fails with the following error:

    Error: unable to create tensorflow model graph: Invalid value in tensor used for shape: -385679360

    I think the error occurs because the graph is loaded from a protobuf file that was created on a little-endian system and is now being read on a big-endian system.
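As a quick sanity check on this hypothesis, the value in the error message can be reinterpreted with the opposite byte order. The following is a minimal Python sketch using only the standard `struct` module. The helper name `reinterpret_int32` is made up for illustration, and the sketch assumes the value is a 32-bit signed integer, which the error message does not state.

```python
import struct

def reinterpret_int32(value):
    """Pack `value` as a big-endian int32, then read the same 4 bytes as little-endian."""
    raw = struct.pack(">i", value)       # bytes as a big-endian reader saw them
    return struct.unpack("<i", raw)[0]   # what a little-endian writer meant

# Value taken from the error message (assumed to be a 32-bit integer).
print(reinterpret_int32(-385679360))  # 1001
```

The result, 1001, is a plausible tensor dimension, which is consistent with the value having been written in little-endian order and read in big-endian order.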

    Please note that I cannot use the latest version of TensorFlow because my project is built on TensorFlow 1.14 and the API for 2.x has some breaking changes. Is there any fix for this? Thanks.



    ------------------------------
    Priyanshu Khandelwal
    ------------------------------


  • 2.  RE: Endian issue in Tensorflow protobuf graphs

    Posted Thu July 23, 2020 11:03 AM
    Edited by Elizabeth K. Joseph Thu July 23, 2020 11:03 AM

    Hi Priyanshu,

    What distribution and build instructions are you using for TensorFlow? If you're using the build instructions for TensorFlow on Ubuntu (the only distro in the verified software list), I can follow up with the internal team that manages this to see if they've run into this issue.

    In the meantime, I suggest visiting these older build instructions, which cover 1.15, in case there was anything you missed:

    https://github.com/linux-on-ibm-z/docs/wiki/Building-TensorFlow/0baa4f424a6013386d6de2b3bf8a03506c1d9925



    ------------------------------
    Elizabeth K. Joseph
    ------------------------------



  • 3.  RE: Endian issue in Tensorflow protobuf graphs

    Posted Thu July 23, 2020 12:50 PM

    Hi Elizabeth,

    Here are the details for my system -

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): IBM Z (s390x arch), Ubuntu 18.04.04
    • TensorFlow installed from (source or binary): Installed Tensorflow C lib from source
    • TensorFlow version (use command below): 1.15.3
    • Bazel version (if compiling from source): 0.26.1

    The current behavior

    The build instructions available at https://github.com/linux-on-ibm-z/scripts/blob/master/Tensorflow/1.15.0/build_tensorflow.sh only cover installing TensorFlow for Python. I wanted to install only the TensorFlow C library on zLinux and run inference with it, so I compiled the TensorFlow C library from source using Bazel. Before building TensorFlow, I also applied the Bazel patch mentioned here: [Build Instruction for Tensorflow].

    After the installation succeeded, I tried to run inference using a pre-trained model, and that is when I encountered the following error. Note that I'm loading the pre-trained model graph from a protobuf file.

    INFO[0000]/root/gopros/src/github.com/rai-project/tensorflow/vendor/github.com/rai-project/dlframework/framework/register.go:76 github.com/rai-project/tensorflow/vendor/github.com/rai-project/dlframework/framework.Register() skipping regitration of hidden model pkg=dlframework/framework name=SSD_MobileNet_v2_Quantized_300x300_COCO
    INFO[0000] running predict urls model=MobileNet_v1_1.0_224 pkg=dlframework/framework/cmd/server
    Error: unable to create tensorflow model graph: Invalid value in tensor used for shape: -385679360

    Standalone code to reproduce the issue
    I suspected the error occurs because the protobuf file for the model graph was created on a little-endian machine and I am loading it on a big-endian machine. To confirm that the failure comes from loading the model graph from a protobuf file in TensorFlow, I followed [these instructions]: I created a TensorFlow model and saved it in protobuf format on an x86 system. Loading the graph file on an x86 system worked, but loading the same graph on the s390x system produced the following error:

    Loading graph
    Read GraphDef of 27083 bytes
    ERROR: Dimension 0 in both shapes must be equal, but are 1 and 16777216. Shapes are [1,1] and [16777216,16777216]. for 'dense/kernel/Assign' (op: 'Assign') with input shapes: [1,1], [16777216,16777216].

    I think this shows that the error is caused by the difference in endianness between the architecture on which the protobuf file was created and the one on which it was loaded.
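The shapes in this error message fit the same pattern. As a rough check (a Python sketch using only the standard `struct` module; `swap_bytes` is a hypothetical helper, not part of TensorFlow), byte-swapping each dimension of [1,1] as a 32-bit integer reproduces the reported [16777216,16777216]. A 64-bit swap gives a different value, which suggests, though does not prove, that the affected values are stored as 32-bit integers.

```python
import struct

def swap_bytes(value, fmt):
    """Pack `value` big-endian using struct format `fmt`, then read it back little-endian."""
    return struct.unpack("<" + fmt, struct.pack(">" + fmt, value))[0]

expected_shape = [1, 1]  # the shape the graph was meant to have

# Reinterpret each dimension with the opposite byte order, for two integer widths.
as_int32 = [swap_bytes(dim, "i") for dim in expected_shape]
as_int64 = [swap_bytes(dim, "q") for dim in expected_shape]

print(as_int32)  # [16777216, 16777216]
print(as_int64)  # [72057594037927936, 72057594037927936]
```

Only the 32-bit interpretation matches the shapes printed in the error.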

    Other info / logs

    The following tests pass on the s390x machine:

    //tensorflow/c:c_api_test
    //tensorflow/c:c_api_function_test
    //tensorflow/c:c_test
    //tensorflow/c:ops_test
    //tensorflow/c:env_test
    //tensorflow/c:c_test_util
    //tensorflow/cc/saved_model:reader_test
    //tensorflow/cc/saved_model:loader_test



    ------------------------------
    Priyanshu Khandelwal
    ------------------------------



  • 4.  RE: Endian issue in Tensorflow protobuf graphs

    Posted Wed July 29, 2020 04:50 AM

    Hi Elizabeth,

    I have answered your questions in my previous comment; please let me know if you need any more information from me. I would also appreciate it if you could connect me with the IBM team that manages this.

    Best Regards,

    Priyanshu



    ------------------------------
    Priyanshu Khandelwal
    ------------------------------



  • 5.  RE: Endian issue in Tensorflow protobuf graphs

    Posted Thu July 30, 2020 01:21 PM

    Hi Priyanshu,

    Thanks for your patience. I still have some emails out to folks working on this to see who I can connect you with, but some of them are on vacation and the version you're using is quite old, so it may take a little time. In the meantime, I recommend that you reach out to the broader TensorFlow community to see if anyone there has any thoughts.

    I'll let you know as soon as I hear anything on my end!



    ------------------------------
    Elizabeth K. Joseph
    ------------------------------



  • 6.  RE: Endian issue in Tensorflow protobuf graphs

    Posted Wed August 05, 2020 02:04 AM

    Hi Elizabeth,

    I reached out to the TensorFlow community and they confirmed that it is indeed a bug in TensorFlow. Here is the link to the issue: https://github.com/tensorflow/tensorflow/issues/41652. Would it be possible for anyone from IBM to work on it?

    Regards,

    Priyanshu Khandelwal



    ------------------------------
    Priyanshu Khandelwal
    ------------------------------