Running Pandas on IBM Open Enterprise SDK for Python

By Steven Pitman posted Fri October 02, 2020 03:06 PM

  

Using Pandas with IBM Open Enterprise SDK for Python

Pandas is a free, open source Python package for data manipulation and analysis, and one of the most widely used analysis tools available for Python. It is fast, flexible, and easy to use, and it runs on IBM Open Enterprise SDK for Python, letting users take input directly from many sources, such as formatted files (.csv, .json, etc.) or RESTful API calls to external services.

If you have IBM Open Enterprise SDK for Python 3.11, the quickest installation method is to use the Python AI Toolkit for IBM z/OS, a repository of prebuilt Python packages relating to AI. Because the packages are prebuilt, you do not need a C or C++ compiler to install them, and the installation is significantly quicker. To install Pandas using the Python AI Toolkit for IBM z/OS, follow these steps:

  1. Verify that your Python environment is set up correctly
  2. (Optional) Service & Support for the packages from the Python AI Toolkit is optional; see the Python AI Toolkit for IBM z/OS for details on how to acquire it
  3. (Optional) Create a virtual environment
    • python3 -m venv venv
    • source ./venv/bin/activate
  4. Install Pandas using the Python AI Toolkit for IBM z/OS
    • pip3 install pandas --index-url <Python AI Toolkit url> --trusted-host <Python AI Toolkit url>
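
After the install completes, one quick way to confirm that the package is visible to the interpreter is to query it from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of the toolkit or of Pandas.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it cannot be imported."""
    # find_spec returns None when no top-level module of that name is found.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a standard library module).
        return "unknown"


version = installed_version("pandas")
if version is None:
    print("pandas is not importable from this interpreter")
else:
    print("pandas version:", version)
```

Run it with the same `python3` that `pip3` installed into (for example, the one inside the activated virtual environment), since each environment has its own set of packages.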

If you are on IBM Open Enterprise SDK for Python 3.10 or lower, there is a fork of Pandas v1.1.2 that is compatible with z/OS. Use the following instructions to install it:

  1. Verify that your Python environment is set up correctly:
    1. All Python environment variables (PATH, LIBPATH, _BPXK_AUTOCVT, _CEE_RUNOPTS, _TAG_REDIR_ERR, _TAG_REDIR_IN and _TAG_REDIR_OUT) have been set
    2. The CC and CXX environment variables have been set to the path of the appropriate compiler, either xlc/xlc++ or xlclang/xlclang++
    3. The compiler control variables have been set:
      • export _CC_CCMODE=1 
      • export _CXX_CCMODE=1 
      • export _C89_CCMODE=1 
      • export _CC_EXTRA_ARGS=1 
      • export _CXX_EXTRA_ARGS=1 
      • export _C89_EXTRA_ARGS=1
    4. SSH keys have been set up to allow cloning from GitHub
  2. Create a virtual environment to install Pandas into, and activate it
    • python3 -m venv venv --system-site-packages 
    • source ./venv/bin/activate 
  3. Install Pandas into the venv using the source from GitHub
    • ./venv/bin/pip3 install git+ssh://git@github.com/pitmanst/pandas.git@v1.1.2.zos 
  4. Fix the permissions following the installation 
    • find venv/lib/python3.8/site-packages/pandas -name '*.so' -exec chmod 755 {} \; 
    • find venv/lib/python3.8/site-packages/pandas -name '*.so' -exec chtag -r {} \; 
    • find venv/lib/python3.8/site-packages/pandas -name '*.x' -exec chtag -r {} \; 
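
Before starting a source build, it can help to confirm that the variables named in step 1 are present. The sketch below checks only that each variable is set to a non-empty value; it does not validate the values themselves. The helper name `missing_variables` and the grouping of the names into a single list are illustrative assumptions.

```python
import os

# Variable names taken from step 1 of the instructions above.
REQUIRED_VARIABLES = [
    "PATH", "LIBPATH", "_BPXK_AUTOCVT", "_CEE_RUNOPTS",
    "_TAG_REDIR_ERR", "_TAG_REDIR_IN", "_TAG_REDIR_OUT",
    "CC", "CXX",
    "_CC_CCMODE", "_CXX_CCMODE", "_C89_CCMODE",
    "_CC_EXTRA_ARGS", "_CXX_EXTRA_ARGS", "_C89_EXTRA_ARGS",
]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_variables(REQUIRED_VARIABLES)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All listed variables are set")
```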

If installing into the default Python directory instead of a virtual environment, use the following commands instead:
    • pip3 install setuptools wheel 'Cython>=0.29.16,<3'
    • pip3 install --no-build-isolation git+ssh://git@github.com/pitmanst/pandas.git@v1.1.2.zos
Then run the commands from step 4, modified to point to your Python install directory.
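
The paths in step 4 hard-code `venv/lib/python3.8/site-packages`. To find the equivalent directory for whichever interpreter is running, a sketch like the following may help. It assumes Pandas was installed into that interpreter's default `platlib` directory; `extension_files` is a hypothetical helper, and the script only lists files without changing permissions or tags.

```python
import sysconfig
from pathlib import Path


def extension_files(package_dir, suffixes=(".so", ".x")):
    """Return the files under `package_dir` whose suffix is in `suffixes`, sorted by path."""
    root = Path(package_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)


# "platlib" is where this interpreter installs packages containing compiled extensions.
pandas_dir = Path(sysconfig.get_paths()["platlib"]) / "pandas"
print("Expected Pandas directory:", pandas_dir)
for path in extension_files(pandas_dir):
    print(path)
```

The printed directory can be substituted for `venv/lib/python3.8/site-packages/pandas` in the `find` commands from step 4.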

      Running a Sample Pandas program 

      $ python3 
      Python 3.8.5 (heads/pyz_dev-3.8:c68ff1320c, Aug 22 2020, 03:07:56) on zos 
      Type "help", "copyright", "credits" or "license" for more information. 
      >>> import pandas as pd 
      >>> pd.read_csv("./test.csv") 
         col1 col2 col3 
      0    a    b    c 
      1    d    e    f 
      >>> s = pd.Series([1,2,3]) 
      >>> print(s) 
      0    1 
      1    2 
      2    3 
      dtype: int64 
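
The same session can be written as a script. The post does not show the contents of `test.csv`, so the CSV text below is an assumption chosen to match the output above, and it is read from an in-memory buffer rather than from disk.

```python
import io

import pandas as pd

# Assumed contents of test.csv, inferred from the output shown in the session.
CSV_TEXT = "col1,col2,col3\na,b,c\nd,e,f\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)

# A Series is a one-dimensional labelled array.
s = pd.Series([1, 2, 3])
print(s)
```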

       

      Troubleshooting 

      When trying to read from a CSV, I get an error: 

      UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte 

      To fix this, make sure to tag your file with the correct encoding. See the IBM Open Enterprise Python for z/OS Troubleshooting page for more information. 
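
As an illustration of where this error can come from: if the file's bytes are EBCDIC but are read as UTF-8, the first byte of a header such as `col1` is 0x83, which is not a valid UTF-8 start byte. The sketch below reproduces that in plain Python. It uses the `cp037` codec as a stand-in for the z/OS EBCDIC code page, because it ships with standard Python and encodes lowercase letters and digits identically; the header text is hypothetical.

```python
def decode_as_utf8(raw):
    """Try to decode bytes as UTF-8, returning the text or a description of the error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# Hypothetical CSV header, encoded with an EBCDIC codec (assumption: cp037).
ebcdic_bytes = "col1,col2,col3".encode("cp037")

print(hex(ebcdic_bytes[0]))           # first byte of the EBCDIC-encoded header
print(decode_as_utf8(ebcdic_bytes))   # fails when interpreted as UTF-8
print(ebcdic_bytes.decode("cp037"))   # succeeds with the matching codec
```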

       

      Tutorials 

      To get started with Pandas, see the Pandas website, which has many tutorials that can help you learn how to use it.

       

      Comments

      Fri February 03, 2023 11:17 AM

      Hi Steven, 
      again thank you for your answer. Yes I can access the repo using my browser. I checked the pip3 install command I used and I found a typo .... Looks like I had been blind this morning ... The clone worked! Sorry for the confusion I have caused!

      regards
      Ronny

      Fri February 03, 2023 10:26 AM

      Hi Ronny, the link cannot be copied directly. You can find the actual GitHub page here - https://github.com/pitmanst/pandas/tree/v1.1.2.zos. Are you able to visit that site?

      Fri February 03, 2023 10:21 AM

      Hi Steven, 

      thank you for your answer, yes I am. I also copied the link into my browser and got 404 File not found
      regards Ronny

      Fri February 03, 2023 10:10 AM

      Hi Ronny,

      I've confirmed that the repository is still there, and that the instructions still work (tested with IBM Open Enterprise SDK for Python 3.10). Are you able to view other GitHub projects, or are you behind a company firewall?

      Fri February 03, 2023 02:23 AM

      Hi,

      looks like this blog post is invalid. The git repo is not here anymore, you get a 404 from git clone. I suggest to either update this or remove it

      regards

      Tue November 15, 2022 09:01 AM

      Hi Nagaraj,

      It's not trying to build the wheel package itself (it's correctly installed) - it's trying to build a wheel for the Pandas installation - a wheel is a zip file with all the built files and Python scripts from a given package inside it.

      The error you're seeing is usually due to xlc not being installed or not being installed correctly. Could you try creating a test C file and confirm that you can compile it with xlc? Here's a sample dummy C file:

      int main()
      {
          return 0;
      }

      Which you can then compile by using:

      /bin/xlc <filename.c>

      If that errors out with the same error, you'll need to contact your sysadmin for help on getting xlc installed or fixed.

      Tue November 15, 2022 06:22 AM

      Hi Steven
      I followed exactly what you mentioned below in your post.  Now it is trying to build wheel, even though wheel has been installed. Here is the output of what I see on the screen. Why is it trying to build wheel when it is already available?
      $ /usr/lpp/ported/bin/bash
      bash-4.3$ export CC=/bin/xlc
      bash-4.3$ export CXX=/bin/xlc++
      bash-4.3$ export _CC_CCMODE=1
      bash-4.3$ export _CXX_CCMODE=1
      bash-4.3$ export _C89_CCMODE=1
      bash-4.3$ export _CC_EXTRA_ARGS=1
      bash-4.3$ export _CXX_EXTRA_ARGS=1
      bash-4.3$ export _C89_EXTRA_ARGS=1
      bash-4.3$ cd pandas-1.1.2
      bash-4.3$ pip3 install wheel 'Cython>=0.29.16,<3'
      Requirement already satisfied: wheel in /u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages (0.38.4)
      Requirement already satisfied: Cython<3,>=0.29.16 in /u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages (0.29.32)
      WARNING: There was an error checking the latest version of pip.
      bash-4.3$ pip3 install --no-build-isolation ./pandas-1.1.2.zos
      Processing ./pandas-1.1.2.zos
        Preparing metadata (pyproject.toml) ... done
      Requirement already satisfied: python-dateutil>=2.7.3 in /u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages (from pandas==1.1.2) (2.8.2)
      Requirement already satisfied: pytz>=2017.2 in /u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages (from pandas==1.1.2) (2022.6)
      Requirement already satisfied: six>=1.5 in /u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages (from python-dateutil>=2.7.3->pandas==1.1.2) (1.16.0)
      Building wheels for collected packages: pandas
        Building wheel for pandas (pyproject.toml) ... error
        error: subprocess-exited-with-error
      
        × Building wheel for pandas (pyproject.toml) did not run successfully.
        │ exit code: 1
        ╰─> [10 lines of output]
            running bdist_wheel
            running build
            running build_py
            UPDATING build/lib.os390-28.00-8561-3.10/pandas/_version.py
            set build/lib.os390-28.00-8561-3.10/pandas/_version.py to '1.1.2'
            running build_ext
            building 'pandas._libs.algos' extension
            /bin/xlc -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64 -Wc,DLL -D_XOPEN_SOURCE_EXTENDED -D_UNIX03_THREADS -D_POSIX_THREADS -D_OPEN_SYS_FILE_EXT -qexportall -qascii -qstrict -qnocsect -Wa,asa,goff -Wa,xplink -qgonumber -qenum=int -DNPY_NO_DEPRECATED_API=0 -D__s390__=1 -I./pandas/_libs -Ipandas/_libs/src/klib -I/u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/lib/python3.10/site-packages/numpy/core/include -I/u/a351484/python/usr/lpp/IBM/cyp/v3r10/pyz/include/python3.10 -c pandas/_libs/algos.c -o build/temp.os390-28.00-8561-3.10/pandas/_libs/algos.o
            FSUM3224 xlc: Fatal error in /usr/lpp/cbclib/xlc/exe/ccndrvr: signal 9 received.
            error: command '/bin/xlc' failed with exit code 251
            [end of output]
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed building wheel for pandas
      Failed to build pandas
      ERROR: Could not build wheels for pandas, which is required to install pyproject.toml-based projects
      WARNING: There was an error checking the latest version of pip.

      Fri November 11, 2022 03:17 PM

      Hi Nagaraj,

      When installing a package using pip, it'll attempt to install all packages/dependencies in an isolated environment. To get around this, you'll want to use the --no-build-isolation flag to pip if you already have all the dependencies e.g:
      pip3 install --no-build-isolation ./pandas-1.1.2.zos

      For your case, I'm not sure how you obtained the source, but if it was obtained from GitHub, I know that a small change is required due to the branch name. This isn't an issue if installing it directly over ssh. Here are full instructions for installing Pandas when downloading the zip file from GitHub:

      # Environment variables for XL C/C++ Compiler
      export CC=/bin/xlc
      export CXX=/bin/xlc++
      
      export _CC_CCMODE=1
      export _CXX_CCMODE=1
      export _C89_CCMODE=1
      export _CC_EXTRA_ARGS=1
      export _CXX_EXTRA_ARGS=1
      export _C89_EXTRA_ARGS=1
      
      # Make our venv
      python3 -m venv venv --system-site-packages
      
      # Unzip pandas and tag the files within it
      unzip pandas-1.1.2.zos.zip
      chtag -Rtc ISO8859-1 ./pandas-1.1.2.zos
      
      # Versioneer includes extra information when using the zip file directly from github
      # which is not included when normally cloned with git. Remove it so it's PEP 440 compatible
      cd ./pandas-1.1.2.zos/pandas
      /bin/sed 's/v1.1.2.zos/v1.1.2/g' _version.py > _version.py.sed
      mv _version.py.sed _version.py
      cd ../..
      
      # Install required dependencies + pandas
      pip3 install wheel 'Cython>=0.29.16,<3'
      pip3 install --no-build-isolation ./pandas-1.1.2.zos

      Thanks,

      Steven

      Thu November 10, 2022 05:00 AM

      Hi Steven
      I managed to download Cython from PyPI and installed it using pip. After this I issued a "pip3 list"  to list out the packages installed in the environment and I get the following 
      Package Version
      ------------ -------
      cffi 1.14.6
      cryptography 3.3.2
      Cython 0.29.32
      ebcdic 1.1.1
      numpy 1.21.2
      pip 22.1.2
      pycparser 2.20
      setuptools 58.3.0
      six 1.16.0
      wheel 0.38.4
      zoautil-py 1.2.1
      zos-util 1.0.0

      This clearly shows that I have setuptools, Cython and wheel. 
      After this I issue "./venv/bin/pip3 install ~/pandas-1.1.2/pandas-1.1.2.zos", to install the downloaded pandas source code which is in the path indicated. But I get the error as shown below:
      Processing ./pandas-1.1.2/pandas-1.1.2.zos
      Installing build dependencies ... error
      error: subprocess-exited-with-error

      × pip subprocess to install build dependencies did not run successfully.
      │ exit code: 1
      ╰─> [8 lines of output]
      WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x500B9633A0>: Failed to establish a new connection: [Errno 1130] EDC8130I Host cannot be reached.')': /simple/setuptools/
      WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x500B963190>: Failed to establish a new connection: [Errno 1130] EDC8130I Host cannot be reached.')': /simple/setuptools/
      WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x500B962F20>: Failed to establish a new connection: [Errno 1130] EDC8130I Host cannot be reached.')': /simple/setuptools/
      WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x500B78A9B0>: Failed to establish a new connection: [Errno 1130] EDC8130I Host cannot be reached.')': /simple/setuptools/
      WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x500B78A620>: Failed to establish a new connection: [Errno 1130] EDC8130I Host cannot be reached.')': /simple/setuptools/
      ERROR: Could not find a version that satisfies the requirement setuptools<=58.3.0 (from versions: none)
      ERROR: No matching distribution found for setuptools<=58.3.0
      WARNING: There was an error checking the latest version of pip.
      [end of output]
      I have setuptools 58.3.0 installed as you can see from the listed packages above (the output of pip3 list), so why is it saying that the requirement is not met? 

      Thank you
      Nagaraj

      Wed November 09, 2022 11:15 PM

      Hi Steven
      The zOS LPAR I am trying to install pandas on, cannot connect to the internet/Github. I downloaded the repository from Github (I ensured that it is the zos specific repo that I am pointing to) as a .zip file, uploaded it to USS on the zOS LPAR and unzipped it. The README.md file in the repo says that I need Cython to install pandas from source.  If I were connected to the internet (from the zOS LPAR on which I intend to install pandas) and issue the pip install command (as given by you on the webpage), does it pull Cython in the background and install it?