Installing & Running Pandas with IBM Open Enterprise SDK for Python
Pandas is a free, open source Python package for data manipulation and analysis, and one of the most widely used data analysis tools in the Python ecosystem. It is fast, flexible, and easy to use, and it runs on IBM Open Enterprise SDK for Python, where it can take input directly from many different sources, such as formatted files (.csv, .json, etc.) or RESTful API calls to external services.
If you have IBM Open Enterprise SDK for Python 3.11, the quickest installation method is to use Python AI Toolkit for IBM z/OS, a repository of prebuilt Python packages relating to AI. Because the packages are prebuilt, you do not need a C or C++ compiler to install them, and installation is significantly quicker. To install Pandas using the Python AI Toolkit for IBM z/OS, follow these steps:
- Verify that your Python environment is set up correctly
- (Optional) Acquire Service & Support for the packages from the Python AI Toolkit; see the Python AI Toolkit for IBM z/OS for details on how to acquire this
- (Optional) Create a virtual environment
- python3 -m venv venv
- source ./venv/bin/activate
- Install Pandas using the Python AI Toolkit for IBM z/OS
- pip3 install pandas --index-url <Python AI Toolkit url> --trusted-host <Python AI Toolkit hostname>
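After installation, a quick check can confirm that the interpreter and the package are what you expect. The following is a minimal sketch using only the standard library; the helper name `check_install` is hypothetical, and the 3.11 minimum simply mirrors the version mentioned above.

```python
import importlib.util
import sys


def check_install(package="pandas", min_version=(3, 11)):
    """Return (python_ok, package_found) for the running interpreter."""
    # Compare only the major and minor version of the interpreter.
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level package cannot be found.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_install()
    print(f"Python 3.11 or later: {python_ok}")
    print(f"pandas importable: {package_found}")
```

Run it with the same interpreter (and virtual environment, if you created one) that you used for the pip3 command, since the result depends on which environment is active.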
If you are on IBM Open Enterprise SDK for Python 3.10 or lower, a fork of Pandas v1.1.2 that is compatible with z/OS is available. Use the following instructions to install it:
- Verify that your Python environment is set up correctly:
- The Python environment variables PATH, LIBPATH, _BPXK_AUTOCVT, _CEE_RUNOPTS, _TAG_REDIR_ERR, _TAG_REDIR_IN and _TAG_REDIR_OUT have been set
- The CC and CXX environment variables have been set to the path of the appropriate compiler: IBM C/C++ For Open Enterprise Languages on z/OS 2.0, IBM Open XL C/C++ 1.1 for z/OS, IBM XL C/C++ V2.4.1 for z/OS 2.4, or IBM z/OS XL C/C++
- The compiler control variables have been set:
- export _CC_CCMODE=1
- export _CXX_CCMODE=1
- export _C89_CCMODE=1
- export _CC_EXTRA_ARGS=1
- export _CXX_EXTRA_ARGS=1
- export _C89_EXTRA_ARGS=1
- SSH keys have been set up to allow cloning from GitHub
- Create a virtual environment to install Pandas into, and activate it
- python3 -m venv venv --system-site-packages
- source ./venv/bin/activate
- Install dependencies and Pandas into the venv using the source from GitHub
- ./venv/bin/pip3 install setuptools wheel 'Cython>=0.29.16,<3'
- ./venv/bin/pip3 install git+ssh://git@github.com/pitmanst/pandas.git@v1.1.2.zos
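Because a source build depends on many environment variables, it can help to confirm they are all present before running pip3. The sketch below is one possible check, not part of the official procedure: the helper name `missing_env_vars` is hypothetical, the variable names are copied from the checklist above, and it only tests that each variable is set and non-empty, not that its value is correct.

```python
import os

# Names taken from the checklist above; adjust to match your setup.
REQUIRED_VARS = [
    "PATH", "LIBPATH", "_BPXK_AUTOCVT", "_CEE_RUNOPTS",
    "_TAG_REDIR_ERR", "_TAG_REDIR_IN", "_TAG_REDIR_OUT",
    "CC", "CXX",
    "_CC_CCMODE", "_CXX_CCMODE", "_C89_CCMODE",
    "_CC_EXTRA_ARGS", "_CXX_EXTRA_ARGS", "_C89_EXTRA_ARGS",
]


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without changing your shell session.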
Running a Sample Pandas Program
$ python3
Python 3.11.5 (heads/pyz_dev-3.11.ziip:307931de97, Oct 19 2023, 09:40:03) [Clang 14.0.0 ] on zos
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('test.csv')
  col1 col2 col3
0    a    b    c
1    d    e    f
>>> pd.Series([1,2,3])
0    1
1    2
2    3
dtype: int64
Troubleshooting
When trying to read from a CSV, I get an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte
To fix this, make sure to tag your file with the correct encoding. See the IBM Open Enterprise SDK for Python Troubleshooting page for more information.
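The byte in the error message gives a hint about the cause: 0x83 is the letter c in EBCDIC code pages, which suggests the file holds EBCDIC text but is being decoded as UTF-8. The sketch below reproduces the error with the standard library alone. It uses Python's built-in cp037 codec as a stand-in for whichever EBCDIC code page your file actually uses, which is an assumption, and the header text is taken from the sample above.

```python
header = "col1,col2,col3"

# Encode the header in an EBCDIC code page (cp037 is a stand-in here).
ebcdic_bytes = header.encode("cp037")
first_byte = ebcdic_bytes[0]  # the letter "c" in EBCDIC

try:
    ebcdic_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the matching code page recovers the text.
round_trip = ebcdic_bytes.decode("cp037")

print(hex(first_byte))
print(error_message)
print(round_trip)
```

This only illustrates why the decoder fails; tagging the file correctly, as described above, remains the fix.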
Tutorials
To get started with Pandas, see the Pandas website, which has many tutorials on how to use it.