Data and AI on Power


How to create and test MMA accelerated UDF on IBM Db2

By Theresa Xu posted Wed September 07, 2022 02:29 PM

The purpose of this blog is to show you how to enable AI-infused SQL queries in IBM Db2 using Python UDFs (User-Defined Functions). The blog also explains how to build AI models on an IBM Power10 system running Linux, as well as how to create the dataset table in Db2.

Inferencing through AI-infused SQL queries has several advantages over the alternatives, because the AI scoring function executes within Db2 without moving data to a different host. As a result, it has less overhead, runs faster, and is more secure.

Prerequisites

To complete the steps outlined in this blog you’ll need the following:

  • Access to an IBM Power10 system running Red Hat Enterprise Linux (RHEL) 8.6 or later.
  • IBM Db2 version 11.5.7.0 or later.
  • UDFs created under Db2 that employ MMA-accelerated Python libraries, such as TensorFlow, PyTorch, ONNX, and so on.
  • A conda environment with the necessary AI libraries installed for the "fenced" user ID used by Db2. For this test we set up a conda environment called mma-env under "/home/db2fenc1/anaconda3/envs/mma-env".
  • ASLR (address space layout randomization) disabled, to avoid a db2fmp abend problem. See the recommendation from Db2.
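The blog does not show the exact commands for disabling ASLR; as a sketch using the standard Linux sysctl knob (run as root; the file name under /etc/sysctl.d is an arbitrary choice):

```shell
# Disable ASLR for the running system (takes effect immediately)
sysctl -w kernel.randomize_va_space=0

# Persist the setting across reboots
echo "kernel.randomize_va_space = 0" > /etc/sysctl.d/99-db2-aslr.conf
```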

Estimated time to complete

It’ll take approximately 1 hour to create and test the MMA accelerated UDF on Db2.

Steps

To create the MMA accelerated UDF in Db2, follow the steps outlined below. 

  1. Create a model, either under the conda environment on the Db2 host as the fenced user, or on another system. Db2 provides the infrastructure to run UDFs under a fenced ID that is different from the database instance owner's ID, which properly isolates the application process from the database engine process. See this blog to learn how to build an AI model that takes advantage of MMA on IBM Power.
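    The artifacts used later in this blog are model.h5 (saved from, for example, Keras with model.save) and a pickled preprocessing mapper, fitted_mapper.pkl. As a minimal, self-contained sketch of the mapper side only: SimpleScaler below is a hypothetical stand-in for a real fitted mapper (such as a sklearn-pandas DataFrameMapper), not the object the blog actually uses.

```python
# Hypothetical stand-in for the preprocessing "mapper" that gets pickled to
# fitted_mapper.pkl. A real workflow would pickle a fitted mapper object;
# this minimal scaler just illustrates the fit-then-persist pattern.
import pickle

class SimpleScaler:
    """Scales each feature column to zero mean / unit variance."""
    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [sum(c) / len(c) for c in cols]
        self.stds = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-9)
                     for c, m in zip(cols, self.means)]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(r, self.means, self.stds)]
                for r in rows]

# Fit on training data, then persist it as the blog's fitted_mapper.pkl.
mapper = SimpleScaler().fit([[1.0, 10.0], [3.0, 30.0]])
with open("fitted_mapper.pkl", "wb") as f:
    pickle.dump(mapper, f)
```

    The UDF code can later unpickle this file and apply transform to incoming rows before feeding them to the model.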
  2. Copy the AI model and its corresponding mapper files to the routines directory and change the owner of these files to the fenced user ID.
    cp model.h5 /home/db2inst1/sqllib/function/routine
    cp fitted_mapper.pkl /home/db2inst1/sqllib/function/routine
    chown -R db2fenc1:db2fsdm1 /home/db2inst1/sqllib/function/routine/*

    Where model.h5 is the name of your model, fitted_mapper.pkl is the mapper file, and /home/db2inst1/sqllib/function/routine is the directory for the UDF code in Db2.

  3. Update the database manager configuration to specify which Python environment to use. Note that this must match the conda environment you set up as part of the prerequisites.
    • Switch to the db2inst1 user ID.
      su - db2inst1
    • Set the Python path for UDFs under Db2.
      db2 update dbm cfg using PYTHON_PATH "/home/db2fenc1/anaconda3/envs/mma-env/bin/python3.9"
  4. Update sqllib/adm/runpy.sh with the Python environment's library path, save the file, then stop and start Db2.
    • Add an export statement similar to the following to sqllib/adm/runpy.sh (note that LD_LIBRARY_PATH takes a list of directories, not individual .so files):
      export LD_LIBRARY_PATH=/home/db2fenc1/anaconda3/envs/mma-env/lib:$LD_LIBRARY_PATH
      Without this you may encounter the following error when running the Python UDF:
      SQL0443N Routine "*LY_TABLE" (specific name "") has returned an error
      SQLSTATE with diagnostic text "<class 'ImportError: /lib64/libstdc++.so.6:
      version `GLIBCX". SQLSTATE=38N93
  5. Create and load the data into one or more tables that will be accessed by the SQL queries that invoke the UDFs you will create in the next step.
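    For example, assuming a hypothetical TRANSACTIONS table whose columns feed the model (the table and column names here are illustrative, not from the blog):

```sql
CREATE TABLE transactions (
    tx_id        INTEGER NOT NULL,
    amount       DOUBLE,
    merchant_cat INTEGER
)
```

    You could then load it from a delimited file with, for example, db2 "IMPORT FROM transactions.csv OF DEL INSERT INTO transactions".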
  6. Provide the Python code for the UDFs in one or more files under the /home/db2inst1/sqllib/function/routine directory, then create the UDFs pointing to these files under Db2.
    • Follow the IBM Db2 documentation for more details: https://www.ibm.com/docs/en/db2-warehouse?topic=python-creating-udx-source-files-in
    • There are two types of UDFs:
      • User-Defined Scalar Function (UDSF), which works with one row at a time, so the AI model receives the data from a single row as input. This may not work for AI models that are built with a sequence length or batch size greater than one.
      • User-Defined Table Function (UDTF), which allows many rows to be processed at once. This is significantly faster for large, batched data analytics. The key here is to match the batch size of the AI model with the batch size of the UDTF.
    • To create a simple UDF that doesn’t use data in database tables:
      • Add the following code for a simple multiply function to the file /home/db2inst1/sqllib/function/routine/multiply.py.
        import nzae  # Db2-provided Python UDX support module, available in the fenced process
        class multiply(nzae.Ae):
            def _getFunctionResult(self, rows):
                x, y = rows  # the input column values for the current row
                if x is None:
                    self.userError("first input column may not be null")
                return x * y
        multiply.run()  # hand control to the Db2 UDF runtime

      • Create the UDTF under Db2.
        db2 "CREATE FUNCTION multiply_table(integer, integer) \
        returns table (m integer) language PYTHON \
        parameter style NPSGENERIC FENCED NOT THREADSAFE \
        NO FINAL CALL DISALLOW PARALLEL NO DBINFO \
        DETERMINISTIC NO EXTERNAL ACTION \
        RETURNS NULL ON NULL INPUT NO SQL \
        external name '/home/db2inst1/sqllib/function/routine/multiply.py'"


        Where multiply.py is the name of the UDF Python file. Notice that the UDF's name is multiply_table and that it takes two integer inputs and returns a single integer output. The number of inputs will likely be larger for your AI workload.
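    The UDSF versus UDTF distinction above comes down to batching. The following plain-Python sketch (no nzae or model libraries; predict_batch is a hypothetical stand-in for real model inference) shows the pattern a UDTF would use to accumulate rows until they fill the model's batch size:

```python
# Plain-Python sketch of the UDTF batching pattern: accumulate incoming rows
# until they fill the model's batch size, then score the whole batch at once.
BATCH_SIZE = 4

def predict_batch(batch):
    # Stand-in "model": scores each row as the sum of its features.
    # In a real UDTF this would be, e.g., a Keras model.predict call.
    return [sum(row) for row in batch]

def score_rows(rows, batch_size=BATCH_SIZE):
    """Yield one score per input row, invoking the model once per full batch."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from predict_batch(batch)
            batch = []
    if batch:  # final partial batch, if the row count is not a multiple
        yield from predict_batch(batch)

scores = list(score_rows([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]))
```

    With five input rows and a batch size of four, the model is invoked twice: once for the full batch and once for the final partial batch.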

  7. Create the SQL query that employs the UDF during execution in Db2. In this example, the SQL query pushes two values, which could be data from columns of a table, into the Python UDF called multiply_table. The Python code returns a value to the Db2 engine, which in turn returns it as a standard result to its caller:

    db2 "select * from table(multiply_table(3,5))"
    M        
    -----------
             15

Summary

In this blog we’ve shown you how to use an existing AI model to perform inferencing through SQL queries in Db2. This can be a powerful way to enrich database capabilities with Python code that a data scientist or application developer can create.

Some useful tricks are captured in our GitHub repo. Check them out at https://github.com/ppc64le/gists-ai-inferencing/tree/main/lstm-credit-card-fraud to learn how to work with data in database tables from within your Python UDF.
