The purpose of this blog is to show you how to change the batch size to improve the inferencing throughput of an LSTM AI model. We know that using a larger batch size, such as 640, provides higher inferencing throughput than using smaller batch sizes. In this blog, we’ll explore how to rebuild and deploy the model with the preferred batch size.
Before you begin, you should have the following:
Estimated time to complete
Assuming you already have the prerequisites, the time needed to customize the batch size is the sum of two parts: the time to update and rebuild the model, and the time to update the UDF that uses the rebuilt model.
Follow these steps to create and deploy your model with a new batch size:
Outside of Db2:
- Activate a conda environment of your choice, and make sure you have access to the Python code to build the LSTM DL model and the corresponding data set in this environment. An example of a file that creates an AI model can be found here: https://github.ibm.com/toropsp/US-Bank-Workload-MMA/blob/master/ccf_220_keras_lstm_static-OS.py
- Edit the Python file that builds the model, update the batch size, and save the file.
- Compile and retrain your model with the new batch size, then save it in either Keras H5 or ONNX format.
- After the model finishes building, transfer the model file and the mapper file to Db2 and place them in your UDF directory. Note that if the name of the model file or the mapper file has changed, you will need to update it in your UDF in Db2 in step 5 below.
- Change the batch_size parameter in your UDF code, so that it matches the batch size you just used to rebuild your model outside of Db2. Make sure that the sequence length in the UDF code also matches the sequence length that was used for the LSTM model.
- To use the newly built LSTM model for batched scoring in Db2, you can execute a SQL query that references the UDF. For example, we used the following query to pull in 44800 rows at one time to predict banking fraud in a credit card transaction table:
time db2 "select PREDICTION from bank.indexed_trans i, table(cachesys predict_udtf(44800, i.Index, i.User_id, i.Card, i.Year, i.Month, i.Day, i.Time, i.Amount, i.Use_Chip, i.Merchant_Name, i.Merchant_City, ifnull(i.Merchant_State, 'CA'), ifnull(i.Zip, '0'), i.MCC, ifnull(i.is_Errors, 'missing_value'), i.is_Fraud)) where i.index<=X"
- The first argument to the UDF, 44800 in this query, is the number of rows to be processed by the UDF. This is equal to the product of the sequence length, batch size, and number of batches.
7 * 640 * 10 = 44800
- In our example, the 44800 rows cover 6400 customer and credit card combinations. For each combination, we select the most recent credit card transaction plus the 6 transactions that preceded it for the same customer on the same card. That is why the sequence length for the LSTM model is 7 in our scenario.
1 + 6 = 7
- We used the WHERE clause with an index value X to control the number of rows returned.
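The row-count arithmetic in the steps above can be checked with a small Python sketch. The function names `rows_for_udf` and `group_rows` are hypothetical and are not part of the UDF. The grouping also assumes that rows arrive ordered so that each run of 7 consecutive rows belongs to one customer and card combination, which may differ from how your UDF actually receives data.

```python
from typing import List, Sequence


def rows_for_udf(seq_len: int, batch_size: int, num_batches: int) -> int:
    """Rows the UDF must receive: sequence length x batch size x number of batches."""
    return seq_len * batch_size * num_batches


def group_rows(rows: Sequence, seq_len: int, batch_size: int) -> List[List[list]]:
    """Split a flat, ordered run of rows into batches of fixed-length sequences.

    Assumption (hypothetical layout): every `seq_len` consecutive rows form
    one sequence for a single customer/card combination.
    """
    rows_per_batch = seq_len * batch_size
    if len(rows) % rows_per_batch != 0:
        raise ValueError(
            f"{len(rows)} rows is not a multiple of {rows_per_batch} "
            f"(sequence length {seq_len} x batch size {batch_size})"
        )
    # First cut the rows into sequences, then cut the sequences into batches.
    sequences = [list(rows[i:i + seq_len]) for i in range(0, len(rows), seq_len)]
    return [sequences[i:i + batch_size] for i in range(0, len(sequences), batch_size)]


total = rows_for_udf(seq_len=7, batch_size=640, num_batches=10)
batches = group_rows(list(range(total)), seq_len=7, batch_size=640)
print(total, len(batches), len(batches[0]), len(batches[0][0]))  # 44800 10 640 7
```

A check like this is useful before running the query: if the row count passed to the UDF is not a multiple of sequence length times batch size, the last batch would be incomplete.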
We have measured the performance difference of using different batch sizes to process the same amount of data and found that larger batch sizes can help reduce the overall processing time. For example, depending on the number of concurrent users issuing the queries, batch size 640 can be 2 to 6 times faster compared to batch size 16, as shown in the figure below.
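The sketch below does not reproduce these measurements. It only counts how many batches the model has to process for the same 44800 rows at each batch size, assuming a sequence length of 7 as in our example. The helper `batches_needed` is a hypothetical name used for illustration. Fewer, larger batches is one likely contributor to the speedup, but the actual gain depends on your hardware and workload.

```python
def batches_needed(total_rows: int, seq_len: int, batch_size: int) -> int:
    """Number of batches required to score `total_rows` rows (rounded up)."""
    rows_per_batch = seq_len * batch_size
    # Ceiling division without importing math.
    return -(-total_rows // rows_per_batch)


for batch_size in (16, 640):
    print(batch_size, batches_needed(44800, seq_len=7, batch_size=batch_size))
# 16 400
# 640 10
```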
Increasing the batch size has proven to be a powerful method for improving the inferencing throughput of an LSTM AI model. In this blog, we explored how to improve the batched inferencing throughput of an AI model by building the model with a chosen batch size, using it in a UDF with the matching data pattern, and finally testing it with a SQL query that performs batched fraud analytics.
I would like to thank Vishrutha Tupili for coauthoring this blog. If you have any questions feel free to add a comment below or reach out to me at firstname.lastname@example.org.