Best practices for performance optimization in Db2 ODBC and CLI driver applications

By Amarnath Reddy posted Tue August 13, 2024 04:17 AM

  

This article describes some of the best practices that applications can use to leverage driver features for performance benefits.
I will talk about how data insertion and data retrieval can get an additional performance boost from specific driver features, depending on the application requirements, in addition to common techniques that apply to both scenarios.

Data Insertion:

Array input:
If your application needs to execute the same SQL statement with different sets of values, you can use array bind-in to supply those values in a single execution instead of executing the same statement or query multiple times.

Benefits:

    • Performance: Reduces network round-trips and server-side parsing overhead by sending multiple rows of data in a single operation.
    • Simplicity: Simplifies application logic by reducing the number of SQL statements needed for batch operations.
    • Scalability: Improves scalability by minimizing the impact of network latency and server processing time for each individual row.
Example:
#define ARRAY_SIZE 3
#define BUFFER_SIZE 50
   // Prepare arrays for parameter values
    SQLINTEGER param1Array[ARRAY_SIZE] = {1, 2, 3};
    SQLCHAR param2Array[ARRAY_SIZE][BUFFER_SIZE] = {"value1", "value2", "value3"};
    SQLDOUBLE param3Array[ARRAY_SIZE] = {10.5, 20.3, 15.7};

// Tell the driver how many sets of parameter values the arrays hold
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE, (SQLPOINTER) ARRAY_SIZE, 0);

// Bind parameters (BufferLength for the character array is the row stride)
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_SLONG, SQL_INTEGER, 0, 0, param1Array, 0, NULL);
    SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, BUFFER_SIZE, 0, param2Array, BUFFER_SIZE, NULL);
    SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_DOUBLE, SQL_DOUBLE, 0, 0, param3Array, 0, NULL);

// Execute the batch insert statement
    ret = SQLExecDirect(hstmt, (SQLCHAR *) "INSERT INTO my_table (column1, column2, column3) VALUES (?, ?, ?)", SQL_NTS);
  

Array Input Chaining and MRI (Multi-Row Insert):

In this feature, the driver chains together SQLExecute() requests for a single prepared statement before sending the requests to the server. All SQLExecute() requests associated with a prepared statement are not sent to the server until either the SQL_ATTR_CHAINING_END statement attribute is set, or the available buffer space is consumed by rows that have been chained. The size of this buffer is defined by the aslheapsz database manager configuration parameter for local client applications, or the rqrioblk database manager configuration parameter for client/server configurations. 

This attribute can be used with the CLI/ODBC configuration keyword ArrayInputChain to effect array input without needing to specify the array size. See the documentation for ArrayInputChain for more information.
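For instance, chaining can be enabled through the keyword in db2cli.ini; a minimal sketch, where the [SAMPLE] data source section and the value are illustrative (see the linked keyword documentation for the accepted values on your driver level):

```ini
; db2cli.ini -- [SAMPLE] is a placeholder data source section
[SAMPLE]
ArrayInputChain=1
```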

Benefits of Array Input chaining:

·       Performance: Reduces the number of network trips to the server by performing bulk executes in a single request, thereby improving application performance

·       Transaction control: Ensures transactional integrity by grouping multiple inserts into a single transaction. If any part of the batch fails, the entire transaction can be rolled back, maintaining consistency in the database.

Example:
SQLPrepare (statement1)

SQLExecute (statement1)

SQLExecute (statement1)

/* the two execution requests for statement1 in above example are sent to the server in

two network flows */

SQLPrepare (statement2)

/* enable chaining */

SQLSetStmtAttr (statement2, SQL_ATTR_CHAINING_BEGIN)

SQLExecute (statement2)

SQLExecute (statement2)

SQLExecute (statement2)

/* end chaining */

SQLSetStmtAttr (statement2, SQL_ATTR_CHAINING_END)

/* the three execution requests for statement2 in above example are sent to the server in a single network flow,

instead of three separate flows */


Notes:

For Db2 on LUW, the CLI driver converts the chained executes into an array insert and then sends it to the server for a performance gain.
For Db2 for z/OS, the CLI driver sends the chained execute requests as they are. To get the additional performance benefit of the multi-row insert (MRI) feature in Db2 for z/OS, you can use the ColumnwiseMRI keyword.
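A hedged db2cli.ini sketch enabling column-wise MRI for Db2 for z/OS targets (the [SAMPLE] section name and the value are illustrative; see the linked keyword documentation for accepted values):

```ini
[SAMPLE]
ColumnwiseMRI=1
```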

For more information, please refer to the following links:
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-statement-list
https://www.ibm.com/docs/en/db2/11.5?topic=tips-reduction-network-flows
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-connection-list#r0006816__columnwise_mri
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-columnwisemri

 

CLI Load:

CLI Load allows you to insert data through CLI using the LOAD utility instead of array insert, and can yield significant performance improvements for bulk inserts.
Please refer to the links below for more details:
https://www.ibm.com/docs/en/db2/11.5?topic=data-importing-cli-load
https://www.ibm.com/docs/en/db2/11.5?topic=cli-tbloadc-insert-data-using-load-utility
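A hedged C sketch of how an application typically switches its inserts into CLI LOAD mode (assumes an allocated statement handle hstmt; error handling omitted):

```c
/* Switch the statement into CLI LOAD mode before performing the inserts */
SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API, (SQLPOINTER) SQL_USE_LOAD_INSERT, 0);

/* ... prepare, bind, and execute the (array) INSERT statement here ... */

/* Switch LOAD mode off so the data is loaded and normal processing resumes */
SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API, (SQLPOINTER) SQL_USE_LOAD_OFF, 0);
```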

Data Retrieval:

QryBlkSize / FET_BUF_SIZE / FetchBufferSize


This is the size of the data block used for fetch requests; it can be tuned to optimize application performance by controlling the number of trips to the data source required to retrieve data.

For database applications that read large quantities of data sequentially, a larger fetch buffer size improves performance. The CLI driver FET_BUF_SIZE and FetchBufferSize keywords are used to configure the buffer size used by fetch requests, in multiples of 32 KB. The previous upper limit of 256 KB can create a bottleneck when fetching large amounts of data, as it involves a large number of TCP send and receive requests. With the 11.5.7.0 release, the upper limit of FET_BUF_SIZE is increased to around 10 MB, which drastically reduces the number of TCP send and receive requests. This increase also improves performance on slower or congested networks, as well as networks where the physical distance between client and server is great.

Adjusting FetchBufferSize appropriately can help optimize performance by balancing network round-trips, memory usage, and throughput.
Larger values of FetchBufferSize require more memory on the client side to hold the fetched data temporarily. Therefore, you should consider the available memory on the client machine when adjusting this parameter.
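For example, a db2cli.ini sketch that raises the fetch buffer size (the section name and the value are illustrative; see the linked keyword documentation for units and limits on your driver level):

```ini
[SAMPLE]
; 1 MB fetch buffer, in bytes (rounded to a multiple of 32 KB)
FetchBufferSize=1048576
```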

Benefits:

·    Performance: Reduces network trips to the database server when fetching large volumes of data, thereby reducing the impact of network latency

For more information about this property, please refer:
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-fetchbuffersize

BlockForNRows or NumRowsonFetch

If you want to retrieve more than one row in a single fetch request to the database server, you can use the BlockForNRows or NumRowsOnFetch keyword to specify the rowset size.
This approach can significantly improve performance when dealing with large result sets.
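A db2cli.ini sketch specifying the rowset size (the section name and the value are illustrative):

```ini
[SAMPLE]
; fetch up to 100 rows per request to the server
NumRowsOnFetch=100
```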


Benefits:

  • Reduced Network Traffic: By fetching multiple rows (a block of rows) at once from the database server to the client application, BlockForNRows reduces the number of round-trips required over the network. This reduction in network traffic can lead to substantial performance improvements, especially when dealing with high-latency connections or large datasets.

You can refer below links for more details:
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-numrowsonfetch
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-blockfornrows

MaxBlkExt or SQL_ATTR_MAXBLKEXT

Db2 database servers such as Db2 for z/OS can return multiple query blocks of data in response to an OPEN or FETCH request. These extra query blocks are returned in addition to the one that is always sent back by the database server for a fetch request. You can configure the number of extra query blocks the server should return via the MaxBlkExt configuration keyword, or via the statement and connection attribute SQL_ATTR_MAXBLKEXT.

If MaxBlkExt is set to N, the server returns up to N+1 blocks of query data. In other words, MaxBlkExt specifies the number of extra blocks on top of the regular single block of query data sent by the server.

Benefits:

  • Reduced network traffic: This feature allows the client application to minimize the number of network line turnarounds by requesting additional query blocks of data per fetch request instead of a single query block.
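A hedged C sketch of setting the attribute on a connection (assumes an allocated connection handle hdbc; the value is illustrative):

```c
/* Ask the server to return up to 8 extra query blocks per fetch request */
SQLSetConnectAttr(hdbc, SQL_ATTR_MAXBLKEXT, (SQLPOINTER) 8, 0);
```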
     

For more information on this feature, please refer below links:
https://www.ibm.com/docs/en/db2/11.5?topic=tuning-increasing-db2-connect-data-transfer-rates
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-statement-list
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-connection-list
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-maxblkext
https://www.ibm.com/docs/en/db2/11.5?topic=tuning-extra-query-block


Array Retrieval:


If the application requires that each column or each row of a result set be stored in an array, each fetch must be followed by either a data copy operation or a new set of SQLBindCol() calls to assign new storage areas for the next fetch. Alternatively, applications can eliminate the resource usage of extra data copies or extra SQLBindCol() calls by retrieving multiple rows of data (called a rowset) at one time into an array. You can specify the size of the rowset, i.e. the number of rows to retrieve in a single fetch request, via the SQL_ATTR_ROW_ARRAY_SIZE attribute.

There are two ways in which the arrays can be bound in an application for data retrieval:
i. Column-wise binding - the default binding method, whereby each data value and its length are stored in an array.
ii. Row-wise binding - associates an entire row of the result set with a structure. The rowset is retrieved into an array of structures, each of which holds the data of one row and the associated length fields.
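The column-wise case can be sketched in C as follows (a minimal sketch; it assumes an allocated statement handle hstmt whose open cursor returns an INTEGER first column, and omits error handling):

```c
#define ROWSET_SIZE 100

SQLINTEGER idArray[ROWSET_SIZE];
SQLLEN     idLenArray[ROWSET_SIZE];
SQLULEN    numRowsFetched = 0;

/* Retrieve up to ROWSET_SIZE rows per SQLFetch() call */
SQLSetStmtAttr(hstmt, SQL_ATTR_ROW_ARRAY_SIZE, (SQLPOINTER) ROWSET_SIZE, 0);
SQLSetStmtAttr(hstmt, SQL_ATTR_ROWS_FETCHED_PTR, &numRowsFetched, 0);

/* Bind the arrays once; no rebinding or data copy between fetches */
SQLBindCol(hstmt, 1, SQL_C_SLONG, idArray, 0, idLenArray);

while (SQLFetch(hstmt) != SQL_NO_DATA) {
    for (SQLULEN i = 0; i < numRowsFetched; i++) {
        /* process idArray[i] */
    }
}
```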

Benefits:

·       Performance: Reduces the number of network trips to database server for fetching the data, especially useful when you want to fetch large result sets
·       Efficient memory usage
·       Reduces the code duplication in application – you can use a single array of specified size for columns and use single SQLBindCol()

For more information, please refer:
https://www.ibm.com/docs/en/db2/11.5?topic=data-scrollable-cursors
https://www.ibm.com/docs/en/db2/11.5?topic=data-array-retrieval
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-statement-list

QueryPrefetch/SQL_ATTR_QUERY_PREFETCH

In Db2 11.5.5 and earlier versions, the driver performed blocking fetch requests to the database server, i.e. the driver would send a fetch request to the server, consume the response (a result-set data block), then retrieve the next data block, and so on. In Db2 11.5.6 and later, to improve performance, the driver sends a prefetch request for the next data block before returning to the bind-out process, and then returns to process the data from the current request. This lets the server produce rows that the application can consume continuously, without waiting or incurring additional response time from the server.
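For instance, prefetch can be enabled through the configuration keyword; a db2cli.ini sketch (the section name and the value are illustrative; see the linked keyword documentation for accepted values):

```ini
[SAMPLE]
QueryPrefetch=1
```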

Benefits:

·       Performance: Instead of waiting for the reply from the server, the driver can now use that time to process data. This is expected to enhance the overall performance of the driver when fetching large data sets, and can reduce the time lost to the network while waiting for the server response, i.e. the server response time is effectively hidden by using it to process the current data set.

You can refer below links for more information:
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-queryprefetch

https://www.ibm.com/docs/en/db2/11.5?topic=dscde-cli-driver-enhancements-2


Others:

DeferredPrepare: on by default

The Db2 ODBC and CLI driver allows the PREPARE request for a SQL statement to be sent either separately or chained together with the EXECUTE request.
In the latter case, the prepare request is deferred, i.e. even though your application calls SQLPrepare(), the driver defers flowing the request to the server and chains it with the corresponding execute request. This helps improve performance, as the statement prepare and execute requests now flow to the database server in a single network trip.
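Deferred prepare is on by default, so the db2cli.ini sketch below only illustrates where the keyword lives (the section name is a placeholder; see the linked keyword documentation for accepted values):

```ini
[SAMPLE]
; 1 = defer the prepare and chain it with the execute (default)
DeferredPrepare=1
```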

Benefits:
  • Performance: Reduced network trips to the database server. This is of greatest benefit when an application generates queries with very small answer sets, because the resource utilization for requests and replies flowing over the network represents a large percentage of the processing time.

For more information on this keyword, please refer:
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-allowdeferredprepare
https://www.ibm.com/docs/en/db2/11.5?topic=statements-prepare
https://www.ibm.com/docs/en/db2/11.5?topic=odbc-programming-hints-tips#c0000668__hdglobsc

Compound SQL(CLI):

Using compound SQL (CLI), you can group multiple SQL statements into a single executable block that is sent to the server in one network trip, together with the necessary input parameter values for execution.

Benefits:

  • Reduced network trips: All the statements inside the compound block are sent in a single request to the database server.

For more information, please refer: https://www.ibm.com/docs/en/db2/11.5?topic=sql-compound-cli

Binding packages with KEEPDYNAMIC:

If you have dynamic SQL, KEEPDYNAMIC allows you to re-execute the same SQL after a commit/rollback without doing a re-prepare. With this feature, the database server preserves such statements in a dynamic cache and does not discard them after execution completes, so that they can be reused. Dynamic SQL statements prepared in a package bound with the KEEPDYNAMIC YES option are kept in the SQL context after a COMMIT statement. In other words, the parsing and optimizing overhead is reduced for such statements when KEEPDYNAMIC is enabled and the same SQL statements are executed again.
To use this feature, bind the Db2 packages with the KEEPDYNAMIC YES option.
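A hedged sketch of binding the CLI packages with the option (the database name SAMPLE is a placeholder; db2cli.lst is the standard CLI bind file list shipped under sqllib/bnd):

```shell
db2 connect to SAMPLE
db2 bind @db2cli.lst blocking all keepdynamic yes grant public
db2 connect reset
```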

Benefits:
  • Performance: Improved execution times as statements can be prepared once and executed multiple times outside transaction boundaries. 
  • Efficient Resource Utilization: Keeping dynamic SQL statements can lead to better use of database resources, as the overhead associated with parsing and optimizing SQL statements is minimized.

For more information, please refer below links:
https://www.ibm.com/docs/en/db2/11.5?topic=odbc-programming-hints-tips#c0000668__hnkpdyn
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-enablekeepdynamic
https://www.ibm.com/docs/en/db2/11.5?topic=commands-bind
https://www.ibm.com/docs/en/db2/11.5?topic=commands-db2cli-db2-interactive-cli
https://www.ibm.com/docs/en/db2/11.5?topic=environment-bind-files-package-names
https://www.ibm.com/docs/en/db2/11.5?topic=statements-commit

Data Conversions:


The Db2 ODBC and CLI driver processes data based on the C types and SQL types used, applying the corresponding conversions to produce the data in the format expected by the application and by the database server.
Data type conversion can occur under one of two conditions:

  • The application specified a C type that is not the default C type for the SQL type.
  • The application specified an SQL type that does not match the base column SQL type at the server, and there was no describe information available to the CLI driver.

If your application is designed in such a way that it does not handle the data using the defaults, the driver performs additional data conversions internally to convert the data to and from the format expected by the database server. If, however, you have use cases where no conversion is required (for example, using SQL_C_CHAR and SQL_CHAR as the C type and SQL type for a column defined as CHAR in the database table), no additional conversion is performed. If you use a different C or SQL type, the driver performs additional data conversions based on the describe information provided by the application and the database server.
Hence, you will see additional performance overhead in scenarios where data conversion is required before insertion into, or after retrieval from, the database.
To avoid the additional processing time due to conversions, you can refer to the driver guidelines for matching data types and supported conversions.
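A hedged C sketch contrasting the two cases for a column defined as CHAR(20) (assumes an allocated statement handle hstmt; rebinding the same column twice is only for illustration):

```c
SQLCHAR name[21];
SQLLEN  nameLen;

/* No conversion: SQL_C_CHAR is the default C type for a CHAR column */
SQLBindCol(hstmt, 1, SQL_C_CHAR, name, sizeof(name), &nameLen);

/* Conversion on every fetch: binding the same CHAR column to a
   wide-character buffer forces the driver to convert the data */
SQLWCHAR wname[21];
SQLBindCol(hstmt, 1, SQL_C_WCHAR, wname, sizeof(wname), &nameLen);
```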

You can refer to the link below for more details:

https://www.ibm.com/docs/en/db2/11.5?topic=attributes-supported-data-conversions

Miscellaneous:


1. Upgrade to 11.5.9.0 if you are using a db2dsdriver.cfg with hundreds of DSN entries. A design change in how the db2dsdriver.cfg file is processed when resolving DSNs helps improve the performance of database connection requests:
https://www.ibm.com/docs/en/db2/11.5?topic=enhancements-cli-driver

2. QueryOptimizationLevel:
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-queryoptimizationlevel
https://www.ibm.com/docs/en/db2/11.5?topic=statements-set-current-query-optimization

3. Use SQL_ATTR_MAX_ROWS, SQL_ATTR_OPTIMIZE_FOR_NROWS / OPTIMIZE FOR n ROWS, or SQL_ATTR_BLOCK_FOR_NROWS / FETCH FIRST n ROWS ONLY to control how many rows are fetched, as per your requirements. For more information, refer to:
https://www.ibm.com/docs/en/db2/11.5?topic=odbc-programming-hints-tips
https://www.ibm.com/docs/en/db2/11.5?topic=wss-using-optimize-n-rows-clause-fetch-first-n-rows-only-clause
https://www.ibm.com/docs/en/db2/11.5?topic=keywords-optimizefornrows
https://www.ibm.com/docs/en/db2/11.5?topic=attributes-statement-list

