Efficient, scalable caching using ESQL shared variables in WebSphere Message Broker and IBM Integration Bus


ESQL is an extension of SQL defined by WebSphere® Message Broker and IBM® Integration Bus to define and manipulate data within a message flow. ESQL shared variables are often used to cache the contents of database tables, with the shared variable cache implemented as an array containing the result of a SELECT operation, and searches conducted by issuing a SELECT against that array. The problem with this approach is that ESQL SELECT operations perform a sequential search, so as the cache grows, searches become slower. Most users have found that when cache sizes exceed a few thousand rows, it is faster to access the database than to use a cache. This article shows you a new way to structure the cache that scales well, and remains faster than accessing the database even when exceeding 10,000 rows.

Consider a message flow that receives messages containing an airport code (such as LHR) and accesses a database table to obtain the corresponding city name (London). Accessing the database for each message can be expensive, so a common solution is to load the table into an ESQL ROW shared variable on first access, and then search the shared variable. This article describes the most commonly used cache structure, called the standard cache, and then introduces a more efficient and scalable variant called the ESQL Cache, because it exploits the way ESQL accesses its variables.

Standard cache

The most common cache implementation with ESQL shared variables consists of a shared ROW that contains the result of a SELECT on the database table being cached:

DECLARE CACHE SHARED ROW;

Assume that you have a database table called AIRPORTS that contains two columns, CODE and CITY. This code loads the cache:

SET CACHE.AIRPORT[] = SELECT A.CODE, A.CITY FROM Database.AIRPORTS AS A;

The contents of the CACHE variable look like this:

CACHE.AIRPORT[1].CODE = AAA
CACHE.AIRPORT[1].CITY = Anaa
CACHE.AIRPORT[2].CODE = AAB
CACHE.AIRPORT[2].CITY = Arrabury
...
CACHE.AIRPORT[4225].CODE = LHR
CACHE.AIRPORT[4225].CITY = London
...
CACHE.AIRPORT[9185].CODE = ZZV
CACHE.AIRPORT[9185].CITY = Zanesville

This function implements the cache:

DECLARE CACHE SHARED ROW;

CREATE PROCEDURE getCity_v01 (IN airportCode CHARACTER) RETURNS CHARACTER
BEGIN
    -- v01: "standard" cache
    -- PERFORMANCE TEST ONLY! No ATOMIC blocks.
    -- Do not use if Additional Instances > 0.
    IF CACHE.AIRPORT.CODE IS NULL THEN
        -- load the cache
        SET CACHE.AIRPORT[] = SELECT A.CODE, A.CITY
                              FROM Database.AIRPORTS AS A;
    END IF;
    RETURN THE(SELECT ITEM A.CITY FROM CACHE.AIRPORT[] AS A
               WHERE A.CODE = airportCode);
END;

For readability, the cache load is shown without an ATOMIC block, which is safe only when a single thread is running the message flow. There are other ways to search the cache, such as FOR or WHILE loops, but SELECT is the fastest.
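For completeness, here is a minimal sketch of how the load could be serialized when more than one thread runs the flow (Additional Instances > 0). This variant is not the measured code; the procedure name getCity_v01_atomic is ours, and the load condition is re-checked inside the ATOMIC block because another instance may have populated the cache while this one waited for the lock:

CREATE PROCEDURE getCity_v01_atomic (IN airportCode CHARACTER) RETURNS CHARACTER
BEGIN
    IF CACHE.AIRPORT.CODE IS NULL THEN
        BEGIN ATOMIC
            -- re-check: another instance may have loaded the cache first
            IF CACHE.AIRPORT.CODE IS NULL THEN
                SET CACHE.AIRPORT[] = SELECT A.CODE, A.CITY
                                      FROM Database.AIRPORTS AS A;
            END IF;
        END;
    END IF;
    RETURN THE(SELECT ITEM A.CITY FROM CACHE.AIRPORT[] AS A
               WHERE A.CODE = airportCode);
END;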

Performance when using a standard cache

The problem with this cache structure is that it doesn't scale. A user trace shows that SELECT scans the cached array sequentially until it finds a row that satisfies the WHERE clause. As the cache grows, the search gets slower. Eventually, it becomes faster to drop the cache and go to the database each time.

Measurements using IBM Integration Bus V9 on Windows 7 64-bit, with a local DB2 V10.1 database, show the effect of a growing cache:

Figure 1. Elapsed milliseconds per message by cache size (no cache vs. standard cache)

The test put 10,000 messages, each randomly accessing one of the first 1000, 2000, 3000 (and so on) rows in the cache, thus simulating different cache sizes. The full table contains 9185 rows. The chart shows the elapsed time, in milliseconds, to process one message. The database response time (about 0.8 milliseconds, the NO_CACHE line) is constant.

In this configuration, the cache is faster than the database up to about 3000 rows. For larger cache sizes, the cache is slower than a database access. (Results will vary for other configurations, such as accessing the database remotely, or using in-memory databases.)

ESQL Cache

The ESQL Cache stores each key and value (in our example, airport code and city name) as a name/value pair:

CACHE.AAA = Anaa
CACHE.AAB = Arrabury
...
CACHE.LHR = London
...
CACHE.ZZV = Zanesville

There is no array. To return the city name for a given airport code, the cache search function simply refers to the corresponding variable:

RETURN CACHE.{airportCode};

Here is the function that implements the new cache structure:

CREATE PROCEDURE getCity_v02 (IN airportCode CHARACTER) RETURNS CHARACTER
BEGIN
    -- v02: "ESQL" cache
    -- PERFORMANCE TEST ONLY! No ATOMIC blocks.
    -- Do not use if Additional Instances > 0.
    IF FIELDNAME(CACHE.*[1]) IS NULL THEN
        -- load the cache
        DECLARE TEMPCACHE ROW;
        SET TEMPCACHE.AIRPORT[] = SELECT A.CODE, A.CITY
                                  FROM Database.AIRPORTS AS A;
        FOR cacheline AS TEMPCACHE.AIRPORT[] DO
            CREATE LASTCHILD OF CACHE NAME cacheline.CODE
                                      VALUE cacheline.CITY;
        END FOR;
    END IF;
    RETURN CACHE.{airportCode};
END;

To organize the cache in the new structure, the function navigates each row returned by the database SELECT. In this example, each row contains an airport code (such as JFK) and a city (New York). This pair is used to add a variable to the cache, where the variable name is the code, and its value is the city (CACHE.JFK='New York').
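The direct field reference returns NULL when the requested code is not in the cache. Here is a minimal sketch, not part of the measured code, of a lookup body that falls back to the database on a cache miss and caches the result for next time (assuming the same AIRPORTS table):

DECLARE city CHARACTER;
SET city = CACHE.{airportCode};
IF city IS NULL THEN
    -- cache miss: read the database and remember the result
    -- (SET with a NULL value creates nothing, so unknown codes stay uncached)
    SET city = THE(SELECT ITEM A.CITY FROM Database.AIRPORTS AS A
                   WHERE A.CODE = airportCode);
    SET CACHE.{airportCode} = city;
END IF;
RETURN city;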

Cache loading is again shown without an ATOMIC block for readability; the code above works only when there are no additional threads. As an alternative to CREATE LASTCHILD, a cache row can also be populated with SET:

SET CACHE.{cacheline.CODE} = cacheline.CITY;

SET and CREATE LASTCHILD are equivalent if and only if each key (CODE, in this example) occurs exactly once. If a key occurs multiple times, SET leaves only one instance in the cache (the last one), whereas CREATE LASTCHILD adds a field for every occurrence.
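A minimal illustration of the difference, using made-up values:

DECLARE C ROW;

-- SET overwrites: after these two statements, C holds ONE field
-- named AAA, with the value 'second'
SET C.AAA = 'first';
SET C.AAA = 'second';

-- CREATE LASTCHILD appends: after these two statements, C holds TWO
-- fields named AAB, and the field reference C.AAB returns the first
CREATE LASTCHILD OF C NAME 'AAB' VALUE 'first';
CREATE LASTCHILD OF C NAME 'AAB' VALUE 'second';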

Performance with ESQL Cache

Because the search accesses the variable directly, it is much faster and scales better. The charts below compare the response time of the ESQL Cache with direct database access and with the standard cache. Figure 2 shows milliseconds per message for cache sizes up to 9000 rows:

Figure 2. Elapsed milliseconds per message by cache size (ESQL Cache vs. no cache)

The ESQL Cache, with an average elapsed time of 0.21 milliseconds per message, is significantly faster than accessing the database for all cache sizes. It is also faster than the standard cache, as shown in Figure 3:

Figure 3. Elapsed milliseconds per message by cache size (ESQL Cache vs. standard cache)

The ESQL Cache is faster than the standard cache for all cache sizes. The chart scale hides the fact that the ESQL Cache elapsed time per message is not really constant, but grows, albeit very slowly. This growth is evident in the chart in Figure 4, which shows the ESQL Cache only:

Figure 4. Elapsed milliseconds per message by cache size (ESQL Cache)

The response time grows from about 0.20 milliseconds per message for a cache size of 1000 entries, to 0.23 milliseconds per message for a cache of 9000 entries, an increase of approximately 15%. When does the ESQL Cache become slower than a database access? Although the authors did not verify this experimentally, a linear projection indicates that for the configuration under test, the ESQL Cache remains faster than a database access up to about 80,000 entries.

Using multiple keys

The ESQL Cache can easily support multiple keys, to satisfy the equivalent of SELECT ... WHERE KEY1=value1 AND KEY2=value2. Here is the structure:

CACHE.aa.aa = xxx
CACHE.aa.bb = yyy
CACHE.xx.yy = zzz

To return a value:

RETURN CACHE.{key1}.{key2};
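Loading follows the same pattern as the single-key cache. Here is a minimal sketch; the FLIGHTS table and its columns ORIGIN, DEST, and CARRIER are hypothetical, used only to illustrate the two-key structure:

CREATE PROCEDURE getCarrier (IN origin CHARACTER, IN dest CHARACTER)
RETURNS CHARACTER
BEGIN
    IF FIELDNAME(CACHE.*[1]) IS NULL THEN
        -- load the cache
        DECLARE TEMPCACHE ROW;
        SET TEMPCACHE.FLIGHT[] = SELECT F.ORIGIN, F.DEST, F.CARRIER
                                 FROM Database.FLIGHTS AS F;
        FOR cacheline AS TEMPCACHE.FLIGHT[] DO
            -- the first key names a subtree; the second key holds the value
            SET CACHE.{cacheline.ORIGIN}.{cacheline.DEST} = cacheline.CARRIER;
        END FOR;
    END IF;
    RETURN CACHE.{origin}.{dest};
END;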

Comparison between ESQL Cache and Global Cache

WebSphere Message Broker V8 introduced the Global Cache. It uses the execution group's JVM heap to store data, and provides Java APIs to put to and get from the cache. It is easy to implement, and provides consistent performance across a range of cache sizes. One advantage of the Global Cache over ESQL shared variables is that the cache can be shared between message flows, integration servers, execution groups, and integration buses or message brokers, while the scope of ESQL shared variables is the message flow. Figure 5 compares the Global Cache with the ESQL Cache:

Figure 5. Elapsed milliseconds per message by cache size (ESQL Cache vs. Global Cache)

The ESQL Cache, at 0.21 milliseconds per message, is faster than the Global Cache (average of 0.43 milliseconds per message) for all cache sizes tested. However, for larger sizes, the Global Cache will be faster. The same linear projection used earlier indicates that the ESQL Cache becomes slower than the Global Cache when cache size reaches about 40,000 entries.

Conclusion

The proposed cache structure results in a significant performance improvement for caching with ESQL shared variables. The logic to implement the new cache is very simple, so converting existing standard cache structures is straightforward.

We have not measured cache sizes beyond 9000 rows, but the trend indicates that a cache of 80,000 entries could still be faster than accessing the database. The precise cutoff point will depend on your specific hardware and software configuration, but in any case, the ESQL Cache offers a significant improvement over the standard cache's cutoff point of a few thousand entries.

For larger cache sizes, consider using the IBM Integration Bus Global Cache, which provides consistently good performance and is easy to implement. If you need to share the cache across message flows, integration servers, execution groups, or integration buses / message brokers, then the Global Cache is the only option, since the scope of ESQL shared variables is the message flow.

Acknowledgments

The authors would like to thank Amanda Erlank, from Cape Town in South Africa, and Johannes Wagener, from Standard Bank in South Africa, for reviewing initial drafts of this article.
