Lessons from the field #6: IBM Java and OpenJ9 Just-In-Time Compiler Tuning

By Kevin Grigorenko posted Wed June 30, 2021 10:48 AM

  

One of the key values of Java is the Just-In-Time (JIT) compiler, which converts the hottest Java methods from interpreted bytecode into native code at runtime. The performance difference is massive: typically 10-20 times faster (i.e. an order of magnitude). The JIT is enabled by default and, in general, we do not recommend disabling it.

Some other modern programming languages offer a JIT, but few are as advanced as the JIT in Java. Many computer science PhDs and engineers have spent years fine-tuning the Java JIT (see some examples). The Java JIT is one of the main draws of Java as compared to other programming languages.

In many cases, you don’t need to tune the Java JIT. This article covers some of the major tuning considerations for the IBM Java and OpenJ9 Java (both hereafter referred to as J9 Java) JIT that may apply to some large-scale production workloads.

JIT Code Cache

The JIT Code Cache is an area in native memory (i.e. inside the Java process but outside the Java heap) where compiled code is stored. The JIT Data Cache is another area of native memory that stores metadata about the compiled code.

The maximum size of the code cache is controlled by the -Xcodecachetotal option. As of this writing, the default in recent versions of 64-bit J9 Java is 256MB:

-Xcodecachetotal256m

This value will be rounded up to a multiple of the code cache block size (see -Xcodecache).
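
As a rough illustration of that rounding, the sketch below rounds a requested total up to the next multiple of a block size. The function name and the input values (a 250MB request and a 32MB block size) are assumptions chosen for illustration; the real block size depends on your -Xcodecache setting and platform.

```python
def round_up_to_block(requested_bytes: int, block_bytes: int) -> int:
    """Round requested_bytes up to the next multiple of block_bytes."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    blocks = -(-requested_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes


MB = 1024 * 1024

# Hypothetical inputs: a 250MB request with an assumed 32MB block size.
effective = round_up_to_block(250 * MB, 32 * MB)
print(effective // MB, "MB")
```

A request that is already a multiple of the block size is returned unchanged.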

The -Xcodecachetotal page notes that performance may degrade for complex applications in which the code cache fills up:

Long-running, complex, server-type applications can fill the JIT code cache, which can cause performance problems because not all of the important methods can be JIT-compiled.

This is because the JIT Code Cache is not a Least-Recently-Used (LRU) list. Compiled methods may only be removed for a narrow set of reasons (e.g. class unloading, agent retransformation, etc.) and once the cache fills up, it will stay that way until the JVM is restarted. However, before you run off and increase your code cache size, keep these points in mind:

  1. An excessive code cache size may have negative consequences. The longer the JVM runs, the more likely the JIT is to generate code at higher optimization levels if there's space in the cache. Compilations at higher optimization levels produce much bigger compiled method bodies (typically because of additional inlining), which puts more pressure on the CPU instruction cache and may reduce performance. Ideally, you want the JIT to compile just the “right” set of methods at “appropriate” optimization levels and then stop. There isn’t any way of knowing when that has happened, so if the code cache is very large, the JIT will likely keep compiling past the point of benefit. In addition, compiling at the higher optimization levels takes a long time, and that compilation time can itself be a cost.
  2. The code cache uses native memory outside of your Java heap, so ensure there is sufficient physical memory (RAM) to avoid swapping.
  3. Ensure that there are no classloader leaks that are driving excessive code cache usage. If needed, gather a system dump and review using the Eclipse Memory Analyzer Tool (MAT).

In other words, it is common for the JIT code cache to fill up in large production workloads, and this may be optimal. There are cases when a larger code cache is better, but make sure you test such larger values over a long period of time (e.g. until the larger code cache fills up).

The easiest way to check how much of the code cache is in use at runtime is to request a thread dump and review the resulting javacore*.txt file. The amount used by each cache is the “Total memory in use” value in its 1STSEGTYPE section:

1STSEGTYPE     JIT Code Cache
1STSEGTOTAL    Total memory:                   134217728 (0x0000000008000000)
1STSEGINUSE    Total memory in use:            121952439 (0x000000000744D8B7)
1STSEGFREE     Total memory free:               12265289 (0x0000000000BB2749)
1STSEGLIMIT    Allocation limit:               134217728 (0x0000000008000000)

1STSEGTYPE     JIT Data Cache
1STSEGTOTAL    Total memory:                    71303168 (0x0000000004400000)
1STSEGINUSE    Total memory in use:             71303168 (0x0000000004400000)
1STSEGFREE     Total memory free:                      0 (0x0000000000000000)
1STSEGLIMIT    Allocation limit:               402653184 (0x0000000018000000)
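
If you review many javacores, a small script can pull these values out for you. The following is a minimal sketch, assuming the section layout matches the excerpt above; the function names are hypothetical, and other J9 versions may format these lines differently. It reports in-use bytes as a percentage of each cache's allocation limit.

```python
import re

# Excerpt copied from the javacore sample above; in practice, read the
# javacore*.txt file from disk instead.
SAMPLE = """\
1STSEGTYPE     JIT Code Cache
1STSEGTOTAL    Total memory:                   134217728 (0x0000000008000000)
1STSEGINUSE    Total memory in use:            121952439 (0x000000000744D8B7)
1STSEGFREE     Total memory free:               12265289 (0x0000000000BB2749)
1STSEGLIMIT    Allocation limit:               134217728 (0x0000000008000000)

1STSEGTYPE     JIT Data Cache
1STSEGTOTAL    Total memory:                    71303168 (0x0000000004400000)
1STSEGINUSE    Total memory in use:             71303168 (0x0000000004400000)
1STSEGFREE     Total memory free:                      0 (0x0000000000000000)
1STSEGLIMIT    Allocation limit:               402653184 (0x0000000018000000)
"""

TYPE_RE = re.compile(r"^1STSEGTYPE\s+(.+?)\s*$")
VALUE_RE = re.compile(r"^(1STSEG(?:TOTAL|INUSE|FREE|LIMIT))\s+.*?:\s+(\d+)\s")


def parse_jit_caches(javacore_text):
    """Return {cache name: {tag: bytes}} for the JIT cache sections."""
    caches = {}
    current = None
    for line in javacore_text.splitlines():
        type_match = TYPE_RE.match(line)
        if type_match:
            name = type_match.group(1)
            # Only track the JIT sections; other segment types are skipped.
            current = caches.setdefault(name, {}) if name.startswith("JIT") else None
            continue
        value_match = VALUE_RE.match(line)
        if value_match and current is not None:
            current[value_match.group(1)] = int(value_match.group(2))
    return caches


def percent_in_use(cache):
    """In-use bytes as a percentage of the allocation limit."""
    return 100.0 * cache["1STSEGINUSE"] / cache["1STSEGLIMIT"]


caches = parse_jit_caches(SAMPLE)
for name, values in caches.items():
    print(f"{name}: {percent_in_use(values):.1f}% of allocation limit in use")
```

In this sample, the code cache is close to its limit while the data cache has used all of its currently allocated memory but remains well under its allocation limit.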

Sometimes, it may be worth selectively disabling or reducing the optimization of certain large method bodies that aren’t important.

If the JIT code cache is exhausted (as seen in verbose JIT logging with -Xjit:verbose={compileStart|compileEnd|compilePerformance},vlog=jitlog) but the javacores show significant free space, the cause may be fragmentation. You may try to reduce it by increasing -Xcodecache up to 32MB, although this will increase the runtime footprint of the process, and some fragmentation cannot be eliminated by design:

-Xcodecache32m

JIT CPU Usage

The JIT needs to observe the running Java code to decide which hot methods are worth compiling, and it then needs to perform the compilation. Both activities consume CPU and native memory. The former is done by the JIT sampling thread, and its overhead is negligible because it uses statistical sampling.

The latter is done by JIT compilation threads. As of this writing, in recent versions of J9 Java, the default number of JIT compilation threads on non-Linux operating systems is the number of CPUs minus 1, but no less than 1 and no more than 7. On Linux, 7 threads are created although only the number of CPUs minus 1 are activated initially; if JIT compilation starvation is detected, additional threads up to 7 may be activated. The number may be set explicitly to a value between 1 and 7 (the trailing X below stands for that number):

-XcompilationThreadsX
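
The non-Linux default described above amounts to clamping "CPUs minus 1" into the range 1 to 7. Here is a minimal sketch of that arithmetic; the function name is hypothetical, and it models only the rule as stated in this article, not the JVM's actual implementation or the on-demand thread activation on Linux.

```python
import os


def default_compilation_threads(cpu_count: int) -> int:
    """CPUs minus 1, but no less than 1 and no more than 7."""
    return max(1, min(7, cpu_count - 1))


# os.cpu_count() may return None, so fall back to 1 in that case.
cpus = os.cpu_count() or 1
print(f"{cpus} CPUs -> {default_compilation_threads(cpus)} compilation threads")
```

By this rule, a 16-CPU host gets 7 compilation threads per JVM, so ten JVMs stacked on that host could run up to 70 compilation threads between them, which is why reducing the setting can help when many JVMs start at once.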

As with options like -Xgcthreads, if you are vertically stacking many JVMs within the same host and especially if you’re starting many JVMs at once, you may need to reduce compilation threads to avoid CPU exhaustion, although there are other cases in which the opposite is true.

In general, JIT compilation CPU activity is high during “startup” and decreases over time. If you are running very short-lived benchmarks, you may consider the -Xquickstart option, although it is not recommended for long-lived processes and may reduce throughput by up to 40%.

The most common way to reduce CPU usage of JIT compilation on everything but the first run of a process (for the same code) is by using the shared class cache (SCC) and Ahead-Of-Time (AOT) compilation. AOT allows Java to largely re-use compiled code from previous runs. The shared class cache is enabled with -Xshareclasses and its maximum size is specified with -Xscmx. However, note that the relative performance should be tested:

AOT code quality does not match the JIT code quality, and the recompilation mechanism is subdued significantly such that many AOT bodies do not get recompiled.

The SCC uses memory-mapped files, which live in native memory outside the Java heap, so ensure you have sufficient available RAM for the SCC.

As an example, here are some aggressive SCC and AOT options that you may consider testing:

-Xshareclasses:name=shr -Xscmx400M -Xjit:dontDowngradeToCold,useHigherMethodCounts,forceAOT -Xaot:dontDowngradeToCold,useHigherMethodCounts,forceAOT

See our team's previous post in the Lessons from the field series: Monitor and keep your WebSphere environments running smoothly
