Co-authors: Paul Cheeseman, Swathi Kalahastri, Sharanabasava.
Introduction
IBM Semeru Runtimes are built with OpenJDK class libraries and the Eclipse OpenJ9 JVM, which includes a state-of-the-art Just-In-Time (JIT) compiler. The IBM Semeru JDK is generally considered a drop-in replacement for any OpenJDK-based Java distribution, providing a no-cost Java runtime environment optimised for performance with broad platform support.
This article explains the compilation phases of the OpenJ9 JIT compiler, such as intermediate language generation, code optimisation, and code generation. It also demonstrates how a Java application's performance improves over time as interpreted bytecode is replaced with optimised native code by the JIT compiler.
The OpenJ9 Just-In-Time (JIT) compiler in the IBM Semeru JDK plays an important role in increasing the performance of Java applications by dynamically compiling and optimising Java bytecode into machine code at runtime.
IBM Semeru Runtimes can be downloaded for free, even for production use, from ibm.biz/GetSemeru.
What is a Compiler?
A compiler is a translator which converts code written in one source language to another target language without changing the meaning of the code.
OpenJ9 JIT Compiler
The OpenJ9 Just-In-Time (JIT) compiler is a dynamic compiler which converts Java bytecode to native machine code at runtime. It is an optimising compiler which applies advanced techniques to produce code that runs fast without significant compilation overheads, thus increasing the overall performance of the OpenJ9 Java virtual machine (JVM).
Compilation Process
Compiler
Most compilers have three main parts to convert source code to native machine code.
- Front end -> Lexing, Parsing, Semantic analysis
  - Lexing -> converts the source code into tokens such as keywords, identifiers, etc.
  - Parsing -> converts the tokens into abstract syntax trees
  - Semantic analysis -> traverses the abstract syntax trees and generates the intermediate language
- Middle end -> Code optimisation
- Back end -> Code generation
OpenJ9 JIT Compiler
The OpenJ9 JIT compiler has three main parts to convert Java bytecode to native machine code.
During OpenJ9 JIT compilation, lexing, parsing, and semantic analysis are not required because javac has already completed these steps during the conversion of Java source code to Java bytecode (see below).
- Front end -> intermediate language generation
- Compilation control and optimisation -> code optimisation without excessive resource use
- Back end -> code generation
The diagram below represents the OpenJ9 JIT compilation phases.

Example
Java Source Code
Let’s take some Java source code and walk through its conversion to bytecode by javac and then to native machine code by the OpenJ9 JIT compiler.
a = a + b;
a = (a - b) * 2;
Java Bytecode
The Java compiler (javac) converts Java source code into Java bytecode, which is platform independent and runnable on any compatible Java virtual machine.
iload a
iload b
iadd
istore a
iload a
iload b
isub
bipush 2
imul
istore a
Initially, the Java virtual machine uses a bytecode interpreter to execute each Java bytecode individually. The interpreter is essentially a large switch statement, where each case block implements a specific bytecode. This mode of execution is reliable, relatively simple, and requires no additional compilation; it is also the quickest way to port Java to a new platform, but it does not maximise performance.
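The dispatch loop described above can be sketched in plain Java. This is a toy stack-machine interpreter, not OpenJ9's implementation: the opcode constants, the `run` method, and its signature are all hypothetical, and real JVM bytecodes are defined in the JVM specification.

```java
public class MiniInterpreter {
    // Hypothetical numeric opcodes, for illustration only.
    static final int ILOAD_A = 0, ILOAD_B = 1, IADD = 2, ISUB = 3,
                     BIPUSH = 4, IMUL = 5, ISTORE_A = 6;

    // A minimal stack-machine dispatch loop: one case per bytecode.
    static int run(int[] code, int a, int b) {
        java.util.Deque<Integer> stack = new java.util.ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            switch (code[pc++]) {
                case ILOAD_A: stack.push(a); break;
                case ILOAD_B: stack.push(b); break;
                case IADD: stack.push(stack.pop() + stack.pop()); break;
                case ISUB: { int r = stack.pop(); stack.push(stack.pop() - r); break; }
                case BIPUSH: stack.push(code[pc++]); break; // operand follows the opcode
                case IMUL: stack.push(stack.pop() * stack.pop()); break;
                case ISTORE_A: a = stack.pop(); break;
            }
        }
        return a; // final value of the local variable a
    }

    public static void main(String[] args) {
        // The same bytecode sequence as the example above:
        // a = a + b;  a = (a - b) * 2;
        int[] code = { ILOAD_A, ILOAD_B, IADD, ISTORE_A,
                       ILOAD_A, ILOAD_B, ISUB, BIPUSH, 2, IMUL, ISTORE_A };
        System.out.println(run(code, 1, 2)); // ((1 + 2) - 2) * 2 = 2
    }
}
```

Each iteration of the loop pays the cost of a fetch and a switch dispatch, which is exactly the overhead that JIT-compiled machine code removes.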
When a Java program is running, OpenJ9 records the number of times each method is interpreted. When the count reaches a certain threshold, indicating the method has become “hot”, the relevant bytecode is compiled and replaced with optimised (but functionally equivalent) machine code generated by the OpenJ9 JIT compiler.
OpenJ9 initially executes the bytecodes contained in Java methods in interpreter mode in order to gain faster startup time for Java applications. This is because JIT compiling a method is a resource-intensive task, and it should only be done for the parts of the program that are frequently executed. Once a given method’s bytecodes are deemed to be “hot” (frequently executed), OpenJ9 compiles them and executes the resulting machine code to gain faster throughput. In certain cases, OpenJ9 may recompile a heavily executed compiled method with more aggressive levels of optimisation to gain further performance.
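The invocation counting described above can be modelled with a toy sketch. The class, the `THRESHOLD` value, and the `recordInvocation` method are all illustrative assumptions; OpenJ9's real thresholds differ and vary by configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of invocation counting: a method becomes "hot" when its
// interpreted-invocation count reaches a threshold.
public class HotnessTracker {
    static final int THRESHOLD = 3; // illustrative value, not OpenJ9's default
    static final Map<String, Integer> counts = new HashMap<>();

    // Returns true exactly once: when the method crosses the threshold
    // and would be queued for JIT compilation.
    static boolean recordInvocation(String method) {
        int n = counts.merge(method, 1, Integer::sum); // increment the count
        return n == THRESHOLD;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            if (recordInvocation("calculateAdd")) {
                System.out.println("compiling calculateAdd");
            }
        }
    }
}
```

The real runtime also weighs loop iteration counts and sampling data, so "hotness" is richer than a single counter, but the basic trigger works this way.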
Interpreter vs JIT compiler
The example below demonstrates the warm-up period of a Java application: the initial run is slower because the bytecode is interpreted, and performance improves once the JIT compiler converts the frequently used bytecode to optimised machine code.
Sample Code
public class JITCode {
    // Storing results in a visible field guards against the compiler
    // optimising away the entire loop.
    public static int visible = 0;

    public static void main(String[] args) throws Exception {
        final int ITERATIONS = 100000;

        System.out.println("warmup phase");
        // Perform a task to trigger JIT compilation
        for (int i = 0; i < ITERATIONS; i++) {
            visible += calculateAdd(i);
        }

        System.out.println("Sleeping to allow compiler to finish");
        Thread.sleep(5000);

        System.out.println("timing " + ITERATIONS + " iterations");
        // Run the same task to observe JIT optimisation
        long startTime = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            visible += calculateAdd(i);
        }
        long endTime = System.nanoTime();
        System.out.println("Time taken: " + (endTime - startTime) / 1_000_000 + " ms");
    }

    private static long calculateAdd(int max) {
        long add = 0;
        for (int i = 0; i < max; i++) {
            add += i;
        }
        return add;
    }
}
The Java commands and options below execute the sample code in interpreter mode and in JIT-compiled mode.
// Interpreter mode
java -Xint JITCode
// JIT compiler mode, with inlining of calculateAdd disabled
java '-Xjit:verbose={compileperf*},vlog=jit,dontInline={JITCode.calculateAdd(*}' JITCode
Output
Before the method was JIT compiled, it was executed by the bytecode interpreter and the total time taken was 33240 ms. Because the “calculateAdd” method was invoked frequently, it became eligible for JIT compilation and was converted to optimised machine code. After the method was JIT compiled, the total execution time was 1337 ms.
Time taken in interpreter mode: 33240 ms
Time taken in JIT compiler mode: 1337 ms
The “calculateAdd” method has been JIT compiled after the invocation count reached a pre-defined threshold.
Intermediate Language Generation
The first phase of OpenJ9 JIT compilation is intermediate language generation. Java bytecodes are not optimal for sophisticated analysis, so they are translated into an intermediate language for easier manipulation and analysis.
Intermediate Language Trees
The OpenJ9 JIT compiler translates the Java bytecode into intermediate language trees. The trees are connected to each other using a linked list, as shown in the example below.

Control Flow Graph
The linked list is divided into basic blocks, and the execution path between two blocks is known as a control edge. Basic blocks have one entry point and one exit point. Blocks may represent loops, and combinations of basic blocks, control edges, and loops form a data structure known as a Control Flow Graph (CFG).

In a simplistic sense, the intermediate language trees, the control flow graph, and other cached analysis information together comprise the intermediate language. Some additional cached data structures may also be considered part of the intermediate language, but those are outside the scope of this article.
Code Optimisation
The OpenJ9 JIT compiler uses many optimisation techniques to analyse and transform the intermediate language. The following are some selected examples from the hundreds of optimisations that take place.
Local Optimisation
Constant Folding -> expressions consisting only of constants are replaced with their computed values during compilation.
int x = 2 + 3; // Non Optimised Code
int x = 5; // Optimised Code
int y = 2 * 3; // Non Optimised Code
int y = 6; // Optimised Code
Strength Reduction -> computationally expensive operations will be replaced with equivalent but less expensive operations.
int y = x * 2; // Non Optimised Code
int y = x << 1; // Optimised Code
int b = a / 2; // Non Optimised Code
int b = a >> 1; // Optimised Code (equivalent only when a >= 0)
Common subexpression elimination -> redundant computations are identified and removed by reusing previously calculated results.
int a = b * c + g; // Non Optimised Code
int d = b * c * e;
int tmp = b * c; // Optimised Code
int a = tmp + g;
int d = tmp * e;
Global Optimisation
Dead code elimination -> identifies and removes dead code which doesn’t contribute to the program’s final output.
Non optimised code
-------------------
int x = 10;
if (false)
System.out.println("test"); // dead code
return x;
Optimised code
---------------
int x = 10;
return x;
Dead store elimination -> identifies and removes variables which are assigned but never actually used before being overwritten or going out of scope.
Non optimised code
-------------------
int x = 7; // dead store
x = 12;
int y = x + 2;
Optimised code
---------------
int x = 12;
int y = x + 2;
Global register assignment -> minimises memory accesses by keeping frequently used variables in registers for extended periods, which increases execution speed.
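Register assignment is invisible at the Java source level, but the code shapes it targets are easy to show. In a hot loop like the sketch below, the JIT will typically keep `sum`, `i`, and the array length in CPU registers for the whole loop rather than reloading them from memory each iteration. The class and method names here are illustrative, not OpenJ9 output.

```java
public class RegisterDemo {
    // sum, i, and data.length are prime candidates to live in registers
    // across the loop, avoiding a memory access per iteration.
    static long sum(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 0 + 1 + ... + 9 = 45
    }
}
```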
Loop Optimisation
Induction variable analysis -> identifies and analyses induction variables that change in a predictable way within a loop.
for (i = 0; i < n; i++) {} // i is an induction variable
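One common use of this analysis is rewriting derived induction variables: an expression like `i * 4` that is recomputed from `i` every iteration can be replaced by a second variable that is simply incremented. A sketch, with both forms side by side (the class and method names are illustrative):

```java
public class InductionDemo {
    // Before: the offset is recomputed from i with a multiply each iteration.
    static int[] offsetsBefore(int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = i * 4;
        }
        return out;
    }

    // After: offset is a derived induction variable, updated by a cheap add.
    static int[] offsetsAfter(int n) {
        int[] out = new int[n];
        int offset = 0;
        for (int i = 0; i < n; i++) {
            out[i] = offset;
            offset += 4;
        }
        return out;
    }

    public static void main(String[] args) {
        // Both forms produce the same offsets.
        System.out.println(java.util.Arrays.equals(offsetsBefore(5), offsetsAfter(5))); // true
    }
}
```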
Loop invariant code motion -> identifies computations that produce the same value in every iteration of a loop, and moves them outside of the loop’s body so they only need to be executed once.
Non Optimised Code
-------------------
int result = 0, x = 10, y = 5;
for (int i = 0; i < 1000000; i++) {
result += (x * y); // (x * y) is loop-invariant
}
Optimised Code
---------------
int result = 0, x = 10, y = 5;
int temp = x * y; // Loop-invariant computation moved outside
for (int i = 0; i < 1000000; i++) {
result += temp;
}
Loop unrolling -> replicates the loop body to reduce the number of iterations, which increases program speed by eliminating some loop-control and test instructions.
Non Optimised Code
-------------------
for (int i = 0; i < 3; i++)
    System.out.println("Hello"); // print Hello 3 times

Loop unrolling
---------------
System.out.println("Hello");
System.out.println("Hello");
System.out.println("Hello");
Vectorisation -> converts scalar operations into vectorised or SIMD operations, allowing multiple data elements to be processed simultaneously.
Scalar code
------------
for (int i = 0; i < n; i++) {
a[i] = b[i] + c[i];
}
Vectorized code
----------------
for (int i = 0; i < n; i += 4) {
// Load 4 elements of b, c, and a into SIMD registers
// Perform addition on the registers (4 additions at once)
// Store the 4 results back into a
}
Loop versioning -> Creates multiple versions of a loop to optimise different runtime conditions, such as potential exceptions. A conditional check is inserted before the loop to determine which version to execute. This helps to address the significant challenge of handling exceptions efficiently within highly optimised loops.
Non Optimised Code
-------------------
void copy_array(int[] dst, int[] src, int size) {
    for (int i = 0; i < size; ++i) {
        dst[i] = src[i];
    }
}
Loop versioning
----------------
void copy_array_versioned(int[] dst, int[] src, int size) {
    // Runtime check: if the loop range is safe, run the fast version.
    if (size <= dst.length && size <= src.length) {
        for (int i = 0; i < size; ++i) {
            // Bound checks for both arrays are omitted in this version
            dst[i] = src[i];
        }
    } else {
        // This version performs explicit bound checks in every iteration.
        for (int i = 0; i < size; ++i) {
            dst[i] = src[i];
        }
    }
}
Loop specialisation -> Creates multiple versions of the loop body. If a loop contains conditional statements whose conditions depend on loop-invariant expressions, the compiler can generate separate, specialised loops for each possible outcome of that condition. This eliminates the need to evaluate the invariant condition repeatedly inside the loop.
Non Optimised Code
-------------------
int N = 100;
int K = 5;
for (int i = 0; i < N; i++) {
boolean condition = (K > 3); // loop-invariant expression
if (condition) {
// for K > 3
} else {
// for K <= 3
}
}
Loop Specialisation
--------------------
int N = 100;
int K = 5;
boolean condition = (K > 3); // Loop-invariant condition
if (condition) {
for (int i = 0; i < N; i++) {
// Specialized loop body for K > 3
}
} else {
for (int i = 0; i < N; i++) {
// Specialized loop body for K <= 3
}
}
Inlining -> replaces a call to a frequently executed method with a copy of the method’s body, eliminating the overhead of the call itself and exposing the inlined code to further optimisation.
Non Optimised Code
------------------
static int square(int x) {
    return x * x;
}

public static void main(String[] args) {
    int num = 5;
    int result = square(num); // Method call
}

Optimised Code
---------------
public static void main(String[] args) {
    int num = 5;
    int result = num * num; // Call replaced with the method body
}
Code Generation
Code generation is the final phase of the compilation process, where the OpenJ9 JIT compiler transforms the intermediate language into machine instructions and, ultimately, binary machine code. This code is platform dependent and takes into account the characteristics of the specific hardware architecture. Generating binary machine code from the intermediate language involves many critical tasks.
Instruction selection -> choose which assembly instructions to use when implementing operations specified in the intermediate language
Register allocation -> decide which intermediate results and variables will reside in CPU registers to minimise memory accesses and improve execution speed
Encode instructions in binary -> convert assembly instructions to a binary representation suitable for the target CPU architecture
mov R1, [a]        ; load a
mov R2, [b]        ; load b
add R3, R1, R2     ; R3 = a + b
mov [a], R3        ; a = a + b
sub R4, R3, R2     ; R4 = a - b (using the updated a)
mov R5, 2
mul R6, R4, R5     ; R6 = (a - b) * 2
mov [a], R6        ; a = (a - b) * 2
The OpenJ9 JIT compiler supports optimised code generation for IBM Z and POWER architectures as well as x86 and AArch64 architectures.
Conclusion
This article explained the compilation phases of the OpenJ9 JIT compiler: intermediate language generation, code optimisation, and code generation. It also demonstrated how a Java application's performance improves over time as interpreted bytecode is replaced with optimised native code by the JIT compiler.
The OpenJ9 JIT compiler improves the performance of Java applications by converting frequently executed sections (methods) of Java bytecode to native machine code at runtime. This dynamic optimisation allows developers to focus on writing clear application code without compromising on performance.
IBM Semeru Runtimes can be downloaded for free, even for production use, from ibm.biz/GetSemeru.