As discussed in a past blog post, if you're experiencing performance problems and you've already confirmed that garbage collection (GC) is healthy and reviewed thread dumps, the next step is to use a sampling profiler. A sampling profiler gives you more detailed information about what is using CPU, along with some non-CPU insights.
The steps to use a sampling profiler are:
- Enable the sampling profiler during the performance issue
- Install the profiling client
- Analyze method profiling
- Analyze locking
- Analyze stack samples
Step 1: Enable Sampling Profiler
There are various sampling profilers for Java, but we will focus on IBM Monitoring and Diagnostic Tools - Health Center. Health Center is primarily available for IBM Java 8 and ships with it, so nothing additional needs to be installed to enable the profiler. In general, Health Center's overhead is less than about 2%, which is acceptable for production use.
There are a few different ways to enable Health Center to produce HCD files for analysis (Health Center also has an option for live monitoring over a socket):
- At startup and until Java is gracefully stopped
- At startup for a fixed duration
- At runtime and until Java is gracefully stopped
- At runtime for a fixed duration
For example, using the first option (enabling Health Center at startup) simply involves restarting Java with:
Step 2: Install the Health Center client
The Health Center client is available for Windows, macOS, and Linux. Launch the client and then load the HCD files gathered in Step 1.
In general, wait for the progress bar in the bottom right of the client to complete before analyzing data. The Health Center client may use a lot of memory, so set a large -Xmx for it, depending on the RAM available on your machine.
Upon loading the data, Health Center shows potential warnings and suggestions for each dimension of data.
A good place to start is the CPU view. You can draw a box over a time range to zoom into that period, and the other Health Center views will zoom into the same period.
Step 3: Analyze Method Profiling
Click on the Method Profiling view. By default, it is sorted by Self %, which is the percentage of CPU samples in which that method was the one executing (at the top of the stack). In general, if a method has a Self % greater than about 2%, it may be a hot spot. Select any such method and open the Invocation Paths view to review the stacks calling this method and identify areas for potential optimization. An option in the Health Center preferences shows full package names for methods.
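To make the Self % definition concrete, and to contrast it with the Tree % column, here is a minimal sketch of how such percentages can be derived from raw stack samples. It is not Health Center's implementation: the `SampleProfile` class, its method names, and the sample stacks are hypothetical, and each sample is assumed to be a list of frames ordered from the outermost caller to the executing method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: derives Self % and Tree % from CPU stack samples. */
public class SampleProfile {

    /** Self %: share of samples in which the method was the executing (top) frame. */
    static Map<String, Double> selfPercent(List<List<String>> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stack : samples) {
            String top = stack.get(stack.size() - 1); // last element = executing method
            counts.merge(top, 1, Integer::sum);
        }
        return toPercent(counts, samples.size());
    }

    /** Tree %: share of samples in which the method appeared anywhere on the stack. */
    static Map<String, Double> treePercent(List<List<String>> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stack : samples) {
            Set<String> seen = new HashSet<>(stack); // count recursive methods once per sample
            for (String method : seen) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        return toPercent(counts, samples.size());
    }

    private static Map<String, Double> toPercent(Map<String, Integer> counts, int total) {
        Map<String, Double> percent = new HashMap<>();
        counts.forEach((method, n) -> percent.put(method, 100.0 * n / total));
        return percent;
    }

    public static void main(String[] args) {
        // Hypothetical samples, outermost frame first.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("Thread.run", "Worker.run", "Parser.parse"),
                Arrays.asList("Thread.run", "Worker.run", "Parser.parse"),
                Arrays.asList("Thread.run", "Worker.run", "Db.query"),
                Arrays.asList("Thread.run", "Idle.poll"));
        Map<String, Double> self = selfPercent(samples);
        Map<String, Double> tree = treePercent(samples);
        for (Map.Entry<String, Double> e : tree.entrySet()) {
            double selfPct = self.getOrDefault(e.getKey(), 0.0);
            // Rule of thumb from the text: Self % above ~2% may be a hot spot.
            String flag = selfPct > 2.0 ? "  <- possible hot spot" : "";
            System.out.printf("%-14s self=%5.1f%%  tree=%5.1f%%%s%n",
                    e.getKey(), selfPct, e.getValue(), flag);
        }
    }
}
```

With these made-up samples, `Thread.run` has a Tree % of 100% but a Self % of 0%: it is on every stack but never the method doing the work, which is why Tree % is better suited to drilling down than to spotting hot methods.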
Next, sort by Tree %, which is the percentage of samples in which a method appears anywhere on the stack. Usually the first entry will be something like Thread.run or Worker.run. Select it and switch to the Called methods view. Expand the largest tree items until there is a large "drop." For example, if successive methods stay near 100% (100%, 99%, 99%, and so on) and then the tree suddenly splits into one method with 60% and another with 40%, this usually indicates a major divergence in general application activity. Continue as needed until something interesting comes up (this is more an art than a science).
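The "follow the largest branch until it splits" routine can be sketched as a small tree walk. This is a hypothetical illustration rather than a Health Center feature: `CallNode`, `pathToDivergence`, and the 0.9 threshold (a child keeping less than 90% of its parent's samples counts as a drop) are assumptions chosen for the example, and in practice the judgment call stays with you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: follows the largest branch of a call tree until it splits. */
public class DropFinder {

    /** Hypothetical call-tree node; samples = samples in which this call path was on the stack. */
    static final class CallNode {
        final String method;
        final int samples;
        final List<CallNode> children;

        CallNode(String method, int samples, CallNode... children) {
            this.method = method;
            this.samples = samples;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Walks from the root into the largest child while that child keeps at least
     * dropRatio of its parent's samples. Returns the path down to the node whose
     * children split the activity (or down to a leaf).
     */
    static List<String> pathToDivergence(CallNode root, double dropRatio) {
        List<String> path = new ArrayList<>();
        CallNode node = root;
        path.add(node.method);
        while (!node.children.isEmpty()) {
            CallNode largest = node.children.stream()
                    .max(Comparator.comparingInt(c -> c.samples))
                    .get();
            if (largest.samples < dropRatio * node.samples) {
                break; // large drop: activity diverges below this node
            }
            node = largest;
            path.add(node.method);
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical tree shaped like the example: ~100% down to a 60% / 40% split.
        CallNode tree = new CallNode("Thread.run", 1000,
                new CallNode("Worker.run", 990,
                        new CallNode("Dispatcher.dispatch", 990,
                                new CallNode("OrderService.submit", 600),
                                new CallNode("ReportService.render", 390))));
        System.out.println(pathToDivergence(tree, 0.9));
        // prints [Thread.run, Worker.run, Dispatcher.dispatch]
    }
}
```

The walk stops at `Dispatcher.dispatch` because its largest child keeps only 600 of 990 samples; that node is where you would start expanding both branches by hand.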
Step 4: Analyze Locking
Click on the Locking view and sort by the Slow column. If the number of slow acquires multiplied by the average hold time (in nanoseconds) is concerning, review the class of the lock object and treat it as a suspected bottleneck.
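The check in this step amounts to ranking locks by slow count multiplied by average hold time. A minimal sketch, assuming the numbers have been copied out of the Locking view by hand; `LockStat`, `rank`, and the `com.example` class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: ranks locks by slow acquires multiplied by average hold time. */
public class LockRanking {

    /** Hypothetical row copied from the Locking view. */
    static final class LockStat {
        final String lockClass;
        final long slowCount;     // value from the Slow column
        final long avgHoldNanos;  // average hold time in nanoseconds

        LockStat(String lockClass, long slowCount, long avgHoldNanos) {
            this.lockClass = lockClass;
            this.slowCount = slowCount;
            this.avgHoldNanos = avgHoldNanos;
        }

        /** The product suggested in the text: slow count x average hold time (ns). */
        long score() {
            return slowCount * avgHoldNanos;
        }
    }

    /** Returns a copy of the rows sorted from highest to lowest score. */
    static List<LockStat> rank(List<LockStat> stats) {
        List<LockStat> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingLong(LockStat::score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration.
        List<LockStat> stats = Arrays.asList(
                new LockStat("com.example.Cache", 12_000, 1_500),
                new LockStat("com.example.ConnectionPool", 300, 2_000_000),
                new LockStat("com.example.Registry", 50_000, 200));
        for (LockStat s : rank(stats)) {
            System.out.printf("%-28s slow=%-7d avgHold=%-9dns score=%d%n",
                    s.lockClass, s.slowCount, s.avgHoldNanos, s.score());
        }
    }
}
```

With these made-up numbers, `com.example.ConnectionPool` ranks first (600,000,000) despite having the fewest slow acquires, because each hold is long; `Cache` (18,000,000) and `Registry` (10,000,000) follow. Sorting by the Slow column alone would have put it last.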
Step 5: Analyze Stack Samples
Health Center is geared towards analyzing what is using CPU; however, performance problems may also come from bottlenecks where threads are waiting rather than using CPU, for example, when the Java process is waiting on I/O from a database or web service. For this reason, Health Center also gathers thread dumps every 30 seconds. These thread dumps may be extracted into a thread dump file by clicking File > Export Threads. This file may be loaded into the IBM Thread and Monitor Dump Analyzer, as discussed in a previous post.
If lock contention was observed in the previous step, these thread dumps may also reveal what is driving it.
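Periodic thread dumps catch non-CPU bottlenecks because they record every live thread, not just the ones consuming CPU. The sketch below imitates that idea inside a single JVM with the standard `java.lang.management` API: it takes a few in-process thread dumps and counts each thread's top stack frame and state. It is an illustration, not how Health Center collects its dumps; `ThreadFrameSampler` and the simulated waiting thread are hypothetical, and the short interval replaces Health Center's 30 seconds only to keep the demo fast. Note that a thread blocked in a native socket read (for example, waiting on a database) is typically reported as RUNNABLE, so the top frame is usually more telling than the state alone.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: tallies top stack frames across a few in-process thread dumps. */
public class ThreadFrameSampler {

    /** Takes {@code samples} thread dumps {@code intervalMillis} apart; counts "frame [state]" keys. */
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            // dumpAllThreads(lockedMonitors, lockedSynchronizers): one ThreadInfo per live thread.
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info == null) {
                    continue;
                }
                StackTraceElement[] stack = info.getStackTrace();
                if (stack.length == 0) {
                    continue; // skip threads without Java frames
                }
                String key = stack[0].getClassName() + "." + stack[0].getMethodName()
                        + " [" + info.getThreadState() + "]";
                counts.merge(key, 1, Integer::sum);
            }
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a thread waiting on a slow backend: it uses no CPU.
        Thread waiter = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "simulated-backend-wait");
        waiter.setDaemon(true);
        waiter.start();

        // Print the most frequent top frames first.
        sampleTopFrames(3, 200).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

The sleeping thread never shows up in a CPU-based view, yet it appears in every dump; a frame that keeps recurring across dumps is the same kind of signal the Thread and Monitor Dump Analyzer helps surface.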
In conclusion, a sampling profiler is a way to deep dive into what is using CPU and to gain other non-CPU performance insights. Tools like Health Center take some time to become familiar with, but they provide very rich data. If garbage collection is healthy and gathering a handful of thread dumps didn't provide sufficient insight into the cause of a performance problem, then a sampling profiler like Health Center may provide richer insight into the causes.