As discussed in a past blog post, if you're experiencing performance problems, then after you've verified that garbage collection (GC) is healthy, the next step is to review thread dump stacks during a performance issue. This gives you a sample of what your application is doing and often leads to the cause of the problem.
The steps to review thread dump stacks are:
- Gather thread dumps during a performance problem
- Review the thread dumps in a text editor or in a tool such as the free IBM Thread and Monitor Dump Analyzer
- Look for lock contention
- Look for common patterns of stacks of active threads
Step 1: Gather thread dumps
Gathering thread dumps is relatively easy and cheap (in the sense of risk and latency).
On POSIX-based operating systems such as Linux, AIX, IBM i, Solaris, and HP-UX, gathering a thread dump is as simple as sending the SIGQUIT (3) signal to the process (based on the process ID, or PID):
kill -3 $PID
Although the signal's name includes "QUIT", don't be afraid: by default, the Java process handles the signal, produces the thread dump, and continues running, so the signal is non-destructive. (The exception is if you've disabled the JVM's signal handler with an option such as -Xrs, although this is very rare.)
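The signal-based approach above can be sketched as a small shell helper. This is a minimal sketch; the commented pgrep pattern for finding the PID is an example and should be adjusted for your server process:

```shell
#!/bin/sh
# Minimal sketch: ask a JVM for a thread dump by sending SIGQUIT (signal 3).
# By default, the JVM handles the signal, writes the thread dump, and
# continues running, so this is non-destructive.
request_thread_dump() {
  kill -3 "$1"    # equivalent to: kill -QUIT "$1"
}

# Hypothetical usage: find the first java process and signal it.
# PID=$(pgrep -f java | head -n 1)
# request_thread_dump "$PID"
```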
While the thread dump is produced, the process is paused, so this does have an impact on end users; however, in general, a thread dump takes no more than about 100 milliseconds to produce. The major exception is when customers have re-configured the signal handler to produce other diagnostics on SIGQUIT, such as heapdumps. There are other ways to request heapdumps, so we do not recommend that sort of configuration: it negates the main benefit of thread dumps, which is that they're lightweight.
On Windows, it's possible to set up something like a SIGQUIT handler (using Ctrl+Break), but this requires an attached console and thus is rarely used. The alternatives are the WAS traditional Administrative Console or JVM.dumpThreads MBean, the WebSphere Liberty server javadump command, the Java Surgery JavaDump command, the OpenJ9 jcmd Dump.java command, the HotSpot jcmd Thread.print command, and various other mechanisms. These alternatives generally apply to the other operating systems as well.
In addition, on z/OS, you may also use the MODIFY JAVACORE command, and, on IBM i, WRKJVMJOB.
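To keep the main alternatives in one place, here is a sketch that wraps them in a dry-run helper, printing the command that would be run rather than running it. The helper function itself and the PID 12345 are illustrative assumptions; the command names come from the mechanisms listed above:

```shell
#!/bin/sh
# Sketch: a dry-run helper that prints the thread dump command for a given
# runtime. The function and the example PID are hypothetical; the commands
# themselves are the standard mechanisms described above.
thread_dump_cmd() {
  case "$1" in
    posix)   echo "kill -3 $2" ;;             # SIGQUIT to the process
    hotspot) echo "jcmd $2 Thread.print" ;;   # HotSpot jcmd
    openj9)  echo "jcmd $2 Dump.java" ;;      # OpenJ9 jcmd
    liberty) echo "server javadump $2" ;;     # Liberty server script
  esac
}

thread_dump_cmd hotspot 12345
```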
On IBM Java/OpenJ9/Semeru, the thread dump is produced in a javacore*.txt file, generally in the current working directory of the process. In general, each thread dump is a few megabytes in size.
On HotSpot Java, the thread dump is appended to stderr. In general, each thread dump is a few hundred kilobytes in size.
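Since HotSpot appends every dump to the same stderr stream, it can be handy to check how many dumps a log actually contains before loading it into a tool. A rough sketch, relying on the fact that each HotSpot thread dump begins with a "Full thread dump" line:

```shell
#!/bin/sh
# Sketch: count the thread dumps in a HotSpot stderr/console log.
# Each HotSpot thread dump starts with a line like:
#   Full thread dump Java HotSpot(TM) 64-Bit Server VM ...
count_thread_dumps() {
  grep -c '^Full thread dump' "$1"
}
```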
As with any diagnostic, there are always risks, however small, of problems. There have been bugs in the past where requesting thread dumps could lock up or crash JVMs, but those have been fixed and such risks are exceedingly rare. We regularly use thread dumps in production on massive customer systems.
Step 2: Review the thread dumps
Although you may review thread dumps by hand in a text editor (for J9 thread dumps, search for 3XMTHREADINFO and 1LKMONPOOLDUMP), we recommend using a graphical tool such as the free IBM Thread and Monitor Dump Analyzer (TMDA). Load the javacore*.txt files or stderr output and you will see a list of thread dumps.
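If you do review by hand, a quick sketch of pulling the thread list out of a J9 javacore with grep (the function name is just for illustration; 3XMTHREADINFO is the javacore tag for each thread's summary line):

```shell
#!/bin/sh
# Sketch: list thread names and states from a J9 javacore by matching the
# 3XMTHREADINFO summary lines (detail lines use tags like 3XMTHREADINFO1).
list_threads() {
  grep '^3XMTHREADINFO ' "$1"
}
```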
Step 3: Look for lock contention
If TMDA shows any non-zero values in the "Monitor Contention" column, then there may be lock contention. Click each such row and then click on the "Monitor Detail" button:
In some cases, TMDA shows phantom lock contention (e.g. "Unknown"), which is not a concern. The concern is when you see application thread stacks performing work but blocked on a lock. For example:
Select a blocked thread to see its stack and the lock that it is waiting on. In the above example, the thread is waiting on an Apache Derby LogToFile lock owned by the "Default Executor-thread-135" thread, which is the parent element in the thread selection box. It is common for Apache Derby to wait on the filesystem under load. If this were a real application, we would normally recommend moving to a more robust database, or using faster disks.
It's also worth noting that the application stack shows the actual application activity. In this example, the TradeDirect.buy method call occurs when a user performs a stock purchase. It may be valuable to review the higher-level application behavior that led to the lock to understand ways to avoid the bottleneck.
Step 4: Look for common patterns of stacks
Finally, if there is little or no lock contention, the primary purpose of thread dumps is to review the pattern of thread stacks in application thread pools. Select a thread dump and click the "Thread Detail" button:
Next, sort by the "Stack Depth" column in descending order:
In general, thread stacks with about 15 or more stack frames are performing actual work. Click on each such thread and review the resulting stack on the right:
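If you prefer the command line to TMDA's "Stack Depth" sort, a rough awk sketch can approximate it for a HotSpot-format dump, where each thread header starts with a quoted thread name and each frame line starts with "at ". The function name is illustrative:

```shell
#!/bin/sh
# Rough sketch: print each thread's frame count from a HotSpot-format thread
# dump, deepest stacks first. Thread headers begin with a quoted name;
# stack frames begin with whitespace followed by "at ".
stack_depths() {
  awk '
    /^"/               { if (name != "") print depth, name; name = $0; depth = 0 }
    /^[[:space:]]+at / { depth++ }
    END                { if (name != "") print depth, name }
  ' "$1" | sort -rn
}
```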
Review the top few frames of the stack and also look for signs higher up in the stack of what the application is doing. In the above example, the method TradeServletAction.doQuotes shows that the user was requesting a stock price quote. Over time, you will build knowledge about these stacks based on each application and understand what activity is happening inside the process.
Each thread stack is just one sample. Statistically, the more samples you have, the more confident you can be about concluding what was happening inside Java during a performance problem. If you observe common patterns in the stacks or in the driving application stack frames, then that is the most likely proximate cause of the performance problem. In general, we recommend taking multiple thread dumps about 15 seconds apart during a performance issue. This is automated in scripts such as the WAS Performance MustGather (Linux, Windows, AIX, z/OS, Solaris, HP-UX).
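The "multiple dumps, spaced apart" pattern can be sketched as a small loop; the function name and default count/interval are illustrative, with the interval defaulting to the roughly 15 seconds suggested above:

```shell
#!/bin/sh
# Sketch: take several thread dumps spaced apart so the samples can be
# compared for common stack patterns.
# Usage: collect_thread_dumps <pid> [count] [interval_seconds]
collect_thread_dumps() {
  pid=$1
  count=${2:-3}
  interval=${3:-15}
  i=1
  while [ "$i" -le "$count" ]; do
    kill -3 "$pid" || return 1    # SIGQUIT: JVM writes a dump and continues
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
}
```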