When you want to understand what's consuming your Java heap, the classic method is to gather a heapdump or system dump and review it with a tool such as the Eclipse Memory Analyzer Tool (MAT) with the IBM DTFJ extension (to read dumps produced by IBM Java and Semeru).
However, creating heapdumps and system dumps is expensive: they pause the JVM for up to dozens of seconds, they produce very large files, and processing them takes significant time. In a previous post, we discussed how to understand Java heap allocations using an object allocation sampler, but that technique is somewhat cumbersome and it's hard to choose the right sampling rate.
In this post, we'll show how to use the class histogram dump as an alternative. It is generally much lighter weight than a full memory dump, yet more comprehensive than the allocation sampler.
Gathering a Class Histogram Dump
If you're running IBM Semeru or a HotSpot JVM, you may use the jcmd utility to gather a class histogram dump, replacing $PID with the process ID:
$ jcmd $PID GC.class_histogram
num object count total size class name
-------------------------------------------------
[...]
For IBM Java >= 8.0.6.0, this may be executed as follows, replacing $JAVA_HOME with the IBM Java home directory and $PID with the process ID:
$ java -Xbootclasspath/a:$JAVA_HOME/lib/tools.jar openj9.tools.attach.diagnostics.tools.Jcmd $PID GC.class_histogram
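The two invocations above differ only in how jcmd is launched, so it can be convenient to build the argument list in a small helper before running it with a process library. This is only an illustrative sketch: the function name and the ibm_java8 flag are made up for this example, and the caller is responsible for knowing which JVM vendor and version it is targeting.

```python
def histogram_command(pid, ibm_java8=False, java_home=None):
    """Build the command line for a class histogram dump.

    HotSpot/Semeru use the jcmd utility directly; IBM Java >= 8.0.6.0
    runs the Jcmd tool from tools.jar via -Xbootclasspath/a.
    """
    if ibm_java8:
        if java_home is None:
            raise ValueError("java_home is required for the IBM Java 8 path")
        return [
            "java",
            f"-Xbootclasspath/a:{java_home}/lib/tools.jar",
            "openj9.tools.attach.diagnostics.tools.Jcmd",
            str(pid),
            "GC.class_histogram",
        ]
    return ["jcmd", str(pid), "GC.class_histogram"]

# For example, pass the result to subprocess.run(..., capture_output=True)
print(histogram_command(1744))  # ['jcmd', '1744', 'GC.class_histogram']
```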
Note that the overhead of producing a class histogram is proportional to the number of loaded classes and objects, and the JVM is paused while the histogram is taken. The pause may still be long, so test this in a test environment first.
In addition, producing the dump does not first run a full garbage collection, so what's dumped may include a lot of trash if a generational garbage collector is in use. For more accurate heap usage, first run a full garbage collection, wait for it to finish (e.g. by tailing verbosegc), and then run the class histogram dump. The jcmd utility has the GC.run command to request a full garbage collection (which works as long as the JVM does not have the -Xdisableexplicitgc or -XX:+DisableExplicitGC options):
$ jcmd $PID GC.run
Finally, it's usually good to write jcmd output to a file as it can be very lengthy:
$ jcmd $PID GC.class_histogram > diag_jcmd_$(date +%Y%m%d_%H%M%S).txt
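Because the output is a simple whitespace-separated table, a saved histogram file is easy to post-process. Below is a minimal, illustrative parser (the helper name is made up here); it assumes the rank/count/size/name row layout shown in the sample output in the next section:

```python
def parse_histogram(lines):
    """Parse 'num  object count  total size  class name' rows.

    Skips the header, the separator line, and the trailing Total
    line; yields (class_name, object_count, total_size) tuples.
    """
    for line in lines:
        parts = line.split()
        # Data rows have exactly four fields and start with the rank.
        if len(parts) == 4 and parts[0].isdigit():
            yield parts[3], int(parts[1]), int(parts[2])

# A trimmed sample in the format produced by GC.class_histogram.
sample = """\
 num     object count  total size  class name
-------------------------------------------------
   1         312324    14991552  [B
   2          78492     6279360  [Ljava.lang.Object;
Total       2263229    84116600
"""
rows = list(parse_histogram(sample.splitlines()))
# The dump is already sorted by total size, so the top consumer is first.
print(rows[0])  # ('[B', 312324, 14991552)
```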
Interpreting a Class Histogram Dump
By default, the class histogram dump is sorted by the "total size" column; for example, here are the top 15 items of a sample application (and the last total line):
$ jcmd 1744 GC.class_histogram > diag_jcmd_$(date +%Y%m%d_%H%M%S).txt; cat diag_jcmd*
num object count total size class name
-------------------------------------------------
1 312324 14991552 [B
2 78492 6279360 [Ljava.lang.Object;
3 258826 4141216 java.lang.String
4 161222 3869328 java.util.HashMap$Node
5 43372 3469760 [Ljava.util.HashMap$Node;
6 57369 2294760 java.util.HashMap
7 1 2097168 [Lcom.ibm.ws.sib.msgstore.cache.links.AbstractItemLink;
8 20159 2096536 java.lang.Class
9 74902 1797648 java.util.ArrayList
10 66804 1603296 java.util.concurrent.ConcurrentHashMap$Node
11 14060 1237280 java.lang.reflect.Method
12 22175 1064400 [C
13 26850 859200 org.apache.derby.impl.store.raw.data.StoredRecordHeader
14 23399 748768 [Ljava.lang.Class;
15 29381 705144 org.apache.derby.impl.store.raw.data.RecordId
[...]
Total 2263229 84116600
Descriptions of the columns:
- num: Simply the number of the row. Looking at the last number tells you how many classes are loaded.
- object count: The number of objects of that class.
- total size: The sum of shallow heap sizes of all objects of that class and the class itself.
- class name: The Java class name (a [ prefix denotes an array).
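A quick derived metric is the average shallow size per instance: divide "total size" by "object count". Here is a sketch using three rows from the sample above; the exact per-instance sizes depend on the JVM's object layout and pointer compression, so treat the numbers as specific to this dump:

```python
# (object count, total size) for three rows of the sample histogram.
rows = {
    "[B": (312324, 14991552),
    "java.lang.String": (258826, 4141216),
    "java.util.HashMap$Node": (161222, 3869328),
}
for name, (count, total) in rows.items():
    print(f"{name}: {total / count:.0f} bytes/object")
# [B: 48 bytes/object
# java.lang.String: 16 bytes/object
# java.util.HashMap$Node: 24 bytes/object
```

For arrays such as [B (byte arrays), the average mixes many different array lengths; for fixed-layout classes such as java.util.HashMap$Node, it is simply the per-instance shallow size on this JVM.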
Finally, the "Total" line provides the sum of the number of objects and the total heap usage (in the above example, 2,263,229 objects using 80.2MB).
The critical column is "total size", and its underlying concept, the shallow heap size, is described in more detail in the Eclipse MAT help. In short, the shallow heap size is the size of the object itself, including its header and its fields, but not the objects its reference fields point to.
Normally, we're more interested in the "retained heap size", which is essentially the amount of Java heap that would be freed if a particular object were garbage collected. Retained heap is a good approximation of our intuition about an object's "actual size". However, the retained heap size is not available in a class histogram dump, because computing it requires the expensive graph computation that MAT performs (and that computation is the main reason loading heapdumps and core dumps takes so long).
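To see why retained sizes are expensive, consider a naive definition: the retained set of an object X is everything that becomes unreachable from the GC roots once X is removed. The toy sketch below (not MAT's actual algorithm; MAT builds a dominator tree instead) recomputes reachability once per queried object, which is far too slow for a heap with millions of objects:

```python
def reachable(graph, roots, excluded=None):
    """Objects reachable from roots, optionally pretending one is gone."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in seen or node == excluded:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen

def retained_size(graph, sizes, roots, obj):
    """Shallow sizes of everything collected if obj became unreachable."""
    kept = reachable(graph, roots, excluded=obj)
    return sum(sizes[n] for n in reachable(graph, roots) - kept)

# Toy object graph: root -> a, a -> b and a -> c,
# so b and c are only reachable through a.
graph = {"root": ["a"], "a": ["b", "c"], "b": [], "c": []}
sizes = {"root": 16, "a": 24, "b": 1000, "c": 500}
print(retained_size(graph, sizes, ["root"], "a"))  # 1524: a plus b plus c
```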
However, the sum of shallow heap sizes per class is still a very useful indicator of Java heap usage. Normally, it's safe to infer that the first few non-JDK classes (i.e. non-java.* classes) with large "total sizes" are likely to be what is driving the Java heap usage. In the above example, those would be the arrays of com.ibm.ws.sib.msgstore.cache.links.AbstractItemLink and the instances of org.apache.derby.impl.store.raw.data.*.
Summary
In summary, class histogram dumps sit somewhere between full heapdump analysis and object allocation sampling: they provide a relatively lightweight way to get a feel for what is driving Java heap usage. However, they still carry a non-trivial cost, so it's always best to measure the overhead in a test environment first.
#automation-portfolio-specialists-app-platform#Java#WebSphere#WebSphereApplicationServer(WAS)#WebSphereLiberty