Lessons from the Field #22: Lightweight Java heap information

By Kevin Grigorenko posted Tue October 18, 2022 08:00 AM

When you want to understand what's consuming your Java heap, the classic method is to gather a heapdump or system dump and review it with a tool such as the Eclipse Memory Analyzer Tool (MAT) with the IBM DTFJ extension (to read dumps produced by IBM Java and Semeru).

However, creating heapdumps and system dumps is expensive: they can pause the JVM for up to dozens of seconds, the files are very large, and processing them takes significant time. In a previous post, we discussed how to understand Java heap allocations using an object allocation sampler, but that technique is a bit cumbersome and it's complicated to figure out the right sampling rate.

In this post, we'll show how to use the class histogram dump as an alternative. This is generally much lighter weight compared to full memory dumps and has more comprehensive information compared to the allocation sampler.

Gathering a Class Histogram Dump

If you're running IBM Semeru or a HotSpot JVM, then you may use the jcmd utility to gather a class histogram dump by replacing $PID with the process ID:

$ jcmd $PID GC.class_histogram
  num   object count     total size    class name

For IBM Java >=, this may be executed as follows by replacing $JAVA_HOME with the IBM Java home directory, and $PID with the process ID:

$ java -Xbootclasspath/a:$JAVA_HOME/lib/tools.jar openj9.tools.attach.diagnostics.tools.Jcmd $PID GC.class_histogram

Note that the overhead of producing a class histogram is proportional to the number of classes and objects loaded, and the JVM is paused while the dump is produced. There may therefore still be a long pause, so this should be tested in a test environment first.

In addition, producing the dump does not first run a full garbage collection, so what's dumped may include a lot of garbage (objects that are no longer reachable but have not yet been collected) if a generational garbage collector is in use. For more accurate heap usage, first run a full garbage collection, wait for that to finish (e.g. by tailing verbosegc), and then run the class histogram dump. The jcmd utility has the command GC.run to request a full garbage collection (which works as long as the JVM does not have the -Xdisableexplicitgc or -XX:+DisableExplicitGC options):

$ jcmd $PID GC.run

Finally, it's usually good to write jcmd output to a file as it can be very lengthy:

$ jcmd $PID GC.class_histogram > diag_jcmd_$(date +%Y%m%d_%H%M%S).txt
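The steps above (request a full garbage collection, wait, then dump the histogram to a file) can be combined into a small helper. This is only a sketch: the function name histogram_after_gc is hypothetical, and the fixed sleep is an assumption standing in for actually confirming in verbosegc that the collection has finished.

```shell
# Hypothetical helper: request a full GC, wait, then write a class histogram to a file.
# Usage: histogram_after_gc PID [WAIT_SECONDS]
histogram_after_gc() {
  pid=$1
  wait_seconds=${2:-10}   # assumption: a crude fixed wait instead of tailing verbosegc
  out="diag_jcmd_$(date +%Y%m%d_%H%M%S).txt"

  # GC.run does nothing useful if explicit GC is disabled on the target JVM
  jcmd "$pid" GC.run >/dev/null || return 1
  sleep "$wait_seconds"

  # Write the (potentially very long) histogram to a timestamped file
  jcmd "$pid" GC.class_histogram > "$out" || return 1

  # Print the name of the file that was written
  echo "$out"
}
```

For example, `histogram_after_gc $PID 30` would wait 30 seconds between the two commands and print the name of the output file. As with the individual commands, try this in a test environment first.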

Interpreting a Class Histogram Dump

By default, the class histogram dump is sorted by the "total size" column; for example, here are the top 15 items of a sample application (and the last total line):

$ jcmd 1744 GC.class_histogram > diag_jcmd_$(date +%Y%m%d_%H%M%S).txt; cat diag_jcmd*
  num   object count     total size    class name
    1         312324       14991552    [B
    2          78492        6279360    [Ljava.lang.Object;
    3         258826        4141216    java.lang.String
    4         161222        3869328    java.util.HashMap$Node
    5          43372        3469760    [Ljava.util.HashMap$Node;
    6          57369        2294760    java.util.HashMap
    7              1        2097168    [Lcom.ibm.ws.sib.msgstore.cache.links.AbstractItemLink;
    8          20159        2096536    java.lang.Class
    9          74902        1797648    java.util.ArrayList
   10          66804        1603296    java.util.concurrent.ConcurrentHashMap$Node
   11          14060        1237280    java.lang.reflect.Method
   12          22175        1064400    [C
   13          26850         859200    org.apache.derby.impl.store.raw.data.StoredRecordHeader
   14          23399         748768    [Ljava.lang.Class;
   15          29381         705144    org.apache.derby.impl.store.raw.data.RecordId
Total        2263229       84116600

Descriptions of the columns:

  • num: Simply the number of the row. Looking at the last number tells you how many classes are loaded.
  • object count: The number of objects of that class.
  • total size: The sum of shallow heap sizes of all objects of that class and the class itself.
  • class name: The Java class name (a [ prefix denotes an array).

Finally, the "Total" line provides the sum of the number of objects and the total heap usage (in the above example, 2,263,229 objects using about 80.2 MB).
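To avoid doing this arithmetic by hand, the summary numbers can be pulled out with awk. The following is a minimal sketch that assumes the output format shown above (four whitespace-separated columns and a final "Total" line); the function name histogram_summary is hypothetical, and other JVMs may print a different layout.

```shell
# Hypothetical helper: summarize a class histogram in the format shown above.
# Usage: histogram_summary FILE   (or pipe the histogram on stdin)
histogram_summary() {
  awk '
    # Class rows start with a row number; the last one seen is the class count
    $1 ~ /^[0-9]+$/ && NF >= 4 { classes = $1 }
    # The final line holds the object count and total size in bytes
    $1 == "Total" { objects = $2; bytes = $3 }
    END {
      printf "classes=%d objects=%d total=%.1fMB\n", classes, objects, bytes / 1048576
    }
  ' "$@"
}
```

The total is converted by dividing the byte count by 1,048,576, which is how 84,116,600 bytes becomes roughly 80.2 MB.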

The critical column is "total size" and its concept of the shallow heap size is described in more detail in the Eclipse MAT help. The short summary is that the shallow heap size is the size of the object itself: its header plus its fields, where a field that references another object counts only the size of the reference and not the size of the referenced object.
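One way to get a feel for shallow sizes is to divide "total size" by "object count" to get the average shallow size per object of each class. The sketch below assumes the histogram format shown above; the function name histogram_avg_size is hypothetical.

```shell
# Hypothetical helper: print the average shallow bytes per object for each class row.
# Usage: histogram_avg_size FILE   (or pipe the histogram on stdin)
histogram_avg_size() {
  awk '
    # Only class rows (numeric row number, four columns, at least one object)
    $1 ~ /^[0-9]+$/ && NF >= 4 && $2 > 0 {
      printf "%.1f %s\n", $3 / $2, $4
    }
  ' "$@"
}
```

In the sample above, java.lang.String works out to 16 bytes per object (4,141,216 / 258,826), which is just the small String object itself; the character data lives in separate array objects. For arrays the result is only an average, because array lengths vary.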

Normally, we're more interested in the "retained heap size" which is basically the amount of Java heap that would be garbage collected if a particular object were to be garbage collected. Retained heap provides a good approximation for our intuitions about an object's "actual size". However, the "retained heap size" is not available in a class histogram dump. This is because calculating retained heap sizes requires analyzing the references between all of the objects in the heap, which is computationally expensive (and this is the main reason why loading heapdumps and system dumps in MAT takes so long).

However, the sum of shallow heap sizes per class is still a very useful indicator of Java heap usage. Normally, it's safe to infer that the first few non-JDK classes (i.e. non-java.* classes) with large "total sizes" are likely what is driving the Java heap usage. In the above example, those would be the arrays of com.ibm.ws.sib.msgstore.cache.links.AbstractItemLink and the instances of org.apache.derby.impl.store.raw.data.*.
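Picking out the first few non-JDK classes can be done with a small filter. This is a sketch under a few assumptions: the histogram is in the format shown above, "JDK class" means only java.* (as in the text), and arrays of primitives such as [B and [C are skipped as well. The function name top_non_jdk is hypothetical.

```shell
# Hypothetical helper: print "total size" and class name for the first N non-JDK rows.
# Usage: top_non_jdk N < FILE
top_non_jdk() {
  awk -v n="${1:-10}" '
    # Skip the header, the Total line, and anything that is not a class row
    $1 !~ /^[0-9]+$/ || NF < 4 { next }
    # Skip arrays of primitives such as [B, [C, or [[I
    $4 ~ /^\[+[BCDFIJSZ]$/ { next }
    {
      name = $4
      sub(/^\[+L/, "", name)          # [Ljava.lang.Object; -> java.lang.Object;
      if (name ~ /^java\./) next      # assumption: only java.* counts as JDK
      print $3, $4
      if (++shown >= n) exit          # rows are already sorted by total size
    }
  '
}
```

Run against the sample above with `top_non_jdk 3`, this would list the AbstractItemLink array, StoredRecordHeader, and RecordId rows. The java.* pattern could be widened (for example, to javax.* or vendor packages) depending on what counts as uninteresting in a given environment.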


In summary, class histogram dumps sit somewhere between full heapdump analysis and object allocation sampling. They provide a relatively lightweight way to get a feel for what is driving Java heap usage. However, they still carry a non-trivial cost, so it's always best to test the overhead in a test environment first.