Introduction
As 64-bit computing became the norm, bigger demands were heaped (so to speak) on Java applications such as the webMethods Integration Server. Suddenly, we were expected to scale the software vertically, taking advantage of available RAM and CPUs, to process mainframe-sized workloads without flinching. The Serial GC was essentially useless for maintaining low latency, and the Parallel GC had trouble keeping up as the Java heap grew. Back in those days, the Concurrent Mark-Sweep (CMS) GC was my go-to for dealing with any GC-related performance issues. Tuning it might take some time, but CMS was highly configurable and could readily handle a wide variety of workloads, regardless of heap size. However, CMS had limitations. Because its generations were fixed in size, CMS was poorly suited to unpredictable or mixed load types. In other words, if the ratio of young to old objects in memory was very dynamic, CMS would bog down. You could size the generations, whether as a percentage of heap or a hard number, but the sizes then stayed fixed. So you could adapt CMS to handle either scenario, but not both. What we needed was a GC that could adapt to dynamic workloads on very large heaps while keeping pause times at least predictable. Enter G1.
The G1 (Garbage-First) garbage collector is a region-based, low-latency garbage collector introduced in Java 7 and designed to replace the CMS collector. It scales to very large heaps by dividing the heap into separately collected regions, and it can dynamically reclassify regions as young or old to handle changing workloads. In addition, G1 achieves predictable pause times through incremental compaction and concurrent collection. Let's take a closer look at G1's technical implementation:
Key Concepts of G1
- Region-Based Heap Layout:
  - The heap is divided into fixed-size regions (e.g., 1 MB to 32 MB, depending on heap size).
  - These regions are categorized into:
    - Young Regions: Newly allocated objects.
    - Old Regions: Long-lived objects.
    - Humongous Regions: Large objects exceeding 50% of the region size are stored here.
- Generational Collection:
  - G1 logically divides the heap into young and old generations but uses regions to implement this division. Objects are promoted from young to old regions as they survive garbage collection cycles.
- Garbage-First Approach:
  - G1 prioritizes regions with the most garbage to reclaim memory efficiently. This is determined by a cost-benefit analysis that estimates how much memory can be reclaimed and how long the collection will take.
- Pause Time Goals:
  - G1 allows users to set a target pause time (e.g., -XX:MaxGCPauseMillis). The collector works to stay within this time by dynamically adjusting the number of regions collected in each cycle.
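The region layout above comes down to a little arithmetic. The sketch below is not HotSpot's code: it assumes a target of roughly 2048 regions per heap, rounds the region size down to a power of two, and clamps it to the 1 MB to 32 MB range mentioned above. The class and method names are hypothetical.

```java
// Illustrative sketch only; names are hypothetical and this is not HotSpot's implementation.
final class G1RegionMath {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048; // assumed target number of regions

    /** Picks a power-of-two region size so the heap holds about TARGET_REGIONS regions. */
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long pow2 = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(pow2, MAX_REGION);
    }

    /** An object is treated as humongous when it exceeds half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions a humongous object would occupy. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long region = regionSize(8 * 1024 * MB); // an 8 GB heap
        System.out.println("Region size (MB): " + region / MB);
        System.out.println("3 MB array humongous? " + isHumongous(3 * MB, region));
        System.out.println("Regions for a 10 MB array: " + regionsNeeded(10 * MB, region));
    }
}
```

With these assumptions an 8 GB heap ends up with 4 MB regions, so any single allocation above 2 MB would land in humongous regions. That is worth knowing if your integration payloads are routinely buffered as large byte arrays.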
Key Phases of G1 Garbage Collection
- Young Generation Collection:
  - Trigger: Initiated when the regions allocated to the young generation are full.
  - Process:
    - Collects all young regions in a stop-the-world pause.
    - Surviving objects are copied to survivor regions or, once they have survived enough collections, promoted to old regions.
    - Uses parallel threads to improve throughput.
  - Pause Time: Typically short, as it focuses only on young regions.
- Concurrent Marking Cycle:
  - Trigger: Starts when heap occupancy crosses the initiating threshold (-XX:InitiatingHeapOccupancyPercent, 45% by default).
  - Phases:
    - Initial Mark:
      - Marks objects directly reachable from the root set (e.g., thread stacks, global references).
      - Runs as a stop-the-world (STW) pause but is very short.
    - Root Region Scanning:
      - Scans the survivor regions for references into old regions.
      - Runs concurrently with application threads.
    - Concurrent Marking:
      - Traverses the object graph to identify live objects across the heap.
      - Runs concurrently with the application.
    - Remark:
      - Completes marking of objects missed during concurrent marking.
      - Runs as a short STW pause.
    - Cleanup:
      - Identifies regions with the most garbage (based on live object density).
      - Runs partially concurrently and partially as an STW pause.
- Mixed Collection:
  - Trigger: Follows a completed concurrent marking cycle.
  - Process:
    - Collects the young regions plus selected old regions with significant garbage.
    - Balances garbage collection effort against the user-defined pause time goal.
- Humongous Object Handling:
  - Large objects (e.g., arrays) are stored in humongous regions, which consist of contiguous region blocks.
  - These are handled separately to avoid fragmentation.
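To make the marking cycle a bit more tangible, here is a toy, single-threaded sketch of what marking produces: a count of live bytes per region, which is what later lets the collector spot the regions that are mostly garbage. It deliberately ignores everything that makes the real thing hard (concurrency with the application, the remark pause, parallel threads), and all class and field names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of marking; not how HotSpot represents objects or regions.
final class MarkingSketch {
    /** A toy heap object that lives in one region and references other objects. */
    static final class Obj {
        final int region;
        final int bytes;
        final List<Obj> refs = new ArrayList<>();
        Obj(int region, int bytes) { this.region = region; this.bytes = bytes; }
    }

    /** Marks everything reachable from the roots and returns live bytes per region. */
    static long[] liveBytesPerRegion(List<Obj> roots, int regionCount) {
        long[] live = new long[regionCount];
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> worklist = new ArrayDeque<>();
        // Counterpart of "initial mark": seed the worklist with the roots.
        for (Obj root : roots) {
            if (marked.add(root)) worklist.push(root);
        }
        // Counterpart of "concurrent marking": walk the object graph.
        while (!worklist.isEmpty()) {
            Obj current = worklist.pop();
            live[current.region] += current.bytes;
            for (Obj ref : current.refs) {
                if (marked.add(ref)) worklist.push(ref); // visit each object once
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Obj a = new Obj(0, 100);
        Obj b = new Obj(1, 200);
        Obj orphan = new Obj(1, 300); // nothing reachable points at it
        Obj d = new Obj(2, 50);
        a.refs.add(b);
        b.refs.add(d);
        long[] live = liveBytesPerRegion(Arrays.asList(a), 3);
        System.out.println("Live bytes per region: " + Arrays.toString(live));
    }
}
```

In this example region 1 holds 500 bytes but only 200 are live, so it would be the most attractive candidate for a subsequent mixed collection.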
Internal Data Structures in G1
- Region Table:
  - Tracks metadata for each region, including its type (young, old, or humongous), occupancy, and garbage ratio.
- RSet (Remembered Set):
  - Tracks references from one region to another, enabling efficient garbage collection without scanning the entire heap.
- Card Table:
  - Divides regions into small "cards" (~512 bytes) and tracks updates to object references for efficient incremental updates during GC.
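The relationship between cards and remembered sets is easier to see in code. The following is a simplified sketch, assuming 512-byte cards and 1 MB regions, with addresses modeled as plain offsets from the start of the heap. Real G1 records writes through barriers and processes them asynchronously; here a single hypothetical method does the bookkeeping directly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model; class and method names are hypothetical.
final class RememberedSetSketch {
    static final int CARD_SHIFT = 9;           // 2^9 = 512-byte cards
    static final long REGION_BYTES = 1L << 20; // assumed 1 MB regions

    // For each region: the cards in OTHER regions that hold references into it.
    private final Map<Long, Set<Long>> rsets = new HashMap<>();

    static long cardOf(long address) { return address >>> CARD_SHIFT; }
    static long regionOf(long address) { return address / REGION_BYTES; }

    /** Toy write barrier: call after a reference field at fieldAddr is set to point at targetAddr. */
    void recordWrite(long fieldAddr, long targetAddr) {
        long from = regionOf(fieldAddr);
        long to = regionOf(targetAddr);
        if (from == to) return; // references within one region need no tracking
        rsets.computeIfAbsent(to, k -> new TreeSet<>()).add(cardOf(fieldAddr));
    }

    /** Cards that must be scanned to find all incoming references when collecting a region. */
    Set<Long> cardsPointingInto(long region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch rs = new RememberedSetSketch();
        rs.recordWrite(0x100, REGION_BYTES + 0x40); // region 0 -> region 1
        rs.recordWrite(0x700, REGION_BYTES + 0x80); // region 0 -> region 1, different card
        rs.recordWrite(0x100, 0x200);               // same region, ignored
        System.out.println("Cards to scan for region 1: " + rs.cardsPointingInto(1));
    }
}
```

The payoff is that collecting region 1 only requires scanning the handful of cards listed in its remembered set rather than every other region in the heap.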
Performance Features
- Predictable Pause Times:
  - G1 uses heuristics to limit the number of regions collected in a single pause, adhering to user-defined pause time goals.
  - Regions with the most garbage are prioritized for collection.
- Concurrent Execution:
  - The marking work in G1 (e.g., concurrent marking, root region scanning) runs concurrently with application threads, which keeps the STW pauses short.
- Parallelism:
  - G1 utilizes multiple GC threads for parallel processing, scaling well with modern multi-core CPUs.
- Automatic Tuning:
  - G1 dynamically adjusts the size of the young generation and the number of regions collected in each GC cycle to optimize performance based on application behavior.
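The pause-time heuristic can be pictured as a greedy budget problem: rank candidate regions by how much garbage they give back per millisecond of predicted work, then keep adding regions until the pause budget is used up. The sketch below illustrates that idea only. G1's real predictions come from statistics gathered on earlier collections, and the names and numbers here are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy illustration of "garbage first" under a pause budget; not G1's actual policy code.
final class CollectionSetSketch {
    static final class Region {
        final String name;
        final long garbageBytes;  // reclaimable space found by marking
        final double predictedMs; // assumed cost to evacuate this region
        Region(String name, long garbageBytes, double predictedMs) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.predictedMs = predictedMs;
        }
        double efficiency() { return garbageBytes / predictedMs; }
    }

    /** Picks the most efficient regions whose predicted cost fits within the pause budget. */
    static List<Region> choose(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency())); // best first
        List<Region> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.predictedMs > pauseBudgetMs) break; // next-best region no longer fits
            chosen.add(r);
            spentMs += r.predictedMs;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region("A", 900, 4.0));
        candidates.add(new Region("B", 500, 1.0));
        candidates.add(new Region("C", 800, 10.0));
        candidates.add(new Region("D", 100, 1.0));
        for (Region r : choose(candidates, 6.0)) {
            System.out.println("Collect region " + r.name);
        }
    }
}
```

Note that region C holds plenty of garbage but is left for a later pause because it is expensive to evacuate. That is the cost-benefit trade-off at work, and it is also why a tight pause goal can leave garbage sitting in the heap longer.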
Advantages of G1
- Low Latency:
  - Designed to minimize long pauses, making it ideal for applications requiring predictable response times.
- Efficient Compaction:
  - Incremental compaction during mixed collections reduces memory fragmentation.
- Scalability:
  - Performs well with large heaps and multi-core systems.
Limitations of G1
- Throughput Trade-Off:
  - While G1 reduces pause times, its overall throughput may be lower than traditional collectors like Parallel GC.
- Complex Tuning:
  - Advanced applications may require fine-tuning of parameters for optimal performance.
- Humongous Object Handling:
  - Large object allocation can lead to fragmentation, requiring careful monitoring.
- Excessive Pause Times:
  - While G1 will do its best to honor the configured max GC pause time, pause times may exceed the configured limit under high load or when there is little available heap space.
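Since pauses can overshoot the goal, it pays to watch what the collector is actually doing. Here is a minimal sketch using the standard java.lang.management API; the class name GcStats is hypothetical. On a G1 JVM the collectors are typically reported under names like "G1 Young Generation" and "G1 Old Generation", but the code does not depend on the names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal monitoring sketch; GcStats is a hypothetical helper, not part of any library.
final class GcStats {
    /** Returns collector name -> { collection count, accumulated collection time in ms }. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            long count = e.getValue()[0];
            long totalMs = e.getValue()[1];
            // Both values can be -1 when the JVM does not report them.
            String average = count > 0 ? String.valueOf(totalMs / count) : "n/a";
            System.out.println(e.getKey() + ": collections=" + count
                    + ", totalMs=" + totalMs + ", avgMs=" + average);
        }
    }
}
```

An average only tells part of the story, since one long pause can hide among many short ones. For individual pause durations, the GC log (for example -Xlog:gc on JDK 9 and later) is the better source.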
Common Tuning Parameters for G1
-XX:MaxGCPauseMillis=<time>: Sets the desired maximum pause time.
-XX:InitiatingHeapOccupancyPercent=<percent>: Specifies the heap occupancy threshold to trigger concurrent marking.
-XX:ParallelGCThreads=<number>: Defines the number of parallel GC threads.
-XX:G1HeapRegionSize=<size>: Sets the size of heap regions.
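Before changing any of these, it helps to confirm what the running JVM is actually using, since defaults are partly derived from the machine and heap size. The sketch below reads the current values through HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs. G1FlagReport is a hypothetical class name, and any flag the JVM does not recognize is simply reported as "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch for HotSpot-based JVMs only; G1FlagReport is a hypothetical helper.
final class G1FlagReport {
    static final String[] FLAGS = {
        "UseG1GC",
        "MaxGCPauseMillis",
        "InitiatingHeapOccupancyPercent",
        "ParallelGCThreads",
        "G1HeapRegionSize"
    };

    /** Returns flag name -> current value as a string, or "unavailable" if it cannot be read. */
    static Map<String, String> currentValues() {
        Map<String, String> values = new LinkedHashMap<>();
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : FLAGS) {
            try {
                values.put(flag, bean.getVMOption(flag).getValue());
            } catch (RuntimeException e) {
                // Unknown flag on this JVM, or the diagnostic bean is missing.
                values.put(flag, "unavailable");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        currentValues().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

Run it with the same JVM options as the server you are tuning; otherwise the report reflects the defaults of whatever JVM launched it rather than your production settings.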
Use Cases
- Applications requiring good throughput with short, predictable pause times (e.g., financial systems, soft real-time applications).
- Systems with large heaps, as G1 scales efficiently with heap size.
- Applications with mixed workloads that benefit from incremental compaction.
Summary
G1 has become the default GC for OpenJDK and pretty much all HotSpot-based JVMs for good reason. It self-tunes well to handle both fairly stable and highly dynamic loads that run on very large heaps. At the same time, it has certain shortcomings that make it unsuitable when consistently extremely low latency is required. Unfortunately, G1 cannot be relied upon to meet latency requirements 100% of the time and may behave unpredictably when pushed to its limits.
Until next time, happy integrating!