Introduction
To be perfectly honest, it was only last week, while looking into how to reduce GC pauses, that I stumbled upon Shenandoah and its younger cousin ZGC (covered elsewhere). For me, the reason was pretty clear: up until now I had never been in a situation where G1 failed to deliver. Combine that with the fact that, although it was introduced as an experimental feature in Java 12, Shenandoah only became a production feature in Java 15, and Java 17 was the first LTS release to ship it as such. So I can understand why, especially with G1 in self-tuning mode as the default GC, Shenandoah seems, to me at least, to have sort of slipped in unannounced.
The Shenandoah Garbage Collector is a low-pause, concurrent garbage collector that focuses on reducing pause times by performing heap compaction concurrently with application threads. Shenandoah is designed for applications that require predictable, low-latency behavior, and it achieves this by avoiding stop-the-world (STW) pauses during heap compaction, which is traditionally a major source of latency in garbage collection. Naturally, there is a tradeoff: Shenandoah accepts some extra memory and barrier overhead so that objects stay accessible while the heap is being compacted.
Let's take a closer look.
Key Characteristics of Shenandoah GC
- Concurrent Compaction:
- Shenandoah performs heap compaction concurrently with application threads, unlike traditional collectors that pause the application during compaction.
- This is achieved using forwarding pointers and barriers to redirect object references to their new locations.
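- Region-Based Heap Layout:
- The heap is divided into regions, similar to G1 GC.
- Regions are not tied to a generation. At any point in time a region is in one of a few states, for example:
- Empty Regions: Available for new allocations.
- Regular Regions: Contain ordinary objects.
- Humongous Regions: Contain large objects that span one or more regions.
- Collection Set Regions: Selected for evacuation in the current cycle.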
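- No Generational Model:
- In its original, default mode Shenandoah treats the heap as a single space without dividing it into young and old generations. A generational mode was added in later JDK releases as a separate option.
- This simplifies the implementation and avoids generational GC complexities.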
- Pause Time Goals:
- Shenandoah is optimized for low-pause workloads, with pauses typically lasting less than 10 milliseconds, regardless of heap size.
Technical Implementation
1. Heap Layout
- The heap is divided into regions, typically sized between 256 KB and 32 MB, depending on the heap size.
- Each region can be independently collected, compacted, or reused, reducing contention and fragmentation.
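To make the region idea a bit more concrete, here is a minimal sketch of the arithmetic behind a region-based heap. The class and method names are my own, and the sizes are only illustrative; this is not Shenandoah's actual sizing policy, just the general idea that a power-of-two region size turns "which region is this address in?" into a shift.

```java
/** Toy arithmetic for a region-based heap (hypothetical names, illustrative sizes). */
class RegionMath {
    /** Returns log2 of the region size; the size must be a power of two. */
    static int regionShift(long regionSizeBytes) {
        if (regionSizeBytes <= 0 || Long.bitCount(regionSizeBytes) != 1) {
            throw new IllegalArgumentException("region size must be a power of two");
        }
        return Long.numberOfTrailingZeros(regionSizeBytes);
    }

    /** Number of regions needed to cover the heap, rounding up for a partial tail. */
    static long regionCount(long heapBytes, long regionSizeBytes) {
        return (heapBytes + regionSizeBytes - 1) / regionSizeBytes;
    }

    /** Index of the region that contains the given offset from the heap base. */
    static long regionIndex(long heapOffset, long regionSizeBytes) {
        return heapOffset >>> regionShift(regionSizeBytes);
    }

    public static void main(String[] args) {
        long heap = 4L << 30;    // 4 GB heap
        long region = 2L << 20;  // 2 MB regions
        System.out.println(regionCount(heap, region));      // 2048
        System.out.println(regionIndex(5L << 20, region));  // 2
    }
}
```

Because every region has the same size, the collector can keep per-region bookkeeping (live data, state) in a flat array indexed this way.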
2. Concurrent Phases of Shenandoah GC
Shenandoah operates in several phases, most of which run concurrently with application threads:
- Initial Mark:
- Identifies the root set of live objects (e.g., thread stacks, static fields).
- This phase is a stop-the-world (STW) pause but is very short.
- Concurrent Mark:
- Traverses the object graph starting from the root set to identify all reachable (live) objects.
- Runs concurrently with application threads.
- Final Mark:
- Completes any marking work left over from the concurrent phase.
- Selects the regions to evacuate (the collection set), favoring regions that are mostly garbage.
- This phase is a short STW pause.
- Concurrent Cleanup:
- Reclaims regions that contain no live objects at all, since nothing needs to be copied out of them.
- Runs concurrently with the application.
- Concurrent Evacuation:
- Moves live objects out of the regions in the collection set.
- Uses forwarding pointers to redirect object references to their new memory locations.
- Update References:
- Updates all references to evacuated objects, ensuring the application interacts only with valid memory locations.
- Runs concurrently; barriers keep application threads from observing stale references while the update is in progress.
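As a quick summary of the cycle, the sketch below simply encodes the phases from the list above and marks which ones pause the application. It follows this article's simplified list; the enum and method names are mine, and the real collector has a few more short pauses around reference updating that I have left out.

```java
import java.util.ArrayList;
import java.util.List;

class GcCycleSketch {
    /** Phases in the order listed above; the flag marks stop-the-world phases. */
    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        FINAL_MARK(true),
        CONCURRENT_CLEANUP(false),
        CONCURRENT_EVACUATION(false),
        UPDATE_REFERENCES(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    /** Returns only the phases during which application threads are paused. */
    static List<Phase> pausingPhases() {
        List<Phase> result = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.stopTheWorld) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld ? " (STW)" : " (concurrent)"));
        }
    }
}
```

The point to take away is how little of the cycle is spent in pauses: the heavy lifting (marking, evacuation, reference updates) happens while the application keeps running.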
3. Forwarding Pointers
- When an object is evacuated to a new memory location, Shenandoah uses a forwarding pointer to redirect references from the old location to the new one.
- Forwarding pointers are stored directly in the object's header.
- This mechanism ensures that the application can continue accessing objects while compaction is in progress.
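Here is a minimal sketch of how a forwarding pointer can be installed safely when a GC thread and an application thread both try to copy the same object. It is a toy model, not Shenandoah's code: the `Obj` class, the `forwardee` slot (standing in for the header word) and the method names are all assumptions of mine. The idea it illustrates is that whoever wins the compare-and-set defines the one canonical copy.

```java
import java.util.concurrent.atomic.AtomicReference;

class ForwardingSketch {
    /** Toy heap object: a payload plus a forwarding slot standing in for the header word. */
    static final class Obj {
        final int payload;
        final AtomicReference<Obj> forwardee = new AtomicReference<>(null);

        Obj(int payload) {
            this.payload = payload;
        }
    }

    /** Copies the object and tries to publish the copy; losers of the race adopt the winner's copy. */
    static Obj evacuate(Obj obj) {
        Obj existing = obj.forwardee.get();
        if (existing != null) {
            return existing; // someone already moved it
        }
        Obj copy = new Obj(obj.payload);
        // Only one thread can swing the slot from null to its copy.
        return obj.forwardee.compareAndSet(null, copy) ? copy : obj.forwardee.get();
    }

    /** Follows the forwarding pointer if there is one, otherwise returns the object itself. */
    static Obj resolve(Obj obj) {
        Obj fwd = obj.forwardee.get();
        return fwd != null ? fwd : obj;
    }

    public static void main(String[] args) {
        Obj old = new Obj(42);
        Obj first = evacuate(old);
        Obj second = evacuate(old);
        System.out.println(first == second);        // true
        System.out.println(resolve(old).payload);   // 42
    }
}
```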
4. Barriers
Shenandoah relies heavily on barriers to manage references and ensure the correctness of concurrent operations.
- Read Barrier:
- Intercepts loads of object references and checks whether the referenced object has been evacuated (in current Shenandoah versions this role is played by the load-reference barrier).
- If the object has been moved, the read barrier resolves the reference to its new location.
- Write Barrier:
- Intercepts memory writes and ensures that they land on the current copy of an evacuated object rather than on a stale one.
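The barrier logic described above can be sketched roughly as follows. This is a single-threaded toy with hypothetical names (`Node`, `loadNext`, `storeValue`); in the JVM the barriers are emitted by the JIT compiler and interpreter rather than written as ordinary methods, so take it as an illustration of the checks involved and nothing more.

```java
class BarrierSketch {
    /** Toy object with one int field, one reference field and a forwarding slot. */
    static final class Node {
        int value;
        Node next;       // ordinary reference field
        Node forwardee;  // null until the object has been evacuated

        Node(int value) {
            this.value = value;
        }
    }

    /** Read-side barrier: returns the current copy of the referent and repairs the field. */
    static Node loadNext(Node holder) {
        Node ref = holder.next;
        if (ref != null && ref.forwardee != null) {
            ref = ref.forwardee;
            holder.next = ref; // heal the field so later loads skip the slow path
        }
        return ref;
    }

    /** Write-side barrier: makes sure the store lands on the current copy, not the stale one. */
    static void storeValue(Node target, int value) {
        Node current = target.forwardee != null ? target.forwardee : target;
        current.value = value;
    }

    public static void main(String[] args) {
        Node holder = new Node(1);
        Node old = new Node(2);
        holder.next = old;

        Node copy = new Node(old.value);
        old.forwardee = copy; // simulate evacuation of 'old'

        System.out.println(loadNext(holder) == copy); // true
        System.out.println(holder.next == copy);      // true
        storeValue(old, 99);
        System.out.println(copy.value);               // 99
    }
}
```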
5. Thread-Local Allocation Buffers (TLABs)
- Shenandoah uses Thread-Local Allocation Buffers (TLABs) to optimize object allocation.
- Each thread allocates objects in its own buffer, reducing contention and improving performance.
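TLABs are not specific to Shenandoah, but since they explain why allocation stays cheap, here is a rough sketch of the mechanism. Everything in it (`SharedHeap`, `Tlab`, the sizes) is made up for illustration: each thread would own one `Tlab`, bump a private cursor for most allocations, and only touch the shared, contended heap pointer when its buffer runs out.

```java
import java.util.concurrent.atomic.AtomicLong;

class TlabSketch {
    /** Shared heap top: the only contended state, touched once per TLAB refill. */
    static final class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long capacity;

        SharedHeap(long capacity) {
            this.capacity = capacity;
        }

        /** Atomically claims a chunk; returns its start offset, or -1 if the heap is full. */
        long claim(long size) {
            while (true) {
                long start = top.get();
                if (start + size > capacity) {
                    return -1;
                }
                if (top.compareAndSet(start, start + size)) {
                    return start;
                }
            }
        }
    }

    /** Thread-private buffer: allocation is a bounds check plus a pointer bump. */
    static final class Tlab {
        private final SharedHeap heap;
        private final long tlabSize;
        private long cursor = 0;
        private long limit = 0;
        int refills = 0;

        Tlab(SharedHeap heap, long tlabSize) {
            this.heap = heap;
            this.tlabSize = tlabSize;
        }

        /** Returns the offset of the new object, or -1 if the heap is exhausted. */
        long allocate(long size) {
            if (size > tlabSize) {
                return heap.claim(size); // too big for a TLAB, go to the shared heap
            }
            if (cursor + size > limit) {
                long start = heap.claim(tlabSize);
                if (start < 0) {
                    return -1;
                }
                cursor = start;
                limit = start + tlabSize;
                refills++;
            }
            long result = cursor;
            cursor += size;
            return result;
        }
    }

    public static void main(String[] args) {
        SharedHeap heap = new SharedHeap(1 << 20);
        Tlab tlab = new Tlab(heap, 1024);
        for (int i = 0; i < 100; i++) {
            tlab.allocate(64);
        }
        System.out.println(tlab.refills); // 7
    }
}
```

In this run, 100 allocations cost only 7 trips to the shared pointer; the rest are uncontended bumps.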
Performance Features
- Concurrent Compaction:
- Shenandoah performs heap compaction concurrently with application threads, eliminating long STW pauses.
- Predictable Pause Times:
- Pauses are generally kept to single-digit milliseconds, regardless of heap size.
- Region-Based Design:
- The region-based heap layout allows Shenandoah to manage memory efficiently and reduce fragmentation.
- Automatic Tuning:
- Shenandoah dynamically adjusts its behavior based on application workloads and memory conditions.
Common Parameters for Shenandoah
-XX:+UseShenandoahGC
: Enables the Shenandoah garbage collector.
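-XX:ShenandoahRegionSize=<size>
: Sets the size of heap regions. This is an experimental option (it needs -XX:+UnlockExperimentalVMOptions); by default the region size is chosen automatically.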
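-XX:ShenandoahGCHeuristics=<name>
: Selects the heuristic that decides when a GC cycle starts (adaptive by default). Note that Shenandoah does not expose a pause time target the way G1 does with -XX:MaxGCPauseMillis.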
-XX:+ShenandoahUncommit
: Allows Shenandoah to release unused memory back to the operating system.
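After adding the flag, it is worth checking that the JVM really picked the collector you asked for. A small sketch using the standard `java.lang.management` API is shown below; the class name `WhichGc` is mine. On a JVM started with `-XX:+UseShenandoahGC` I would expect the reported names to mention Shenandoah, but I have not verified the exact strings across builds, so treat that part as an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class WhichGc {
    /** Returns the names of the garbage collector MXBeans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The output depends on which collector the JVM was started with.
        System.out.println(collectorNames());
    }
}
```

Saved as `WhichGc.java`, it can be run directly with `java -XX:+UseShenandoahGC WhichGc.java`, provided your JDK build includes Shenandoah.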
Advantages of Shenandoah
- Low Pause Times:
- Shenandoah minimizes pause times, making it ideal for latency-sensitive applications.
- Concurrent Compaction:
- Compacts memory while the application runs, reducing fragmentation without disrupting execution.
- Scalability:
- Performs well with large heaps and multi-core systems.
- Ease of Use:
- Requires minimal configuration and tuning.
Limitations of Shenandoah
- Throughput Trade-Off:
- Shenandoah prioritizes low latency, which may lead to slightly lower throughput compared to collectors like Parallel GC.
- Memory Overhead:
- Shenandoah's additional overhead from storing/accessing metadata and barriers limits its ability to scale well beyond ~500GB heaps.
- Not Ideal for Small Heaps:
- Shenandoah's concurrent design introduces overhead that may be unnecessary for applications with small heaps, and its GC threads can compete with the application for CPU under very high object allocation rates.
Use Cases
- Latency-Sensitive Applications:
- Ideal for workloads requiring predictable low pause times, such as trading systems, real-time analytics, and gaming.
- Large Heap Applications:
- Performs well for applications with large heaps, such as big data processing and in-memory databases.
- Concurrent Workloads:
- Suitable for systems with high concurrency and multi-core architectures.
Shenandoah offers a compelling choice for applications where low pause times are critical, and its concurrent design allows it to scale efficiently with large heaps and modern hardware. While it provides a robust GC solution for most low-latency applications, the associated memory overhead grows with heap size, making it less suitable for extremely large (>500 GB) heaps. Next, we will look at the latest player on the field, ZGC.
Until next time, happy integrating!