WebSphere Application Server & Liberty


Tuning Cloud-Native Java Applications for Performance: Scaling Up Versus Scaling Out

By Joshua Dettinger posted Tue May 17, 2022 09:18 AM

  


 

Introduction

All application server environments have one thing in common: limited resources. How well you use those resources determines how efficiently user requests are processed. There are two general options for increasing the overall throughput of your environment: we can scale up by adding resources to one JVM, or we can scale out by adding identical JVMs to serve the client load. This blog post presents an experiment that shows how throughput can improve as resources are added to the application tier, and how much improvement you can expect from scaling up versus scaling out. It is a follow-up to my previous blog, which ran a slightly different experiment. Read about it here.

 

Experiment

Will one JVM using all of the cores on the system perform best? Or several JVMs, each assigned to a single core? How does memory allocation affect throughput? The simplest ways to add resources to a JVM are to give it more heap memory and to allocate it more CPUs. Other parameters can be tuned as well (connections, threads, etc.), but we will focus on memory and CPU allocation.

 

Environment

These experiments were run on a 12-core system. I used taskset -c to allocate CPUs to the Liberty JVMs. The test application was a very simple ping-type application with a single REST endpoint. I used IBM Semeru Runtimes for Java 8 for my JVMs. In the scale up case, the heap size was set so that the total heap is 128MB per core. For the client load, I used 12 processes, each running 4 threads, for a total of 48 threads hitting the REST endpoint across the varying number of JVMs in the experiment.
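To make the setup concrete, here is a sketch of how a Liberty JVM can be pinned to cores and given a fixed heap. The server name and install path are placeholders, not the ones used in the experiment:

```shell
# Illustrative only: "pingapp" and /opt/wlp are placeholder names/paths.

# Scale out unit: pin one Liberty JVM to one core with a 128MB heap.
printf -- "-Xms128m\n-Xmx128m\n" > /opt/wlp/usr/servers/pingapp/jvm.options
taskset -c 0 /opt/wlp/bin/server start pingapp

# Scale up instead: give a single JVM 4 cores and 4 x 128MB of heap.
printf -- "-Xms512m\n-Xmx512m\n" > /opt/wlp/usr/servers/pingapp/jvm.options
taskset -c 0-3 /opt/wlp/bin/server start pingapp
```

The jvm.options file is read by the Liberty server at startup, and taskset -c restricts the launched process (and its threads) to the listed CPUs.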

 

Results

All results are server-side measurements; the client side remained unchanged. Throughput is in requests per second, and memory is the actual footprint held in RAM, also known as resident set size (RSS), in megabytes.

 

Here are the scale up results (1 JVM):

| Number of cores | Max heap (MB) | Aggregate throughput (req/s) | Aggregate memory footprint (MB) |
|---|---|---|---|
| 1 | 128 | 35057 | 96 |
| 2 | 256 | 62081 | 99 |
| 3 | 384 | 86703 | 102 |
| 4 | 512 | 116876 | 106 |
| 6 | 768 | 186342 | 105 |
| 8 | 1024 | 160658 | 115 |
| 12 | 1536 | 167036 | 129 |

 

 

Here are the scale out results. Each JVM is allocated one core and 128MB for heap:

| Number of JVMs | Aggregate throughput (req/s) | Aggregate memory footprint (MB) |
|---|---|---|
| 1 | 35057 | 96 |
| 2 | 74981 | 189 |
| 3 | 94495 | 289 |
| 4 | 124223 | 382 |
| 6 | 175437 | 575 |
| 8 | 244600 | 771 |
| 12 | 320358 | 1141 |

 

 

This first chart shows throughput as a function of the number of cores. Keep in mind that in the scale up case all of the cores are allocated to one JVM, while in the scale out case each core is allocated to its own JVM. The lines trend very similarly up to 6 cores, after which they diverge widely. In the scale up case, we are unable to get any more throughput by adding more cores and heap to the JVM.
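The divergence is easy to quantify from the tables above: dividing aggregate throughput by the number of cores gives per-core efficiency. A minimal sketch (the numbers are copied from the tables; the rest is just arithmetic):

```java
public class ScalingEfficiency {
    // Integer per-core throughput: aggregate requests/second divided by cores.
    static int perCore(int aggregate, int cores) {
        return aggregate / cores;
    }

    public static void main(String[] args) {
        int[] cores    = {1, 2, 3, 4, 6, 8, 12};
        int[] scaleUp  = {35057, 62081, 86703, 116876, 186342, 160658, 167036};
        int[] scaleOut = {35057, 74981, 94495, 124223, 175437, 244600, 320358};

        System.out.println("cores  scale-up req/s/core  scale-out req/s/core");
        for (int i = 0; i < cores.length; i++) {
            System.out.printf("%5d  %19d  %20d%n",
                    cores[i],
                    perCore(scaleUp[i], cores[i]),
                    perCore(scaleOut[i], cores[i]));
        }
    }
}
```

At 6 cores both cases deliver roughly 30,000 req/s per core; at 12 cores the scale out case still delivers about 26,700 req/s per core, while the scale up case has fallen below 14,000.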

 

 

The second chart shows that in the scale out case, memory footprint grows linearly with the number of JVMs. This is logical: each freshly started JVM adds roughly the same amount of overhead to the system. In the scale up case, memory footprint increases very little.
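The slope of that line, taken from the scale out table above, works out to roughly 95MB of RSS per additional JVM:

```java
public class FootprintGrowth {
    // Average additional resident set size (MB) per extra JVM: the slope
    // between two points on the scale out memory line.
    static int mbPerExtraJvm(int rssFirst, int rssLast, int jvmsFirst, int jvmsLast) {
        return (rssLast - rssFirst) / (jvmsLast - jvmsFirst);
    }

    public static void main(String[] args) {
        // 1 JVM used 96MB of RSS; 12 JVMs used 1141MB.
        int perJvm = mbPerExtraJvm(96, 1141, 1, 12);
        System.out.println("~" + perJvm + " MB of RSS per additional JVM");
    }
}
```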

 

Conclusion

What did we learn? In both the scale up and scale out cases, we started with one JVM hosting our simple application. In the scale up case, we varied the number of cores and the heap size apportioned to the JVM. In the scale out case, we varied the number of identical JVMs hosting the application. Throughput results were very similar up to the point where we hit 6 cores or JVMs. Beyond that, the scale out case shows continued throughput increases, but those increases come at a cost, as shown in the memory footprint results. Memory footprint was stable throughout the experiment in the scale up case, while the scale out case shows a linear increase.

 

With one JVM running, all of the load flows through that one instance. This leads to more threads being created, which increases the likelihood of lock bottlenecks. There are other possible points of contention when all of the load flows to one JVM: the application itself, the application server, or the JVM. Our simple case shows that these bottlenecks don't appear until we have more than 6 cores apportioned to the JVM; after that point, throughput actually trends slightly downward. We are not going to explore the cause of these specific bottlenecks, but we can see that they exist and affect throughput. The point at which scaling slows down depends heavily on application logic, so for another application the scaling could fall off at a different number of cores or amount of memory.
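To illustrate the kind of lock bottleneck described above, here is a toy Java sketch (not part of the experiment): every worker thread serializes on a single monitor, so adding threads mostly adds waiting rather than throughput, which is one reason a single large JVM can stop scaling:

```java
import java.util.concurrent.CountDownLatch;

public class LockContentionDemo {
    private static final Object LOCK = new Object();
    private static long shared;

    // Run 'threads' workers that each take the single shared lock 'iters' times.
    static long run(int threads, int iters) throws InterruptedException {
        shared = 0;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    synchronized (LOCK) {   // every thread serializes here
                        shared++;
                    }
                }
                done.countDown();
            }).start();
        }
        done.await();
        return shared;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long total = run(8, 1_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        // The work per thread is fixed, but wall time grows with contention.
        System.out.println("total=" + total + " in " + ms + " ms");
    }
}
```

Doubling the thread count here does not double throughput, because only one thread at a time can hold the lock; splitting the same load across separate JVMs sidesteps that particular serialization point.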

 

As we grow past 6 cores, the scale out case excels in throughput. We know basic Java structures are duplicated in the separate heaps, such as Strings, System, and framework initialization classes, along with Liberty infrastructure overhead. This is why the increased throughput comes at the cost of a higher memory footprint. Multiple JVMs also mean multiple JITs to warm up, so the system takes longer to ramp up to full performance. This can be mitigated with JITServer, which you can learn more about here. If you extrapolate this experiment to a more complicated application, or to a more memory-constrained environment, those factors may lead to different results.

 

Recall my previous experiment. In that case, each run used the same number of cores and the same amount of memory. In these new experiments, I varied the number of cores and the amount of memory used. The limitations of your environment will dictate how you experiment with your applications, but the concepts outlined in both blogs should be useful for gaining throughput.

 

Not all applications and environments are created equal, so you may see different results. Be aware of the tradeoff between scaling up and scaling out, and choose according to your own application deployment. Experimenting along these lines will allow you to pick the right footprint-versus-throughput tradeoff for your Liberty applications.