High Performance Computing

High Performance Computing Group


Multidimensional resource scheduling performance: getting the work done faster

By Archive User posted Thu May 04, 2017 01:55 PM

Originally posted by: leyao

IBM Spectrum Symphony offers multidimensional resource scheduling, which increases resource utilization by enabling more granular resource allocations for applications with varying resource requirements. It works with EGO to allocate suitable resources to application services according to their requirements; each allocation can request different amounts of physical resource types, including, but not limited to, CPU, cores, memory, and number of disks. Compared to slot-based scheduling, multidimensional scheduling offers more flexibility and makes better use of resources. It is powerful, efficient, and robust enough to handle many sessions in various workload patterns.
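To make the idea of a multidimensional allocation concrete, here is a minimal Python sketch of a toy model. It is not the Symphony or EGO API: the `Host` class, its fields, and `try_allocate` are hypothetical names invented for illustration, and the host size (16 cores, 64 GB taken as 64000 MB) is an assumption borrowed from the compute nodes described in the test set-up.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model of a host's free resources (hypothetical, not an EGO type)."""
    free_cores: int
    free_mem_mb: int

    def try_allocate(self, cores: int = 0, mem_mb: int = 0) -> bool:
        """Grant the request only if every requested dimension fits."""
        if cores <= self.free_cores and mem_mb <= self.free_mem_mb:
            self.free_cores -= cores
            self.free_mem_mb -= mem_mb
            return True
        return False


# Assumed host size: 16 cores and 64 GB, taken here as 64000 MB.
host = Host(free_cores=16, free_mem_mb=64_000)

cpu_ok = host.try_allocate(cores=1)      # a request that only needs a core
mem_ok = host.try_allocate(mem_mb=1000)  # a request that only needs memory

print(cpu_ok, mem_ok, host.free_cores, host.free_mem_mb)
```

In this toy model, the two requests draw on different dimensions of the same host, so neither one uses up capacity that the other needs. A slot, by contrast, is a single fixed unit regardless of what the task actually consumes.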

To illustrate multidimensional scheduling’s performance advantages over slot-based scheduling, we designed three test scenarios:

Scenario 1: Simple workload with optimized slot definition with sessions running in parallel
Scenario 2: Multiple concurrent workload with mixed resource consumption with sessions running in parallel
Scenario 3: Workload with different resource consumption submitted alternately

Our test scenarios illustrated that, in all cases, multidimensional scheduling provides better performance and higher resource utilization. This blog details the test results of these scenarios.

We executed all three scenarios with the following set-up:

Our application ran sessions that included both CPU-intensive and memory-intensive tasks:
- CPU-intensive tasks run a complex calculation that consumes many CPU cores. The more tasks that run on a single host, the longer each task takes.
- Memory-intensive tasks allocate real memory, sleep for a while to hold it, and then release it. Running too many of these tasks on a single host causes the system to run out of resources and eventually hang.
The test environment included a powerful master host, plus over 30 powerful physical compute nodes (each with 16 cores and 64 GB of memory).

Scenario 1: Simple workload with optimized slot definition

The following table outlines the environment for test cases for scenario 1. Test case 1 used a slot-based scheduling slot definition of one slot per ncpus; we then increased the slot definition for test case 2:

Slot definition:
- Slot-based scheduling, test case 1: 1 slot = 1 ncpus
- Slot-based scheduling, test case 2: (ncpus/1 + (maxmem/1000))/2
- Multidimensional scheduling, CPU-intensive task: 1 ncpus
- Multidimensional scheduling, memory-intensive task: 1000 MB memory

Session and task:
- CPU-intensive session: 10000 tasks per session, with a 10 second runtime for each task for a single run
- Memory-intensive session: 15000 tasks per session, with a 60 second dedicated runtime for each run
- Workload: 1 CPU-intensive session and 1 memory-intensive session running in parallel

The results of test case 1 showed that the multidimensional scheduling workload ran 6.7 times faster than our slot-based workload (that is, 1884 seconds for slot-based scheduling divided by 282 seconds for multidimensional scheduling is approximately 6.7).

(Note that in all graphs in this blog, “SBS” refers to slot-based scheduling, and “MDS” refers to multidimensional scheduling.)

This result arises because, with multidimensional scheduling, each session’s tasks are matched to system resources according to their own resource requirements. For example, each compute host can run 16 CPU tasks plus 64 memory tasks concurrently. With slot-based scheduling, each compute host supplies only 16 slots (one CPU per slot), and the tasks of both sessions share those slots.
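The per-host arithmetic behind this comparison can be sketched in a few lines of Python. The function names are hypothetical, and treating 64 GB as 64000 MB is an assumption chosen so that the memory figure matches the 64 memory tasks quoted above; the formulas themselves are the slot definitions from the scenario 1 table.

```python
# Per-host figures from the test environment.
# Assumption: 64 GB is counted as 64000 MB, matching the 64 memory tasks quoted above.
NCPUS = 16
MAXMEM_MB = 64_000


def sbs_slots_case_1(ncpus: int) -> int:
    """Slot-based scheduling, test case 1: 1 slot = 1 ncpus."""
    return ncpus // 1


def sbs_slots_case_2(ncpus: int, maxmem_mb: int) -> int:
    """Slot-based scheduling, test case 2: (ncpus/1 + (maxmem/1000))/2."""
    return (ncpus // 1 + maxmem_mb // 1000) // 2


def mds_tasks(ncpus: int, maxmem_mb: int) -> tuple[int, int]:
    """Multidimensional scheduling: 1 ncpus per CPU task, 1000 MB per memory task."""
    return ncpus // 1, maxmem_mb // 1000


cpu_tasks, mem_tasks = mds_tasks(NCPUS, MAXMEM_MB)
print("SBS test case 1 slots per host:", sbs_slots_case_1(NCPUS))             # 16
print("SBS test case 2 slots per host:", sbs_slots_case_2(NCPUS, MAXMEM_MB))  # 40
print("MDS concurrent tasks per host:", cpu_tasks + mem_tasks)                # 80
```

Under these assumptions a host offers 16 shared slots in test case 1 and 40 in test case 2, while multidimensional scheduling can place 16 CPU tasks and 64 memory tasks side by side.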

The line graph for this test case shows about 5000 tasks running concurrently for multidimensional scheduling. In contrast, there were only about 500 tasks for slot-based scheduling. Clearly, slot-based scheduling left many system resources unused.

In test case 2, although the slot-based scheduling slot configuration was based on the average of ncpus and memory, multidimensional scheduling continued to perform better because it always fully utilized system resources and required less time to complete. The bar graph for this test case shows that multidimensional scheduling required 282 seconds, versus 504 seconds for slot-based scheduling, to run the same number of tasks.

Test case 2 also showed that multidimensional scheduling could run more tasks concurrently (that is, 5000 tasks for multidimensional scheduling, versus 2500 tasks for slot-based scheduling).

Scenario 2: Multiple concurrent workload with mixed resource consumption

The results of scenario 1 validated that multidimensional scheduling has the advantage when responding to workload with multiple resource requirements. Scenario 2 focuses on multidimensional scheduling under a high volume of concurrent workload with mixed resource consumption.

The following table outlines the environment for test cases for scenario 2, where we used the same pre-conditions as we did for scenario 1, but now submitted many (100 and 1000) sessions concurrently. All sessions contained CPU-intensive and memory-intensive tasks:

Slot definition:
- Slot-based scheduling, SBS1: ncpus/1
- Slot-based scheduling, SBS2: maxmem/1000
- Slot-based scheduling, SBS3: (ncpus/1 + (maxmem/1000))/2
- Slot-based scheduling, SBS4: 1ncpus & session2 serviceToSlotRatio=4:1
- Slot-based scheduling, SBS5: maxmem/1000 & session1 serviceToSlotRatio=1:8
- Multidimensional scheduling, CPU-intensive task: 1 ncpus
- Multidimensional scheduling, memory-intensive task: 1000 MB memory

Session and task:
- CPU-intensive session: 200 tasks per session
- Memory-intensive session: 200 tasks per session
- Workload for test case 1 (100 sessions): 20 CPU-intensive sessions and 80 memory-intensive sessions, running in parallel
- Workload for test case 2 (1000 sessions): 800 CPU-intensive sessions and 200 memory-intensive sessions, running in parallel

In test case 1, we submitted 100 concurrent sessions: 20 were CPU-intensive and 80 were memory-intensive. The workload ran faster with multidimensional scheduling. The bar graph shows that it took 324 seconds to run all 100 concurrent sessions with multidimensional scheduling, versus 3690 seconds to run the same number of sessions with slot-based scheduling.

The line graph from test case 1 shows that almost 5000 multidimensional scheduling tasks ran, and fewer slot-based scheduling tasks, in the same amount of time.

For test case 2, we increased the number of concurrent sessions tenfold, to 1000 sessions. The results were very similar to test case 1: the workload ran faster with multidimensional scheduling, as illustrated in the graphs for this test case.

With multidimensional scheduling, resource allocation must be calculated for each session, so many concurrent sessions mean a heavy VEMKD workload for resource counting and scheduling. Our test results from both cases in scenario 2 show that, even with this increased scheduling load, multidimensional scheduling performance is not affected.
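For readers who want to check the ratios behind the figures reported so far, the short Python sketch below recomputes them. It assumes each reported pair is a completion time in seconds (slot-based first, multidimensional second); the dictionary labels and the `speedup` helper are names chosen here for illustration only.

```python
# Completion times reported in this post, read as (SBS seconds, MDS seconds).
reported_seconds = {
    "Scenario 1, test case 1": (1884, 282),
    "Scenario 1, test case 2": (504, 282),
    "Scenario 2, test case 1": (3690, 324),
}


def speedup(sbs_seconds: float, mds_seconds: float) -> float:
    """How many times faster the multidimensional run finished."""
    return sbs_seconds / mds_seconds


for label, (sbs, mds) in reported_seconds.items():
    print(f"{label}: {speedup(sbs, mds):.1f}x")
# Scenario 1, test case 1: 6.7x
# Scenario 1, test case 2: 1.8x
# Scenario 2, test case 1: 11.4x
```

The gap narrows in scenario 1, test case 2, where the slot definition averages CPU and memory, and widens again in scenario 2, where many sessions compete for the same slots.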

Scenario 3: Workload with different resource consumption submitted alternately

Our final test scenario focused on multidimensional scheduling’s performance under alternating workloads.

The following table outlines the environment for test cases for scenario 3, where, in contrast to scenario 2, we no longer submitted sessions in parallel. Instead, we submitted the same number of CPU-intensive and memory-intensive sessions, but not concurrently. We submitted the work at alternating intervals:

Slot definition:
- Slot-based scheduling, SBS1: ncpus/1
- Slot-based scheduling, SBS2: maxmem/1000
- Slot-based scheduling, SBS3: (ncpus/1 + (maxmem/1000))/2
- Slot-based scheduling, SBS4: 1ncpus & session2 serviceToSlotRatio=4:1
- Slot-based scheduling, SBS5: maxmem/1000 & session1 serviceToSlotRatio=1:8
- Multidimensional scheduling, CPU-intensive task: 1 ncpus
- Multidimensional scheduling, memory-intensive task: 1000 MB memory

Session and task:
- CPU-intensive session: 15 tasks per session
- Memory-intensive session: 15 tasks per session
- Workload for test case 1: 20 CPU-intensive sessions and 80 memory-intensive sessions, run in 10 second intervals. Also ran 80 CPU-intensive sessions and 20 memory-intensive sessions.
- Workload for test case 2: 800 CPU-intensive sessions and 200 memory-intensive sessions, run in 10 second intervals

Much like the tests where we ran concurrent sessions, multidimensional scheduling’s performance came out on top when we ran sessions at different intervals. We further validated this by changing the workload pattern to 80% CPU-intensive sessions and 20% memory-intensive sessions, and the results were the same. Clearly, multidimensional scheduling could handle many running tasks with a short workload runtime.

Conclusions

All three test scenarios validated that multidimensional scheduling outperformed slot-based scheduling, regardless of how simple or complex the workload was, the number of tasks or sessions, or the frequency or pattern of the submissions.

