Performance Analysis and Profiling in .NET on IBM Z

By Sanjam Panda posted Tue October 31, 2023 07:41 AM

  

.NET is a versatile and powerful framework for building a wide range of scalable applications. It offers a rich set of libraries and tools for web, desktop, mobile and cloud development.
Performance analysis is crucial for .NET applications: it ensures efficient resource utilization and good responsiveness, which results in a better user experience. In this blog we will see how to get started with performance analysis of .NET on IBM Z.

Prerequisites:

  • Perf version 5.4+
  • The latest dotnet SDK
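
Both prerequisites can be checked from the shell before starting. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge`, `check_perf` and `check_dotnet` are hypothetical helper names introduced here for illustration and are not part of perf or the dotnet SDK.

```shell
#!/usr/bin/env bash
# Sketch: report whether the prerequisites are installed.
# The helper names below are hypothetical, chosen for this example only.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the version ordering of GNU "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether perf is installed and at least version 5.4.
check_perf() {
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf: not found"
        return 0
    fi
    # Keep only the leading major.minor part of the "perf --version" output.
    perf_ver=$(perf --version | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
    if version_ge "$perf_ver" 5.4; then
        echo "perf $perf_ver: ok"
    else
        echo "perf $perf_ver: too old, 5.4+ is required"
    fi
}

# Report whether the dotnet SDK is on the PATH.
check_dotnet() {
    if command -v dotnet >/dev/null 2>&1; then
        echo "dotnet SDK $(dotnet --version): found"
    else
        echo "dotnet: not found"
    fi
}

check_perf
check_dotnet
```

The script only reports what it finds and does not install or change anything.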

1. Create a new dotnet project

dotnet new console -n Hello_World && cd Hello_World


2. Add the BenchmarkDotNet NuGet package to the project

dotnet add package BenchmarkDotNet


3. Let's write a dotnet program

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Running;

public class Benchmark_Example
{
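        // Array that the benchmark fills on every invocation
        int [] arr = new int [100];

        // [Benchmark] marks the method that BenchmarkDotNet measures
        [Benchmark]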
        public void run()
        {
                for (int i = 0; i < 100 ; i++)
                {
                        arr[i] = i * 2;
                }
        }
}

class Hello_World
{
        public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Hello_World).Assembly).Run(args);
}

BenchmarkDotNet is a .NET benchmarking library that measures execution time and provides statistical analysis of the results, which helps you identify bottlenecks in your dotnet application.


4. Build the application in Release mode

dotnet build -c Release

BenchmarkDotNet only works with applications built in Release mode, because Debug builds have optimizations disabled.

5. Use perf to record a profile of the dotnet application

MONO_ENV_OPTIONS="--jitdump" perf record -k 1 dotnet bin/Release/net8.0/Hello_World.dll

The --jitdump option makes the runtime write a dump file containing the diagnostic data related to the JIT-compiled code. Once the application has finished, we can see that BenchmarkDotNet has printed its measurements for the benchmark code.


6. Perf Inject

perf inject --jit -i perf.data -o perf.jit.data

perf inject merges the JIT dump into the recorded profile and generates synthetic .so files for the jitted code, so that perf can resolve samples in that code to symbols.
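
Steps 5 and 6 can be chained in a small script. The following is a sketch only: `DRY_RUN`, `APP_DLL`, `run_step` and `profile_app` are hypothetical names introduced for illustration, and the script prints the commands by default instead of running them. The perf and dotnet invocations are the ones shown in the steps above, with the default output file `perf.data` spelled out through `-o`.

```shell
#!/usr/bin/env bash
# Sketch: chain the record and inject steps.
# DRY_RUN, APP_DLL, run_step and profile_app are hypothetical names,
# not perf or dotnet features.

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, anything else = run them
APP_DLL="${APP_DLL:-bin/Release/net8.0/Hello_World.dll}"

# Print the command, and execute it unless DRY_RUN is 1.
run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

profile_app() {
    # Step 5: record the profile while the runtime writes the JIT dump.
    run_step env MONO_ENV_OPTIONS="--jitdump" \
        perf record -k 1 -o perf.data dotnet "$APP_DLL" &&
    # Step 6: merge the JIT dump into a new profile file.
    run_step perf inject --jit -i perf.data -o perf.jit.data
}

profile_app
```

Running the script with `DRY_RUN=0` from the project directory executes both commands; the inject step is skipped if the record step fails.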

7. Perf report

perf report -n -i perf.jit.data

 

In the profiler we can see that our benchmarked function run() does show some overhead. Now let's annotate run() by pressing Enter.

We can see the assembly instructions and the hotspots among them. Assembly is difficult to read directly, so seeing the relevant source code next to the instructions helps us understand where the code can be optimized. This support was recently added for dotnet by the compiler team. Pressing 's' should show the source code lines for the corresponding instructions, and pressing 'l' on a source line should show the path and line number of your source code file.

We can see that 'arr[i] = i * 2' accounts for some heavy instruction hotspots. One way to optimize the code is to unroll the loop manually. This technique reduces the overhead of the loop control structure: fewer iterations mean fewer increments and comparisons of the loop counter.

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Running;

public class Benchmark_Example
{
        int [] arr = new int [100];

        [Benchmark]
        public void run()
        {
                // Unrolled by a factor of 5: 20 iterations instead of 100
                for (int i = 0; i < 100 ; i += 5)
                {
                        arr[i] = i * 2;
                        arr[i + 1] = (i + 1) * 2;
                        arr[i + 2] = (i + 2) * 2;
                        arr[i + 3] = (i + 3) * 2;
                        arr[i + 4] = (i + 4) * 2;
                }
        }
}

class Hello_World
{
        public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Hello_World).Assembly).Run(args);
}

On building and running the dotnet application again, we can see that the mean measurement has dropped by ~22 ns.

On annotating the modified benchmark code, we can see that the instruction hotspots have been reduced.

This optimization technique should be used carefully. Using it extensively may reduce the maintainability of the source code and increase the size of the generated assembly code considerably, so we should always weigh the gain in execution time against the growth in code size. Beyond this, we can experiment with other optimization techniques to see which one fits our use case best.