This blog details the steps required to set up a Rust project with MMA (Matrix-Multiply Assist) optimizations on IBM Power10 systems.
Prerequisites
This blog assumes that conda is already installed. If it is not, refer to the blog post by Sebastian Lehrig on setting up conda on Power.
Environment Setup
Create a new conda environment.
conda create --name your-env-name-here python=3.11
This creates a new environment and installs Python 3.11 along with its required dependencies.
Activate the newly created environment.
conda activate your-env-name-here
Once the environment is active, install the required packages.
conda install rust -c rocketce
conda install gcc gfortran -c conda-forge
When the conda install command is given the -c argument, conda attempts to install the packages from the specified channel. Packages installed from the rocketce channel are built with MMA optimizations.
Project Setup
Create a new Rust project.
cargo new your-project-name-here
This creates a new directory with the provided project name. Navigate to the new project directory. It contains a Cargo.toml file as well as a main.rs file inside the src directory.
main.rs is the main source file that is run; it contains a default “Hello, world!” program.
Cargo.toml is the file in which project dependencies and libraries are declared.
Open Cargo.toml and add the following lines under the [dependencies] section.
blas = "0.22.0"
openblas-src = "0.10.9"
These lines add BLAS functionality to the Rust project. Both are external crates: blas and openblas-src. The Cargo.toml file should look as follows.
[package]
name = "rust"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
blas = "0.22.0"
openblas-src = "0.10.9"
Save the file and then open the main.rs file. It contains boilerplate code that simply prints “Hello, world!” to the console. Replace the existing code with the following.
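// Pull in openblas-src so that the OpenBLAS library is linked into the binary.
extern crate openblas_src;
use blas::*;

fn main() {
    // Matrices are stored in column-major order:
    // A is m x k, B is k x n, and C is m x n.
    let (m, n, k) = (2, 4, 3);
    let a = vec![
        1.0, 4.0,
        2.0, 5.0,
        3.0, 6.0,
    ];
    let b = vec![
        1.0, 5.0,  9.0,
        2.0, 6.0, 10.0,
        3.0, 7.0, 11.0,
        4.0, 8.0, 12.0,
    ];
    let mut c = vec![
        2.0, 7.0,
        6.0, 2.0,
        0.0, 7.0,
        4.0, 2.0,
    ];

    // Computes C = 1.0 * A * B + 1.0 * C with no transposition ('N', 'N').
    // The call is unsafe because it goes straight to the BLAS library.
    unsafe {
        dgemm(b'N', b'N', m, n, k, 1.0, &a, m, &b, k, 1.0, &mut c, m);
    }

    assert!(
        c == vec![
            40.0,  90.0,
            50.0, 100.0,
            50.0, 120.0,
            60.0, 130.0,
        ]
    );
}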
This program carries out a simple matrix multiplication using the dgemm function provided by BLAS. It was created using the sample given on the blas crate page.
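To make the dgemm arguments easier to follow, the same computation can be sketched in plain Rust with no BLAS dependency. dgemm computes C = alpha * A * B + beta * C on matrices stored in column-major order, which is why the vectors in the program above read column by column. The function name gemm_reference is hypothetical and exists only for illustration. The sketch assumes no transposition (the b'N' flags) and leading dimensions equal to the row counts, as in the call above. It is a reference for checking results, not an MMA-accelerated routine.

```rust
/// Reference GEMM on column-major data: c = alpha * a * b + beta * c.
/// `a` is m x k, `b` is k x n, `c` is m x n. Element (i, j) of a matrix
/// with r rows lives at index i + j * r.
fn gemm_reference(
    m: usize, n: usize, k: usize,
    alpha: f64, a: &[f64], b: &[f64],
    beta: f64, c: &mut [f64],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for j in 0..n {
        for i in 0..m {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i + p * m] * b[p + j * k];
            }
            c[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
}

fn main() {
    // Same inputs as the dgemm program, written column by column.
    let (m, n, k) = (2, 4, 3);
    let a = vec![1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let b = vec![1.0, 5.0, 9.0, 2.0, 6.0, 10.0, 3.0, 7.0, 11.0, 4.0, 8.0, 12.0];
    let mut c = vec![2.0, 7.0, 6.0, 2.0, 0.0, 7.0, 4.0, 2.0];
    gemm_reference(m, n, k, 1.0, &a, &b, 1.0, &mut c);
    assert_eq!(c, vec![40.0, 90.0, 50.0, 100.0, 50.0, 120.0, 60.0, 130.0]);
    println!("{:?}", c);
}
```

As a spot check, the first entry of the result is the first row of A (1, 2, 3) times the first column of B (1, 5, 9), plus the original first entry of C: 1 + 10 + 27 + 2 = 40, which matches the value asserted in the dgemm program.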
The final step before the project can be built and run is to make sure that Cargo, the Rust build tool, knows the directories it needs to search when linking the required libraries.
Create a new directory called .cargo inside the base project directory.
Create a new file called config.toml inside this directory.
Open config.toml and paste the following lines.
[build]
rustflags = "-L /home/<username>/anaconda3/envs/<envname>/lib"
The goal of this step is to give the Rust build tool the directory in which the conda environment stores all of its installed libraries. The exact path may therefore differ from system to system, but the path shown is the default installation path for Anaconda on IBM Power10 systems. Be sure to replace <username> and <envname> with the appropriate names.
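As an alternative to hard-coding the path, the library directory could be derived from the active conda environment. The sketch below is a hypothetical build.rs (a Cargo build script placed next to Cargo.toml). It reads the CONDA_PREFIX environment variable, which conda sets when an environment is activated, and emits a cargo:rustc-link-search directive for that environment's lib directory. This is a variation under those assumptions, not the method used in this blog, and the helper name link_search_directive is made up for illustration.

```rust
// build.rs (hypothetical): point the linker at the active conda
// environment's lib directory instead of a hard-coded path.
use std::env;
use std::path::Path;

/// Builds the Cargo directive that adds `<prefix>/lib` to the native
/// library search path. `prefix` is the root of the conda environment.
fn link_search_directive(prefix: &str) -> String {
    let lib_dir = Path::new(prefix).join("lib");
    format!("cargo:rustc-link-search=native={}", lib_dir.display())
}

fn main() {
    // Ask Cargo to re-run this script if a different environment is activated.
    println!("cargo:rerun-if-env-changed=CONDA_PREFIX");
    match env::var("CONDA_PREFIX") {
        Ok(prefix) => println!("{}", link_search_directive(&prefix)),
        // Assumption: with no active environment, the build relies on
        // .cargo/config.toml or the system defaults instead.
        Err(_) => println!("cargo:warning=CONDA_PREFIX is not set; no conda library path added"),
    }
}
```

With this approach the build follows whichever environment is active, at the cost of requiring conda activate before every cargo build. The config.toml entry shown above remains the simpler choice when the environment never changes.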
With this completed, return to the base project directory.
Execution
The project can be run with the cargo run command. On the first run the project is built before it is executed, so expect that run to take slightly longer.
To check MMA usage, run the project with the following command.
perf stat -e r1000E cargo run
This command outputs the number of MMA events that occur during execution. Sample output is as follows.
As seen in the above output, there were 11 MMA events that occurred during the execution of this project. This number will change based on the complexity and number of operations being carried out.
Conclusion
This blog detailed the steps required to set up a Rust project with MMA optimizations on IBM Power10 systems. A basic matrix multiplication program was created and MMA utilization was confirmed. This program acts as a starting point for using optimized matrix operations on IBM Power10 systems, and it can be extended for more specific use cases.