Data and AI on Power


Question about MMA, GPU, CPU and types of LLMs

  • 1.  Question about MMA, GPU, CPU and types of LLMs

    Posted Thu July 04, 2024 02:10 PM

    Hello,
    As part of our AI learning, we installed Red Hat (ppc64le) on one of our Power10 machines. We are at the beginning of our AI journey and some things are difficult for us to understand, so a few questions come to mind:
    1) How does MMA in Power10 processors compare to technologies such as GPUs and TPUs? What does performance look like? (For example, I see that a MacBook with an M1 processor generates text faster.)
    2) We downloaded two models, 8B-SPPO-Iter3-Q8_0.gguf and 8B-SPPO-Iter3-Q6_K.gguf. The Q8_0 model is clearly faster even though it is larger (in theory it should be slower). Why is this? Should we choose a specific type of LLM from huggingface.co for MMA technology?

    We use llama.cpp, following this manual: https://community.ibm.com/community/user/powerdeveloper/blogs/vaibhav-shandilya/2024/05/07/prepare-ibm-power10-for-inferencing-with-llms



    ------------------------------
    Kamil
    ------------------------------


  • 2.  RE: Question about MMA, GPU, CPU and types of LLMs

    Posted Fri July 12, 2024 05:58 AM

    Hi Kamil,

    1) Power10 provides acceleration for AI workloads directly on each Power10 chip, through capabilities such as MMA, SIMD units, and high memory bandwidth between system memory and the chip. All of these, not only MMA, improve the performance of AI workloads such as LLM inferencing. Because this works directly on the CPU, you can leverage system memory (so you're not restricted to GPU memory) and you don't need to mess around with CUDA. For example, compare https://huggingface.co/google/flan-t5-base#running-the-model-on-a-cpu with https://huggingface.co/google/flan-t5-base#running-the-model-on-a-gpu: running on the CPU even leads to code that is easier to understand and maintain than GPU-aware code.
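
    Roughly, the difference between those two linked snippets looks like this (paraphrasing the flan-t5-base model card, assuming the Hugging Face transformers library); the CPU version needs no device management at all:

    ```python
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

    input_ids = tokenizer("translate English to German: How old are you?",
                          return_tensors="pt").input_ids

    # The GPU variant additionally has to move the model and the inputs
    # to the accelerator, e.g.:
    #   model = model.to("cuda")
    #   input_ids = input_ids.to("cuda")

    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```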

    2) In terms of performance, you need to configure your Power10 server appropriately; then you can easily handle LLMs with billions of parameters, so the 8B models you are referencing shouldn't be a problem: https://community.ibm.com/community/user/powerdeveloper/blogs/sebastian-lehrig/2024/03/26/sizing-for-ai
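
    As a starting point on the threading side of that sizing, here is a hypothetical sketch (assuming the llama-cpp-python bindings on Linux; the model file name is the one from your question) that pins llama.cpp's thread count to the CPUs visible to the process:

    ```python
    import os
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Logical CPUs this process may run on; with SMT enabled, dividing
    # by the SMT level (e.g., 8 for SMT8) to target physical cores can
    # give better throughput.
    n_cpus = len(os.sched_getaffinity(0))

    llm = Llama(
        model_path="8B-SPPO-Iter3-Q8_0.gguf",  # model file from the question
        n_threads=n_cpus,                      # threads used for inference
    )

    out = llm("Q: What is MMA on Power10? A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```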

    We have optimized for INT8 quantization in combination with SIMD/MMA instructions. The 6-bit Q6_K quantization is probably not as performant for that reason.
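
    For intuition about why Q8_0 maps so well onto those instructions, here is a simplified numpy sketch of the Q8_0 idea: blocks of 32 weights, one scale per block, one signed 8-bit integer per weight (the real llama.cpp format stores a float16 scale per block). The int8 values can feed INT8 SIMD/MMA dot products almost directly, whereas 6-bit K-quant weights first have to be unpacked from a packed bit layout:

    ```python
    import numpy as np

    def quantize_q8_0(block):
        """Simplified Q8_0: one scale per 32-weight block, int8 values."""
        assert block.shape == (32,)
        scale = float(np.abs(block).max()) / 127.0
        if scale == 0.0:
            scale = 1.0  # avoid division by zero for an all-zero block
        q = np.round(block / scale).astype(np.int8)
        return scale, q

    def dequantize_q8_0(scale, q):
        return scale * q.astype(np.float32)

    weights = np.random.randn(32).astype(np.float32)
    scale, q = quantize_q8_0(weights)
    print("max error:", np.abs(weights - dequantize_q8_0(scale, q)).max())
    ```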



    ------------------------------
    Sebastian Lehrig
    ------------------------------