Power

 nmem64 example to utilize up to 36TB memory

Lukas Schmid posted Thu November 21, 2024 10:14 AM

Hi, I'm looking for an example of how to utilize a large amount of memory within an AIX LPAR.

I know nmem64 from Nigel, but I struggle a bit with addressing 2 GB memory chunks. I also know I need to run multiple nmem64 processes to address a huge amount of memory.
But even this first simple process isn't working.


see here: # ./nmem64 -m 2000 -s 120
malloc: Not enough space 

Any idea what's wrong?

Thanks for any quick feedback, Lukas

Yves-Sandro Foltys

Hi Lukas,
I don't have an alternative to nmem64, but the error message looks to me like you are hitting a ulimit.
Maybe setting "ulimit -d unlimited" helps.
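
A minimal sketch of how one might check the data limit before starting nmem64. It assumes that "ulimit -d" reports kilobytes (or "unlimited") in your shell and that "-m 2000" means 2000 MB; limit_ok is a hypothetical helper, not part of nmem64.

```shell
#!/bin/sh
# limit_ok LIMIT NEED_KB
# LIMIT is the output of `ulimit -d` (assumed to be kilobytes or "unlimited").
# Prints "ok" if the limit covers NEED_KB, otherwise "too-small".
limit_ok() {
    limit=$1
    need_kb=$2
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$need_kb" ]; then
        echo ok
    else
        echo too-small
    fi
}

# Assumption: -m 2000 asks nmem64 for 2000 MB, i.e. 2000 * 1024 KB.
need_kb=$((2000 * 1024))
current=$(ulimit -d)
echo "data limit: $current, needed: $need_kb KB -> $(limit_ok "$current" "$need_kb")"
```

If this reports too-small, running "ulimit -d unlimited" in the same shell before nmem64 would be the next thing to try; the hard limit may also need raising by root.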

Regards,

Yves

nigel griffiths

First, 36 TB of memory is extremely large. You have a serious server.

I looked up the Power10 memory bandwidth and found:

The IBM Power10 processor has a maximum memory bandwidth of 409 GB/sec per socket for 32 GB and 64 GB memory cards, and 375 GB/sec per socket for 128 GB and 256 GB memory cards. The Power10's memory bandwidth is 2.6 times higher than scalable x86 processors.

That is with 3 TB of memory per Power10 socket and up to 15 CPUs per socket.

I assume you are trying some sort of "burn in" test to check that the memory is fully working. IBM will have already done that, but I understand the need.

For 36TB, you have at least 9 sockets with (guessing at least) 120 to the full 240 CPUs.

You will have to use all of the CPUs to test the memory or you will be waiting days.

As nmem64 is single threaded, a starting point would be one nmem64 process per CPU, going up to eight nmem64 processes per CPU because of SMT=8.

There is no getting around this. So you are into hundreds of copies of nmem64.

At 36 TB and, say, 100 processes, you will need 36 x 1024 / 100 = roughly 369 GB per process.
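
The sizing arithmetic can be sketched in shell as follows. per_proc_gb is a hypothetical helper; it rounds up so that the processes together cover the whole total, and it does not reserve anything for the operating system, which a real run would have to leave free.

```shell
#!/bin/sh
# per_proc_gb TOTAL_TB PROCS
# Prints the GB each process must allocate so that PROCS processes
# together cover TOTAL_TB (1 TB taken as 1024 GB), rounded up.
per_proc_gb() {
    total_gb=$(($1 * 1024))
    procs=$2
    echo $(( (total_gb + procs - 1) / procs ))
}

for n in 100 240 800; do
    echo "$n processes: $(per_proc_gb 36 "$n") GB each"
done
```

The process counts 100, 240 and 800 are only illustrations taken from the CPU and SMT figures in this thread.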

Testing this much memory is not a trivial task.

I have never had such a huge machine to "play" with, so please let me know how you get on.

Ask if you get stuck, I will try to help out.

With 100 to 800 processes, I would guess you will have to send the output to log files and then run checks on the log files to find out if it is working as expected.
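
A rough sketch of that log-file approach, under some assumptions: launch_all and failed_logs are hypothetical helper names, the log file naming is made up for illustration, and the only failure checked for is the "Not enough space" message quoted at the top of this thread.

```shell
#!/bin/sh
# launch_all COUNT LOGDIR CMD [ARGS...]
# Starts COUNT background copies of CMD, each writing stdout and stderr
# to its own log file in LOGDIR, then waits for all of them to finish.
launch_all() {
    count=$1
    logdir=$2
    shift 2
    mkdir -p "$logdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" > "$logdir/run.$i.log" 2>&1 &
        i=$((i + 1))
    done
    wait
}

# failed_logs LOGDIR
# Lists the log files that contain the malloc failure message.
failed_logs() {
    grep -l "Not enough space" "$1"/run.*.log 2>/dev/null
}
```

A call might look like "launch_all 100 /tmp/nmemlogs ./nmem64 -m 2000 -s 120" followed by "failed_logs /tmp/nmemlogs", where the directory name is an example and the -m value would need to match the per-process size you settle on.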

Even starting 800 processes, getting the kernel to allocate the space and then write to each memory page to force the allocation of real memory could take a few hours!

Good luck, cheers, Nigel