Originally posted by: Holgervk
Hello,
on about 60 systems (probably all that have HACMP running) I get entries in errpt like this:
3D32B80D 16-02-09 00:05 P S topsvcs NIM thread blocked
The details in the error log say that those NIM threads (one per heartbeat) were blocked for a certain amount of time; it can be 5 seconds, it can be 50.
When I look at performance-logging tools (like Patrol, or even simple vmstat commands that were running at the time) I see that those commands were blocked for approximately the same amount of time.
So something on some of my nodes prevents tasks from being executed.
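To illustrate the kind of check I mean, here is a minimal sketch of a stall detector: it takes a timestamp at a fixed interval and reports any gap that is much longer than the interval, which is what a blocked vmstat looks like in its output. This is only an assumption of how one might measure it; the names find_stalls and sample are made up for this example and are not part of HACMP or any AIX tool.

```python
import time


def find_stalls(timestamps, interval, tolerance=1.0):
    """Return (start, delay) pairs for gaps longer than interval + tolerance.

    timestamps: sample times in seconds, in ascending order.
    interval:   the intended spacing between samples, in seconds.
    tolerance:  extra delay (seconds) accepted as normal scheduling jitter.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delay = (cur - prev) - interval
        if delay > tolerance:
            stalls.append((prev, delay))
    return stalls


def sample(count, interval=1.0):
    """Collect `count` monotonic timestamps, sleeping `interval` between them."""
    stamps = []
    for _ in range(count):
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps


if __name__ == "__main__":
    # Hypothetical run: sample once per second for a minute, then report gaps.
    for start, delay in find_stalls(sample(60), 1.0):
        print(f"stall of about {delay:.1f}s after t={start:.1f}")
```

If the delays reported by something like this match the blocked times in the topsvcs entries, that would suggest the whole node stalls and not only the NIM threads.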
The nodes vary from 64-CPU p595 systems to partitions with 0.5 CPU.
The errors come without any regularity. One night there are five, then it's quiet for days or weeks.
Does anybody have an idea, or has anybody at least seen a similar situation?
Regards, Holger