AIX

  • 1.  NIM thread blocked

    Posted Tue February 24, 2009 09:08 AM

    Originally posted by: Holgervk


    Hello,

    On about 60 systems (probably all those that have HACMP running) I get entries in errpt like this:

    3D32B80D 16-02-09 00:05 P S topsvcs NIM thread blocked

    The details in the error log say that those NIM threads (one per heartbeat) have been blocked for a certain amount of time; it can be 5 seconds, it can be 50.

    When I look at performance-logging tools (like patrol or even simple vmstat commands that were running) I see that those commands have been blocked for approximately the same amount of time.

    So, something on some of my nodes prevents tasks from being executed.
    The nodes vary from a 64-CPU p595 to partitions with 0.5 CPU.

    The errors come without any regularity. One night there are 5, then it's quiet for days or weeks.

    Does anybody have an idea or at least a similar situation?

    Regs, Holger


  • 2.  Re: NIM thread blocked

    Posted Tue February 24, 2009 06:41 PM

    Originally posted by: grukrz1


    It happens on heavily loaded systems...

    Check "lssrc -ls topsvcs"; you will probably see many "Missed HBs:" in the stats for some of the heartbeat devices.

    You probably need to increase the "Interval between Heartbeats" for the HACMP "Network Modules" that are often affected by heavy load. As far as I remember, after I increased the interval for the 'diskhb' module by 1 second, "Missed HBs:" now shows 0 or only a few. The change required an HACMP restart for the new interval to become active.
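
    A minimal sketch of how the "Missed HBs:" check could be scripted, for anyone who has to look at many nodes. It only assumes that the relevant lines of "lssrc -ls topsvcs" contain the literal text "Missed HBs:" followed by one or more counters; the exact layout of that output is not shown in this thread, and the function names are made up for illustration.

```python
import re

# Assumption: the interesting lines contain the literal text "Missed HBs:"
# followed by one or more integer counters. The exact layout is not given
# in this thread, so everything after the label is scanned for numbers.
MISSED_RE = re.compile(r"Missed HBs:(.*)")


def missed_hb_lines(text):
    """Return (line_number, [counters]) for every line mentioning 'Missed HBs:'."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = MISSED_RE.search(line)
        if match:
            counters = [int(n) for n in re.findall(r"\d+", match.group(1))]
            hits.append((line_no, counters))
    return hits


def any_missed(text):
    """True if at least one 'Missed HBs:' line carries a non-zero counter."""
    return any(
        any(count > 0 for count in counters)
        for _, counters in missed_hb_lines(text)
    )
```

    To use it, capture the command output on the node (for example with subprocess.run(["lssrc", "-ls", "topsvcs"], capture_output=True, text=True).stdout) and pass the text to any_missed(). Running it before and after an interval change gives a quick way to see whether the counters stopped growing.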

    Btw. I also increased the interval for the 'ether' module, but you probably don't need that...
    regards,
    K.


  • 3.  Re: NIM thread blocked

    Posted Wed August 18, 2010 04:24 PM

    Originally posted by: SystemAdmin


    Hi

    We have the same type of problem on our clusters ....

    The config :

    • p595 64 cpu
    • vio 2.1
    • EMC SAN
    • All lpars are virtualized for network and disk access

    Every day we are getting a lot of "NIM thread blocked" messages in our errpt. This happens on all types of HACMP configs:

    • loaded ( cpu, mem, io)
    • not loaded
    • with dedicated disk for heartbeat.

    The funny thing is that the messages appear at roughly the same time on all the LPARs (discovered today). It looks like this has something to do with the VIO and/or the SAN?
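
    A rough sketch of how the "same time on all the LPARs" observation could be checked systematically. It assumes the errpt summary lines have been collected per LPAR and look like the line quoted in the first post ("3D32B80D 16-02-09 00:05 P S topsvcs NIM thread blocked"); plain errpt output prints the timestamp in a different form (MMDDhhmmyy), so the pattern would need adapting for that. The function names and the min_hosts parameter are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: lines follow the layout quoted in the first post of this thread:
#   "3D32B80D 16-02-09 00:05 P S topsvcs NIM thread blocked"
LINE_RE = re.compile(
    r"^\s*[0-9A-F]{8}\s+(?P<date>\d{2}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2})"
    r"\s+\S\s+\S\s+topsvcs\s+NIM thread blocked\s*$"
)


def blocked_times(lines):
    """Collect the (date, time) stamps of 'NIM thread blocked' entries."""
    stamps = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamps.add((match.group("date"), match.group("time")))
    return stamps


def shared_times(per_host, min_hosts=2):
    """Map each stamp to the hosts that logged it, keeping only stamps
    seen on at least min_hosts different hosts."""
    seen = defaultdict(set)
    for host, lines in per_host.items():
        for stamp in blocked_times(lines):
            seen[stamp].add(host)
    return {
        stamp: sorted(hosts)
        for stamp, hosts in seen.items()
        if len(hosts) >= min_hosts
    }
```

    Feeding it a dict of LPAR name to collected lines returns only the minutes in which several LPARs logged the error. If most entries land in shared minutes, that would point toward something common such as the VIO servers or the SAN rather than load on a single LPAR.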

    PS: we have a TSM server using dedicated adapters + Powerpath : No problem

    Any experience ?