Originally posted by: SystemAdmin
I've spent a fair amount of time on AIX, but I have never seen this one. I have an LPAR running AIX 5.2.10 on a 9119-595 that went to almost 100% I/O wait between 02:30 and 02:45 a little over three weeks ago. This is an IHS (IBM HTTP Server) box that does very little disk I/O.
iostat 1 5
System configuration: lcpu=1 disk=12

tty:       tin      tout   avg-cpu:   % user    % sys   % idle   % iowait
           0.4      18.4                 3.0      3.0     54.2       39.8

Disks:      % tm_act      Kbps      tps    Kb_read    Kb_wrtn
hdisk5           0.0       0.0      0.0      10368      51572
hdisk3           0.0       0.0      0.0      61216      12204
hdisk0           0.0       0.0      0.0          0          0
hdisk1           0.0       0.0      0.0      11157      79242
hdisk2           0.0       0.0      0.0       6320       2384
hdisk11          0.0       0.0      0.0          0          0
hdisk12          0.0       0.0      0.0          0          0
hdisk4           0.0       0.0      0.0      40432      14784
hdisk6           0.0       0.0      0.0      43192      41340
hdisk10          0.0       0.0      0.0          0       1580
hdisk8           0.0       0.0      0.0          0       1056
hdisk7           0.0       0.0      0.0         88     184660

tty:       tin      tout   avg-cpu:   % user    % sys   % idle   % iowait
           1.0    1769.0                 4.0      4.0      0.0       92.0

Disks:      % tm_act      Kbps      tps    Kb_read    Kb_wrtn
hdisk5           1.0       0.0      0.0          0          0
hdisk3           1.0       0.0      0.0          0          0
hdisk0           0.0       0.0      0.0          0          0
hdisk1           0.0       0.0      0.0          0          0
hdisk2           1.0       0.0      0.0          0          0
hdisk11          0.0       0.0      0.0          0          0
hdisk12          0.0       0.0      0.0          0          0
hdisk4           1.0       0.0      0.0          0          0
hdisk6           1.0       0.0      0.0          0          0
hdisk10          1.0       0.0      0.0          0          0
hdisk8           1.0       0.0      0.0          0          0
hdisk7           1.0       0.0      0.0          0          0

tty:       tin      tout   avg-cpu:   % user    % sys   % idle   % iowait
           0.0     984.0                 1.0      6.0      0.0       93.0

Disks:      % tm_act      Kbps      tps    Kb_read    Kb_wrtn
hdisk5           0.0       0.0      0.0          0          0
hdisk3           0.0       0.0      0.0          0          0
hdisk0           0.0       0.0      0.0          0          0
hdisk1           0.0       0.0      0.0          0          0
hdisk2           0.0       0.0      0.0          0          0
hdisk11          0.0       0.0      0.0          0          0
hdisk12          0.0       0.0      0.0          0          0
hdisk4           0.0       0.0      0.0          0          0
hdisk6           0.0       0.0      0.0          0          0
hdisk10          0.0       0.0      0.0          0          0
hdisk8           0.0       0.0      0.0          0          0
hdisk7           0.0       0.0      0.0          0          0

tty:       tin      tout   avg-cpu:   % user    % sys   % idle   % iowait
           0.0    1701.0                 2.0      2.0      0.0       96.0

Disks:      % tm_act      Kbps      tps    Kb_read    Kb_wrtn
hdisk5           0.0       0.0      0.0          0          0
hdisk3           0.0       0.0      0.0          0          0
hdisk0           0.0       0.0      0.0          0          0
hdisk1           0.0       0.0      0.0          0          0
hdisk2           0.0       0.0      0.0          0          0
hdisk11          0.0       0.0      0.0          0          0
hdisk12          0.0       0.0      0.0          0          0
hdisk4           0.0       0.0      0.0          0          0
hdisk6           0.0       0.0      0.0          0          0
hdisk10          0.0       0.0      0.0          0          0
hdisk8           0.0       0.0      0.0          0          0
hdisk7           0.0       0.0      0.0          0          0

tty:       tin      tout   avg-cpu:   % user    % sys   % idle   % iowait
           0.0     984.0                 6.0     10.0      0.0       84.0

Disks:      % tm_act      Kbps      tps    Kb_read    Kb_wrtn
hdisk5           0.0       0.0      0.0          0          0
hdisk3           0.0       0.0      0.0          0          0
hdisk0           0.0       0.0      0.0          0          0
hdisk1           0.0       0.0      0.0          0          0
hdisk2           0.0       0.0      0.0          0          0
hdisk11          0.0       0.0      0.0          0          0
hdisk12          0.0       0.0      0.0          0          0
hdisk4           0.0       0.0      0.0          0          0
hdisk6           0.0       0.0      0.0          0          0
hdisk10          0.0       0.0      0.0          0          0
hdisk8           0.0       0.0      0.0          0          0
hdisk7           0.0     116.0      2.0          0        116

So, why is the system flagging what is probably idle time as I/O wait?
Is a reboot the only way to make this host come to its senses?
No processes jump out, and no processes were started during the 15-minute window when the wait time went haywire.
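For anyone who wants to check a longer capture for the same mismatch, here is a rough Python sketch that reads saved `iostat` text and lists the intervals where % iowait is high while the disks report almost no transfers. The function names (`parse_iostat`, `suspicious`) and the thresholds (50% iowait, 5 tps) are made up for illustration, and the parser assumes the column layout shown in the output above; it is not an AIX tool.

```python
# Sketch only: assumes the iostat layout shown in this post
# (a "tty:" header, one line of CPU figures, then one line per hdisk).

def parse_iostat(text):
    """Return one dict per interval report: iowait %, summed tps, max % tm_act."""
    reports = []
    current = None
    expect_cpu = False
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "tty:":
            # A new interval report starts at every "tty:" header line.
            current = {"iowait": None, "tps": 0.0, "busy": 0.0}
            reports.append(current)
            expect_cpu = True
        elif expect_cpu:
            # The line after the header ends with the % iowait figure.
            current["iowait"] = float(fields[-1])
            expect_cpu = False
        elif fields[0].startswith("hdisk") and current is not None:
            # Disk columns: name, % tm_act, Kbps, tps, Kb_read, Kb_wrtn.
            current["busy"] = max(current["busy"], float(fields[1]))
            current["tps"] += float(fields[3])
    return reports


def suspicious(reports, iowait_min=50.0, tps_max=5.0):
    """Indexes of reports with high iowait but next to no disk transfers.

    Both thresholds are arbitrary illustration values.
    """
    return [i for i, r in enumerate(reports)
            if r["iowait"] is not None
            and r["iowait"] >= iowait_min
            and r["tps"] <= tps_max]


# Two reports trimmed from the output in this post (first and last).
SAMPLE = """\
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.4 18.4 3.0 3.0 54.2 39.8
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk5 0.0 0.0 0.0 10368 51572
hdisk7 0.0 0.0 0.0 88 184660
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.0 984.0 6.0 10.0 0.0 84.0
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk5 0.0 0.0 0.0 0 0
hdisk7 0.0 116.0 2.0 0 116
"""

print(suspicious(parse_iostat(SAMPLE)))  # prints [1]
```

Run against the full capture above, every report after the first would be flagged, which is the mismatch I am asking about: high iowait with the disks essentially idle.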
Attached is a ganglia image of the server going crazy.
Scott