hdisk0 and hdisk1 is showing 100% busy and system seems hanged

  • 1.  hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Mon October 19, 2009 03:56 AM

    Originally posted by: sajid1


    hdisk0 and hdisk1 show 100% busy most of the time and the system hangs, even though CPU utilization is no more than 30%. Even a simple command (e.g. ls -lrt) takes a long time. It is really awful. Below is the output of the topas command:

    Kernel   6.0  |##                  |
    User    23.5  |#######             |
    Wait     9.4  |###                 |
    Idle    61.1  |##################  |

    Disk    Busy%  KBPS    TPS    KB-Read  KB-Writ
    dac1    0.0    12.9K   116.8  8173.0   5081.5K
    hdisk3  32.8   12.9K   116.8  8173.0   5081.5K
    hdisk1  100.3  3846.9  553.7  37.8     3809.1
    hdisk0  100.3  3662.0  534.8  65.6     3596.4
    hdisk2  0.0    0.0     0.0    0.0      0.0

    Sajid
    #AIX-Forum


  • 2.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Mon October 19, 2009 06:54 AM

    Originally posted by: SystemAdmin


    Post the list of processes from the topas screen. Would be useful to check out what processes are the top users of the CPU.

    Have you tried running commands from the console itself? Is there a delay while running the commands in both remote session and console?

    iostat & filemon should help you find out why hdisks are 100% utilized.

    r/
    R
    #AIX-Forum


  • 3.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 08:59 AM

    Originally posted by: sajid1


    Thanks for the reply. I monitored the system: Oracle processes and an Oracle batch are running when the issue occurs, I think. But why are hdisk0 and hdisk1 at 100% when neither of them holds the mount point where the Oracle home lives? See the output below for your information. This is a very critical issue: users are not able to do any work while it is occurring.
    $ lspv -p hdisk0
    hdisk0:
    PP RANGE  STATE  REGION        LV NAME    TYPE     MOUNT POINT
    1-1       used   outer edge    hd5        boot     N/A
    2-110     free   outer edge
    111-112   used   outer middle  hd6        paging   N/A
    113-120   used   outer middle  lg_dumplv  sysdump  N/A
    121-195   used   outer middle  hd6        paging   N/A
    196-219   free   outer middle
    220-220   used   center        hd8        jfs2log  N/A
    221-221   used   center        hd4        jfs2     /
    222-227   used   center        hd2        jfs2     /usr
    228-235   used   center        hd9var     jfs2     /var
    236-243   used   center        hd3        jfs2     /tmp
    244-247   used   center        hd1        jfs2     /home
    248-248   used   center        hd10opt    jfs2     /opt
    249-249   used   center        hd2        jfs2     /usr
    250-252   used   center        hd4        jfs2     /
    253-257   used   center        hd2        jfs2     /usr
    258-273   used   center        hd3        jfs2     /tmp
    274-277   used   center        hd2        jfs2     /usr
    278-328   free   center
    329-437   free   inner middle
    438-546   free   inner edge

    $ lspv -p hdisk1
    hdisk1:
    PP RANGE  STATE  REGION        LV NAME    TYPE     MOUNT POINT
    1-1       used   outer edge    hd5        boot     N/A
    2-110     free   outer edge
    111-187   used   outer middle  hd6        paging   N/A
    188-188   used   outer middle  loglv01    jfslog   N/A
    189-197   free   outer middle
    198-204   used   outer middle  lv02       jfs      /mkcd/cd_images
    205-219   free   outer middle
    220-220   used   center        hd8        jfs2log  N/A
    221-221   used   center        hd4        jfs2     /
    222-228   used   center        hd2        jfs2     /usr
    229-236   used   center        hd9var     jfs2     /var
    237-244   used   center        hd3        jfs2     /tmp
    245-248   used   center        hd1        jfs2     /home
    249-249   used   center        hd10opt    jfs2     /opt
    250-252   used   center        hd4        jfs2     /
    253-257   used   center        hd2        jfs2     /usr
    258-273   used   center        hd3        jfs2     /tmp
    274-277   used   center        hd2        jfs2     /usr
    278-328   free   center
    329-437   free   inner middle
    438-546   free   inner edge

    Thanks

    Sajid
    #AIX-Forum


  • 4.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 09:07 AM

    Originally posted by: alethad


    Just a thought.
    Do you have your paging configured to these disks? If so, have you checked your paging? Do you have enough configured?

    Oracle doesn't usually page unless you force it to use paging. But then Oracle might not be the problem at all.

    Otherwise as suggested before what does your topas, iostat, and vmstat look like?
    #AIX-Forum


  • 5.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 09:41 AM

    Originally posted by: sajid1


    Thanks for replying.

    Just a thought.
    Do you have your paging configured to these disks? If so, have you checked your paging?
    --- How can I check whether paging is configured on these disks?
    Do you have enough configured?
    --- How can I tell whether it is enough?

    Thanks,
    Sajid
    #AIX-Forum


  • 6.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 10:32 AM

    Originally posted by: SystemAdmin


    From the output you have posted paging is configured on these disks. To check how much of it you have use:

    lsps -a

    But whether it is sufficient depends on a lot of factors, primarily the amount of memory and secondarily the requirements of the application itself.

    One more thing I have noticed is that during Oracle backups there is a lot of paging activity on certain configurations, which could affect response time. So these Oracle scripts/batch processes being run... are they for backups?

    r/
    R
    #AIX-Forum


  • 7.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 11:28 AM

    Originally posted by: sajid1


    Below is the output of the command you suggested.
    $ lsps -a
    Page Space  Physical Volume  Volume Group  Size     %Used  Active  Auto  Type
    hd6         hdisk0           rootvg        19712MB  32     yes     yes   lv
    One more thing I have noticed is that during Oracle backups there is a lot of paging activity on certain configurations, which could affect response time. So these Oracle scripts/batch processes being run... are they for backups?
    --- No, they are not for backups. This is a retail environment, so a lot of batches run on a daily basis, e.g. posting, week close, data migration from one database to another, etc. All batches are scheduled.
    Kindly provide a solution.

    Many Thanks,
    #AIX-Forum


  • 8.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue October 20, 2009 11:50 AM

    Originally posted by: alethad


    Oracle does page a lot during backups, so scheduling those so that they do not run during your heaviest user load will help. If this is the only time you are experiencing the performance issue, then that's your source.

    I have plenty of paging space allocated on my system, but once in a while I will have a performance issue while the Oracle backup is running. So I'm not sure increasing your paging space will help, unless you have it configured way too small. Oracle is going to use what it wants to use, just like RAM. You will need to evaluate that yourself. Every app is different.
    #AIX-Forum


  • 9.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Thu October 22, 2009 12:36 AM

    Originally posted by: Kosala


    32% of 19 GB is a bit on the high side for paging space usage. From your description it seems you're running out of memory. Try "vmstat -v" and look at "lruable pages" and "free pages"; if both are low, then you are running out of memory.

    You can also try:

    ps -eko rssize | awk '{rss+=$1}END{print rss}'

    This will tell you your total resident memory footprint (ps reports rssize in 1 KB units). If it is close to your total memory, that explains the high activity on the paging disks.

    BTW, what is the total memory on your system?
    #AIX-Forum


  • 10.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Wed October 21, 2009 03:53 AM

    Originally posted by: sajid1


    For your information, the backup is scheduled early in the morning at 5 a.m., and the issue occurs between 11:30 a.m. and 2 p.m., mostly while some scripts are running, I think, so it is not related to the backup. I am attaching one culprit script; please let me know whether anything in it needs to be changed.
    You suggested increasing the paging space. Can I increase it without any issue, and revert it if the problem is not resolved?

    Thanks,
    #AIX-Forum


  • 11.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Wed October 21, 2009 04:38 AM

    Originally posted by: SystemAdmin


    Completely agree with alethad.

    Paging space can be increased/decreased on the fly. But decreasing will take a lot more time than increasing.

    The script contains a lot of SQL statements. My knowledge of SQL is limited. Any SQL/DBA gurus out there who can translate what's in this script?

    The ksh statements look OK though... I only managed a quick glance through it and gave up once the SQL barged in.

    r/
    R
    #AIX-Forum


  • 12.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Wed October 21, 2009 11:08 AM

    Originally posted by: alethad


    I agree with rnoel.

    I'm not much on SQL either but you need to get your DBA to look at the SQL section of the script to see if it is efficiently written. I see you are performing some table insertions and other things. Hopefully it isn't doing something like full table scans if you know what I mean. This may not be your problem but poorly written SQL, or any database scripts for that matter, can bring your system to its knees very quickly. No offense intended to you if this is a script you wrote.

    How much RAM do you have on this system?
    If you need to increase your paging space, check out the man page for chps. But if you want to try it on for size, I think in your case you could probably create a new paging space logical volume, like paging00, which would be easier to remove later. As suggested before, though, you really need to research it and make sure you understand it before you actually do it.

    Hope that helps.
    P.S.
    One piece of advice for you, and you may already know this, but from experience there are still plenty of programmers who put SQL statements (or any database commands) into scripts with no regard for your system or its resources. You will need to have these tested really well before putting them into your production environment. It is much harder to get programmers to change a poorly written script once it hits production than it is to hold them over a barrel in the testing environment and have it fixed before you release it.
    Because when something goes wrong with their scripts, the finger-pointing will usually start with you.
    #AIX-Forum


  • 13.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 24, 2009 03:45 AM

    Originally posted by: sajid1


    Hi,
    Below is the lparstat output for your information.
    $ lparstat -i
    Node Name : rmsdb
    Partition Name : RMS DB
    Partition Number : 9
    Type : Shared-SMT
    Mode : Capped
    Entitled Capacity : 5.00
    Partition Group-ID : 32777
    Shared Pool ID : 0
    Online Virtual CPUs : 5
    Maximum Virtual CPUs : 12
    Minimum Virtual CPUs : 3
    Online Memory : 16384 MB
    Maximum Memory : 17408 MB
    Minimum Memory : 12288 MB
    Variable Capacity Weight : 0
    Minimum Capacity : 3.00
    Maximum Capacity : 6.00
    Capacity Increment : 0.01
    Maximum Physical CPUs in system : 16
    Active Physical CPUs in system : 16
    Active CPUs in Pool : 16
    Unallocated Capacity : 0.00
    Physical CPU Percentage : 100.00%
    Unallocated Weight : 0
    $

    Sajid
    #AIX-Forum


  • 14.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 24, 2009 04:21 AM

    Originally posted by: sajid1


    Here is the output of the command you provided:

    $ ps -eko rssize | awk '{rss+=$1}END{print rss}'
    8894852
    $

    Sajid
    #AIX-Forum


  • 15.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 24, 2009 04:59 AM

    Originally posted by: sajid1


    Below is the output of vmstat -v (run twice). Please let me know the solution.

    $ vmstat -v
    4194304 memory pages
    3978334 lruable pages
    20497 free pages
    8 memory pools
    1008310 pinned pages
    80.0 maxpin percentage
    20.0 minperm percentage
    80.0 maxperm percentage
    53.3 numperm percentage
    2123562 file pages
    0.0 compressed percentage
    0 compressed pages
    53.3 numclient percentage
    80.0 maxclient percentage
    2123562 client pages
    0 remote pageouts scheduled
    23038 pending disk I/Os blocked with no pbuf
    108558584 paging space I/Os blocked with no psbuf
    2484 filesystem I/Os blocked with no fsbuf
    60097 client filesystem I/Os blocked with no fsbuf
    128297 external pager filesystem I/Os blocked with no fsbuf
    0 Virtualized Partition Memory Page Faults
    0.00 Time resolving virtualized partition memory page faults
    $ vmstat -v
    4194304 memory pages
    3978334 lruable pages
    19851 free pages
    8 memory pools
    1007890 pinned pages
    80.0 maxpin percentage
    20.0 minperm percentage
    80.0 maxperm percentage
    55.4 numperm percentage
    2205243 file pages
    0.0 compressed percentage
    0 compressed pages
    55.4 numclient percentage
    80.0 maxclient percentage
    2205243 client pages
    0 remote pageouts scheduled
    23038 pending disk I/Os blocked with no pbuf
    108595915 paging space I/Os blocked with no psbuf
    2484 filesystem I/Os blocked with no fsbuf
    60097 client filesystem I/Os blocked with no fsbuf
    128297 external pager filesystem I/Os blocked with no fsbuf
    0 Virtualized Partition Memory Page Faults
    0.00 Time resolving virtualized partition memory page faults
    #AIX-Forum


  • 16.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 24, 2009 07:11 AM

    Originally posted by: Kosala


    The results look fine to me. The resident memory footprint of your user and kernel processes seems to be about 50% of total memory, and the other 50% is filled by file cache. This server seems to be doing a lot of file reads, but there is no memory problem.

    Was this snapshot taken during a problem or on an idle system?
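
For anyone who wants to check the 50% figures, here is a minimal sketch of the arithmetic using the numbers posted earlier in the thread (the rssize total from ps, Online Memory from lparstat -i, and the first vmstat -v run). The pct helper is a made-up name for illustration, and the sketch assumes ps reports rssize in 1 KB units.

```shell
#!/bin/sh
# Figures copied from earlier posts in this thread.
rss_kb=8894852         # sum of rssize from ps (assumed to be in 1 KB units)
online_mb=16384        # "Online Memory" from lparstat -i
file_pages=2123562     # "file pages" from the first vmstat -v run
lruable_pages=3978334  # "lruable pages" from the same run

# pct NUM DEN: print NUM/DEN as a percentage with one decimal place.
# (Hypothetical helper, not an AIX command.)
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

echo "resident set: $(pct "$rss_kb" $((online_mb * 1024)))% of real memory"
echo "file cache:   $(pct "$file_pages" "$lruable_pages")% of lruable pages"
```

With these inputs it prints 53.0% for the resident set and 53.4% for file pages, which is in line with the roughly 50/50 split described above.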
    #AIX-Forum


  • 17.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 24, 2009 11:29 AM

    Originally posted by: sajid1


    Is this snapshot taken during a problem or on a idle system?
    -- It was taken while hdisk0 was at 60% and hdisk1 at 10%. The issue occurs only when hdisk0 and hdisk1 go to 100%.

    If it is not a memory problem, then where is the problem, and how can I resolve it?
    The system hangs every day while the script is running. It is not a one-time event; it happens many times.
    Can you explain this to me: we know the Oracle home and the datafiles are not on hdisk0 and hdisk1 but on hdisk3, so why do hdisk0 and hdisk1 go to 100% while the Oracle script I attached to this thread is running?

    Sajid
    #AIX-Forum


  • 18.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Thu October 29, 2009 03:24 AM

    Originally posted by: Kosala


    Sajid, I think that to get a clear idea of what is going on you will need to do some analysis during an incident. Analyzing a cold system is not very productive. I would recommend that you open a PMR. In the meantime, you might want to watch the paging activity of the server during an incident using "vmstat", and try to get an idea of the RSS and VSZ.
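
One way to watch paging activity during an incident is to filter the vmstat interval output for samples where the page-in or page-out columns are non-zero. This is only a rough sketch: paging_rows is a made-up helper name, and it assumes the usual AIX vmstat column order (r b avm fre re pi po ...), so check the header line on your own system before relying on the field numbers.

```shell
#!/bin/sh
# paging_rows: read vmstat interval output on stdin and print only the
# sample rows where pages moved to or from paging space.
# ASSUMPTION: AIX column order "r b avm fre re pi po ...",
# so fre=$4, pi=$6, po=$7. (Hypothetical helper, not an AIX command.)
paging_rows() {
    awk '$1 ~ /^[0-9]+$/ && ($6 + 0 > 0 || $7 + 0 > 0) {
        printf "pi=%s po=%s fre=%s\n", $6, $7, $4
    }'
}

# Usage during an incident (12 samples, 5 seconds apart):
#   vmstat 5 12 | paging_rows
```

If rows keep appearing while hdisk0 and hdisk1 are at 100%, that points at paging space I/O as the source of the disk load.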
    #AIX-Forum


  • 19.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Sat October 31, 2009 05:52 AM

    Originally posted by: sajid1


    What is a PMR?
    #AIX-Forum


  • 20.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Mon November 02, 2009 05:18 AM

    Originally posted by: sajid1


    Is there anyone who can take up the challenge of sorting out this issue?
    #AIX-Forum


  • 21.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue November 03, 2009 11:10 AM

    Originally posted by: flodstrom


    Which OS version does the machine run?

    Have you contacted IBM support for this? It is after all possible that something is wrong with your system.

    Does a reboot solve the problem (at least temporarily)?

    Other than that I must say that I'm 99% sure that your problem is due to heavy paging/swapping, which means memory problems! The reason is that your system has a scary high number of blocked paging space I/Os (the "no psbuf" counter in your vmstat -v output). It's possible that the system is starved of virtual memory. I would try something drastic like increasing the page space to 32GB and see what happens. If that doesn't help, it's most likely time to invest in more system memory.

    As for why only hdisk0 and hdisk1 show load: hdisk0 is your system disk (and the default location of swap/page space), and hdisk1 is most likely the mirror of hdisk0, which is why you see both of them busy at the same time.

    Is hdisk3 a RAID array of some kind? If it is, you could also consider moving most of your page space to this device, just to speed up paging; it will also put less stress on the system disk.
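
The "blocked with no ..." counters in vmstat -v are cumulative since boot, so the growth between two snapshots says more about what is happening right now than the absolute values do. Below is a minimal sketch that diffs two saved vmstat -v outputs. The blocked_delta name and the temporary file names are made up for illustration, and the sample data is a subset of the counters from the two runs posted earlier in this thread.

```shell
#!/bin/sh
# blocked_delta BEFORE AFTER: for each "blocked with no ..." counter found in
# two saved vmstat -v outputs, print how much it grew between the snapshots.
# (Hypothetical helper, not an AIX command.)
blocked_delta() {
    awk '
        /blocked with no/ {
            count = $1
            $1 = ""              # drop the number, keep the label
            sub(/^ +/, "")
            if (NR == FNR) before[$0] = count          # first file
            else printf "%d %s\n", count - before[$0], $0
        }
    ' "$1" "$2"
}

# Sample data: counters copied from the two vmstat -v runs in this thread.
tmp=${TMPDIR:-/tmp}/vmstat_v.$$
cat > "$tmp.1" <<'EOF'
23038 pending disk I/Os blocked with no pbuf
108558584 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
EOF
cat > "$tmp.2" <<'EOF'
23038 pending disk I/Os blocked with no pbuf
108595915 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
EOF
blocked_delta "$tmp.1" "$tmp.2"
rm -f "$tmp.1" "$tmp.2"
```

For the two posted runs, only the psbuf counter moves (by 37331); the pbuf and fsbuf counters shown here are unchanged. The thread does not say how far apart the two runs were, so this gives a direction rather than a rate.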
    #AIX-Forum


  • 22.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Wed November 04, 2009 05:17 AM

    Originally posted by: sajid1


    Many Thanks Flodstrom,

    Which OS version does the machine run?
    -- What do you mean? The OS version is 5.3.

    Have you contacted IBM support for this? It is after all possible that something is wrong with your system.
    -- No, not yet. The support contract has expired.

    Does a reboot solve the problem (at least temporarily)?
    -- OK, I will do it soon.

    Other than that I must say that I'm 99% sure that your problem is due to heavy paging/swapping which means memory problems! The reason is that I see your system has a scary high amount of paging space I/O's pending. It's possible that the system is starved of virtual memory? I would try some drastic things like increasing the page space to 32GB and see what happens. If that doesn't help it's most likely time to invest in more system memory.

    -- Can you guide me through increasing the paging space on PROD, with all the details? Can I increase it without taking anything down?

    As for the reason why only hdisk0 and hdisk1 shows load is because hdisk0 is your system disk (and the default location of swap/page space), hdisk1 is most likely the mirror of hdisk0 which is why you see both of them busy at the same time.

    Is hdisk3 a RAID array of some kind? In case it's a RAID array you could also consider moving most of your page space to this device just to speed up paging and it will also put less stress on the system disk.

    -- How can I check that?

    Thanks for taking an interest.

    Regards,

    Sajid
    #AIX-Forum


  • 23.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Thu November 05, 2009 12:01 PM

    Originally posted by: flodstrom


    You should really consider getting that support contract back online again; it sounds like you have a rather critical system that is in dire need of maintenance.

    Also, as dukessd mentioned, all that can really be done here is to give you some advice on what to do, what to look for, etc. For a critical production system that is really a shot in the dark.

    Which OS level of AIX does it have (oslevel)? If it's an old AIX 5.3 installation, you might have to consider updating the OS as well.

    You can change page space on the fly (no reboot needed); use "chps" or do it in smitty. Note that you add page space in LP (logical partition) increments, not in GB. Decreasing page space is another matter, though; I would recommend doing that on an idle system only.

    As for the storage, what is connected to the host system? It is your system, after all, so you should know this.

    You might also be able to get some brief info on what it is from "lscfg | grep hdisk".
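
Since chps takes a number of logical partitions rather than a size, here is a rough sketch of the conversion for the 32GB figure suggested earlier. The lps_to_add helper is a made-up name, and the 256 MB partition size is an assumption inferred from the posted output (19712 MB of hd6 spread over the 77 PPs listed by lspv -p); confirm the real PP SIZE with "lsvg rootvg" first. The script only prints the chps command so it can be reviewed before anything is changed.

```shell
#!/bin/sh
# lps_to_add CURRENT_MB TARGET_MB PP_MB: number of logical partitions to add,
# rounded up so the resulting size is at least TARGET_MB.
# (Hypothetical helper, not an AIX command.)
lps_to_add() {
    echo $(( ($2 - $1 + $3 - 1) / $3 ))
}

current_mb=19712   # hd6 size from lsps -a
target_mb=32768    # the 32GB suggested earlier in the thread
pp_mb=256          # ASSUMPTION: 19712 MB / 77 PPs; confirm with lsvg rootvg

n=$(lps_to_add "$current_mb" "$target_mb" "$pp_mb")
# Print the command instead of running it, so it can be reviewed first.
echo "chps -s $n hd6"
```

With these inputs it prints "chps -s 51 hd6". If hd6 is mirrored across hdisk0 and hdisk1, as the lspv output suggests, the extra partitions need free PPs on both disks.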
    #AIX-Forum


  • 24.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Wed October 28, 2009 05:39 AM

    Originally posted by: sajid1


    Kindly update.
    #AIX-Forum


  • 25.  Re: hdisk0 and hdisk1 is showing 100% busy and system seems hanged

    Posted Tue November 03, 2009 06:37 PM

    Originally posted by: dukessd


    Go here:
    http://www.ibm.com/us/en/
    Click on "Get Support"
    Click on "Open a service request"
    Fill in the forms and get some proper help, or call your country IBM AIX software support number.

    Your system is in trouble and posting here is a shot in the dark for this type of problem, call IBM and open a service request, also known as a software support call, or a "PMR".

    They will probably ask you to download and run the PerfPMR script and then send them the output for analysis.

    This is not something we can do here in any reasonable amount of time.

    Hope this helps.
    #AIX-Forum