AIX


New Power6 - AIX 6.1L - Sys using twice the CPU as User?

  • 1.  New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Mon January 25, 2010 11:32 AM

    Originally posted by: troym72


    Hello,

    We just purchased two new 4-way (2 prod LPAR, 2 test LPAR) 5GHz Power6 servers (failover) with 64GB RAM (32GB per node) running AIX 6.1 with two LPARs per node connected to our SAN with two 4GB HBAs.

    When we started parallel testing to move our production application to this server, I noticed that it didn't seem to be performing as fast as I thought it should compared to our existing server.

    Our existing server is an 8-way (5 prod LPAR, 2 test LPAR, 1 hot backup), 1.6GHz Power5 with 32GB RAM (16GB per node) connected to our SAN with two 2GB HBAs.

    I started running the common performance monitoring tools during our testing, such as vmstat and mpstat. For some reason, sy (system) is using about twice as much CPU as us (user). Everything I've ever been told about UNIX administration says that the system should not use more CPU than the user processes, and that if it does, there is a config issue with the OS.

    So, the vendor (Not IBM) that installed the servers for us has not been able to explain or correct this after numerous config tweaks with the filesystem, kernel settings, I/O buffers, etc.

    The VMSTAT does not show any obvious bottlenecks other than the OS seems to be using way too much CPU compared to the User Processes.

    Here is a sample of the VMSTAT output during a test which represented about 20% of our production transaction volume going through the new server.

    />vmstat -w 5

    System configuration: lcpu=8 mem=24576MB

    kthr       memory                 page                faults             cpu
     r  b      avm      fre  re  pi  po  fr  sr  cy   in     sy    cs  us  sy  id  wa
     1  0  3340754  1940960   0   0   0   0   0   0   65  21903  1207   2   3  95   0
     2  0  3340770  1940918   0   0   0   0   0   0  260  43677  1654   3   6  90   1
     2  0  3340885  1940771   0   0   0   0   0   0  125  37038  1601   3   8  89   0
     1  0  3340742  1940897   0   0   0   0   0   0   75  24788  1290   2   5  93   0
     1  0  3340699  1940913   0   0   0   0   0   0   99  38021  1375   2   6  92   0
     1  0  3340685  1940898   0   0   0   0   0   0   97  34672  1424   2   5  93   0
     1  0  3340673  1940881   0   0   0   0   0   0  137  23928  1640   3   8  89   0
     1  0  3340634  1940881   0   0   0   0   0   0  135  39418  1615   3   6  91   0
     1  0  3341393  1940054   0   0   0   0   0   0  166  26856  1749   4   7  88   0
     1  0  3341378  1940035   0   0   0   0   0   0  106  35104  1301   2   5  93   0
     1  0  3341381  1940008   0   0   0   0   0   0   73  36011  1171   2   3  95   0
     1  0  3341407  1939948   0   0   0   0   0   0  101  23827  1330   2   5  93   0
     1  0  3341377  1939933   0   0   0   0   0   0  143  33983  1638   3   7  90   0
     0  0  3341394  1939876   0   0   0   0   0   0  249  38386  1634   3   6  90   0

    As we put more load on the machine, I thought that this might even out, but it didn't. Below is a VMSTAT from a test that represented about 200% of our production volume being processed by the new server.

    System configuration: lcpu=8 mem=24576MB

    kthr       memory                 page                faults             cpu
     r  b      avm      fre  re  pi  po  fr  sr  cy   in     sy    cs  us  sy  id  wa
     1  1  2323028  3038088   0   0   0   0   0   0  731  53814  7149  18  45  37   1
     3  0  2324120  3036759   0   0   0   0   0   0  825  54887  7107  20  43  36   1
     3  0  2324346  3036422   0   0   0   0   0   0  758  45717  5610  16  41  42   2
     2  1  2324357  3036295   0   0   0   0   0   0  932  52869  7709  17  46  36   1
     2  0  2324395  3036165   0   0   0   0   0   0  774  46603  5759  16  42  42   1
     2  0  2323100  3037244   0   0   0   0   0   0  893  52706  7509  17  45  37   2
     4  0  2324297  3035931   0   0   0   0   0   0  737  45806  5381  15  38  46   1
     3  0  2324751  3035377   0   0   0   0   0   0  773  53345  7091  18  46  35   1
     3  0  2324801  3035185   0   0   0   0   0   0  773  52399  7071  17  43  39   1
     2  0  2325211  3034652   0   0   0   0   0   0  615  46806  5469  17  41  42   1
     2  1  2325890  3033848   0   0   0   0   0   0  757  50556  6565  21  43  35   1
     2  0  2324992  3034627   0   0   0   0   0   0  712  51243  7530  13  41  45   1
     3  0  2325939  3033444   0   0   0   0   0   0  655  46586  5832  17  39  42   1
     3  1  2325297  3033969   0   0   8   0   0   0  659  52255  6002  19  42  38   1
     3  0  2325296  3033879   0   0   0   0   0   0  705  51447  6256  18  45  36   1
     4  0  2326345  3032446   0   0   0   0   0   0  566  58858  9930  13  43  44   1
     4  0  2326502  3032220   0   0   0   0   0   0  371  39132  3743  10  37  53   0
     3  1  2329518  3029111   0   0   0   0   0   0  595  55473  6341  22  45  33   1

    Is this normal? Am I just wrong about what normal CPU utilization should be in an AIX LPAR environment?

    Thanks so much!
    Troy
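
The sys-to-user ratio asked about above can be checked directly from the vmstat rows. The following is a minimal Python sketch, not part of the original post: the function names are made up for illustration, and it assumes the 17-column layout shown in the vmstat header, with the CPU figures (us, sy, id, wa) as the last four fields.

```python
from statistics import mean

# Column names from the vmstat header line in the post (17 columns).
COLUMNS = "r b avm fre re pi po fr sr cy in sy cs us sy id wa".split()


def parse_row(line):
    """Return (us, sy, id, wa) from one vmstat data row."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    # "sy" appears twice in the header (syscalls and system CPU),
    # so the CPU block is taken by position from the end of the row.
    us, sy, idle, wa = (int(x) for x in fields[-4:])
    return us, sy, idle, wa


def sys_to_user_ratio(rows):
    """Average sys% divided by average user% over the given rows."""
    parsed = [parse_row(r) for r in rows]
    avg_us = mean(p[0] for p in parsed)
    avg_sy = mean(p[1] for p in parsed)
    return avg_sy / avg_us


# First three samples from the 200% load test, copied from the post.
heavy_load = [
    "1 1 2323028 3038088 0 0 0 0 0 0 731 53814 7149 18 45 37 1",
    "3 0 2324120 3036759 0 0 0 0 0 0 825 54887 7107 20 43 36 1",
    "3 0 2324346 3036422 0 0 0 0 0 0 758 45717 5610 16 41 42 2",
]
print(f"sys/user ratio: {sys_to_user_ratio(heavy_load):.2f}")
```

Feeding in the full set of rows from either test gives the ratio for that run, which makes it easier to compare the 20% and 200% load tests on the same footing.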


  • 2.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Tue January 26, 2010 02:23 AM

    Originally posted by: Montecarlo


    If your application was compiled and optimized for your previous hardware, you may be suffering from emulated instructions on the new hardware. The emstat command will show the rate of emulated instructions.
    Regards, Simon


  • 3.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Tue January 26, 2010 09:51 AM

    Originally posted by: troym72


    />uptime
    8:47am up 12 days, 13:22, 2 users, load average: 0.85, 0.80, 0.82
    tmdvim1b@hci (/home/hci)
    />emstat -v

    Emulation Emulation Emulation Emulation Emulation Emulation Emulation Emulation Emulation Emulation
    SinceBoot Delta Delta00 Delta01 Delta02 Delta03 Delta04 Delta05 Delta06 Delta07
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0


  • 4.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Wed January 27, 2010 02:54 AM

    Originally posted by: Montecarlo


    Ok, that's good news.
    Try nmon to see which processes are responsible for %sys use.
    nmon or topas_nmon
    t - for top processes
    2 - for cpu information
    + - a couple of times to increase the display delay 2,4,8 seconds
    Display should look something like:

    topas_nmon N=NFS Host=xxxxxxxx Refresh=8 secs 09:48.03
    Top-Processes-(104) Mode=2 1=Basic 2=CPU 3=Perf 4=Size 5=I/O 6=Cmds
    PID Time <------CPU-Total-------> Child <----Delta-----> Command
    Start Totals User+System Total TotalUsr + Sys
    495788 01:50:06 8755:00 8755:00+ 0:00 0:00 99.7 99.7+ 0.0 restbyname
    466994 01:50:06 8755:04 8755:04+ 0:00 0:00 99.3 99.3+ 0.0 restbyname
    127106 12:36:04 53:60 0:05+53:55 0:00 0.4 0.0+ 0.4 syncd
    323792 12:36:19 7:69 1:54+ 6:15 0:00 0.0 0.0+ 0.0 getty

    syncd has used 53 mins and 55 seconds of system cpu vs 5 seconds user time. This is as expected because it executes most instructions in system (kernel) space.
    If the processes consuming system time are active, they should show up in this display.

    Regards, Simon
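
The User+System split described for syncd can be turned into a percentage per process. This is a rough Python sketch rather than anything nmon provides: the helper names are hypothetical, and it assumes the field reads as minutes:seconds of user time, a plus sign, then minutes:seconds of system time, as in the explanation above.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '53:55' to seconds."""
    minutes, seconds = mmss.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def system_share(user_plus_system):
    """Fraction of CPU time spent in system mode for a 'User+System' field."""
    user, system = user_plus_system.split("+")
    user_s, system_s = to_seconds(user), to_seconds(system)
    total = user_s + system_s
    # A process with no accumulated CPU time has no meaningful share.
    return system_s / total if total else 0.0


# Field values copied from the sample display above.
for command, field in [("syncd", "0:05+53:55"), ("getty", "1:54+ 6:15")]:
    print(f"{command}: {system_share(field):.1%} system")
```

A process whose share is close to 100% and whose delta CPU is also high is the kind of candidate to look at first for the %sys question.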


  • 5.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Wed January 27, 2010 10:34 AM

    Originally posted by: MarkTaylor


    What is the application you are running, and how did you deploy it to the new p6 servers? There are some gains to be made by recompiling applications to make use of the Power6 hardware, but sometimes this is not possible if the vendor of your app has not done this yet.

    That said, I would not expect the lack of a recompile to cause excessive system time. So, as detailed above, collect some metrics using nmon to start; then you can use trace and curt to generate CPU reports if you want to delve deeper. You might also want to look at sar data, as this gives fork/exec statistics. Lots of fork/execs can be caused by runaway scripts in very tight loops; pprof will help you with this (pprof.famind report).

    If you cannot nail it down, raise a performance PMR with IBM support and collect perfpmr data.

    HTH
    Mark Taylor


  • 6.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Wed January 27, 2010 11:40 AM

    Originally posted by: troym72


    The application is an interfacing application (Healthvision Cloverleaf) that receives Healthcare HL7 transactions via TCP/IP from various applications and routes them to the appropriate destination application(s). From the time the transaction is received until it is sent out of the interface engine, it could be translated (via Tcl programs) several times. Translation consists of transaction re-formatting, field reformatting, table maps, transaction filtering logic and other types of data massaging.

    While being routed and translated, the transactions are stored temporarily in a Raima database (Healthvision's 3rd party Db agreement) for disaster recovery purposes. If something dies or is stopped, the undelivered messages are read from the database and the engine continues where it left off. There are 15 points at which the transactions are saved to the database during their journey from the source to the destination.

    So, the application is pretty I/O intensive. Each transaction is between 1k and 2k and it's written to the Raima Db at least 15 times. Our production environment processes about 1.2 million of these transactions per day on average. We are expecting our volume of transactions to roughly double in the next four years (hence the new server).

    We changed our min and max to the values suggested by the vendor, Healthvision.
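
The write load implied by the figures in this post can be worked out with a few lines of arithmetic. The Python sketch below is only a back-of-the-envelope estimate: it takes 15 writes per transaction as a lower bound, assumes 1k means 1024 bytes, and reports daily averages, so peak rates will be higher.

```python
TRANSACTIONS_PER_DAY = 1_200_000   # average daily volume from the post
WRITES_PER_TRANSACTION = 15        # save points per transaction ("at least")
SECONDS_PER_DAY = 24 * 60 * 60
GROWTH_FACTOR = 2                  # "roughly double" over four years


def daily_writes(transactions, writes_each=WRITES_PER_TRANSACTION):
    """Database writes per day for a given transaction volume."""
    return transactions * writes_each


def average_writes_per_second(transactions):
    """Daily writes spread evenly over 24 hours."""
    return daily_writes(transactions) / SECONDS_PER_DAY


def daily_bytes(transactions, message_bytes):
    """Bytes written per day if every write stores one full message."""
    return daily_writes(transactions) * message_bytes


today = TRANSACTIONS_PER_DAY
future = TRANSACTIONS_PER_DAY * GROWTH_FACTOR
for label, volume in [("today", today), ("after doubling", future)]:
    low = daily_bytes(volume, 1024) / 1e9    # assumes 1k = 1024 bytes
    high = daily_bytes(volume, 2048) / 1e9
    print(f"{label}: {daily_writes(volume):,} writes/day, "
          f"~{average_writes_per_second(volume):.0f}/s average, "
          f"{low:.1f}-{high:.1f} GB/day")
```

At today's volume this comes to 18 million small writes per day, or a little over 200 per second on average.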


  • 7.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Thu January 28, 2010 09:24 AM

    Originally posted by: MarkTaylor


    Ok, thanks for the overview, I am working with a company next week that is going to hammer an Intersystems Cache Database with their Application with HL7 messages .. Interestingly, the level of Cache database they were using has a bug that causes high CPU load which is fixed in the current release which we are testing next week.. just a side point that ;)

    So, really, the same applies: find out what is chewing your CPU as detailed above. Use nmon to get a high-level view, then trace, curt, tprof, pprof to dig deeper if it's not really obvious.

    Rgds
    Mark Taylor


  • 8.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Wed February 03, 2010 05:49 PM

    Originally posted by: troym72


    Here is the TOPAS/NMON output while I was monitoring one of the hciengine interface processes that was working at full speed.

    We generally have about 300-400 of these processes running at any given time in Production, however, they are rarely processing as much data as this one is in my test.

    As you can see, it does a lot of I/O. But why is the paging so high?

    Is there something else I can look at that might tell me what this process is doing to make my OS work so hard?

    Thanks!
    ┌─topas_nmon──1=Top-Basics───────Host=tmdvim1b───────Refresh=2 secs───16:43.09─────────────────────────────────────────┐
    │ Top-Processes-(664) ─────Mode=6 1=Basic 2=CPU 3=Perf 4=Size 5=I/O 6=Cmds──────────────────────────────────────────│
    │Procs %usr %sys Res Res Char Paging Command │
    │Total=100 Text-KB Data-KB I/O-KB │
    │ 8 -7.4939 291.7033 0 3072 0 0 wait │
    │ 237 5.4968 37.6743 705312 5627704 120 6544 hciengine │


  • 9.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Wed February 03, 2010 05:58 PM

    Originally posted by: troym72


    Here is more TOPAS-NMON information. Seems that Page Faults are pretty high.

    Any thoughts?

    ┌─topas_nmon──.=OnlyBusyMode─────Host=tmdvim1b───────Refresh=2 secs───16:54.09─────────────────────────────────────────┐
    │ Memory ──────────────────────────────────────────────────────────────────────────────────────────────────────────────│
    │ Physical PageSpace | pages/sec In Out | FileSystemCache │
    │% Used 88.5% 0.8% | to Paging Space 0.0 0.0 | (numperm) 37.5% │
    │% Free 11.5% 99.2% | to File System 0.0 24.5 | Process 32.2% │
    │MB Used 21746.7MB 95.5MB | Page Scans 0.0 | System 18.8% │
    │MB Free 2829.3MB 12192.5MB | Page Cycles 0.0 | Free 11.5% │
    │Total(MB) 24576.0MB 12288.0MB | Page Steals 0.0 |
    │ | Page Faults 10377.0 | Total 100.0% │
    │------------------------------------------------------------ | numclient 37.5% │
    │Min/Maxperm 713MB( 3%) 21376MB( 87%) <--% of RAM | maxclient 87.0% │
    │Min/Maxfree 960 1088 Total Virtual 36.0GB | User 66.5% │
    │Min/Maxpgahead 2 8 Accessed Virtual 12.5GB 34.6%| Pinned 19.2% │
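
To put the page fault figure in the panel above on a rough scale, it can be converted to a data rate. The Python sketch below makes one loud assumption that is not in the original post: that each fault touches a single 4 KB page. Larger page sizes would change the number. It also checks the paging-space counters from the same panel, which are both zero, so these faults are not being served from paging space.

```python
PAGE_FAULTS_PER_SEC = 10377.0   # "Page Faults" from the Memory panel
PAGING_SPACE_IN = 0.0           # "to Paging Space" In, pages/sec
PAGING_SPACE_OUT = 0.0          # "to Paging Space" Out, pages/sec
ASSUMED_PAGE_BYTES = 4096       # assumption: 4 KB pages


def fault_volume_mib(faults_per_sec, page_bytes=ASSUMED_PAGE_BYTES):
    """Memory touched per second, in MiB, if every fault maps one page."""
    return faults_per_sec * page_bytes / (1024 * 1024)


def touches_paging_space(page_in, page_out):
    """True if any pages are moving to or from paging space."""
    return page_in > 0 or page_out > 0


print(f"~{fault_volume_mib(PAGE_FAULTS_PER_SEC):.1f} MiB/s faulted in")
print("paging space in use by faults:",
      touches_paging_space(PAGING_SPACE_IN, PAGING_SPACE_OUT))
```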


  • 10.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Tue April 26, 2011 11:40 PM

    Originally posted by: newbie0303


    Pardon my ignorance, but are 6.1 and 6L different versions?


  • 11.  Re: New Power6 - AIX 6.1L - Sys using twice the CPU as User?

    Posted Thu October 06, 2011 04:18 AM

    Originally posted by: beeplo


    We have the same problem with a Power7 p770 (12 cores, 3.5GHz, 256GB RAM). Our old system is Power5 and the performance gain is insignificant. We tried AIX 6.1 and 7.1, but the results are identical. We also tried all possible tuning and wrote to IBM; they sent us some patches, but without success.
    Did you solve your problem?