AIX

  • 1.  AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Tue March 01, 2011 01:02 PM

    Originally posted by: MatthewBourne


    Folks

    Been struggling to help my customer with this issue for some months now - unfortunately nothing tremendously useful coming out of the PMR route just yet. Wondering if anyone has seen similar symptoms?

    Please help!

    Thanks

    Scenario
    Uptime is less than 60 days, memory all but exhausted, including paging space. Left alone the LPAR will probably crash, logging "out of resource" type messages in error log.

    
    > oslevel -s
    6100-03-03-0943
    


    
    > lsattr -El mem0
    ent_mem_cap          I/O memory entitlement in Kbytes           False
    goodsize       4096  Amount of usable physical memory in Mbytes False
    size           4096  Total amount of physical memory in Mbytes  False
    var_mem_weight       Variable memory capacity weight            False
    


    
    > lsps -a
    Page Space      Physical Volume   Volume Group    Size %Used Active Auto  Type Chksum
    hd6             hdisk0            rootvg        4160MB    56   yes   yes    lv     0
    


    NMON tells me that the kernel is the biggest consumer of memory pages:

    
    FileSystemCache(numperm) 11.1%
    Process                  16.3%
    System                   71.9%
    Free                      0.8%
    


    SVMON says pgsp is over 50% utilised...

    
    > svmon -G -O unit=MB
    Unit: MB
    -------------------------------------------------------------------------------
                   size       inuse        free         pin     virtual  available
    memory      4096.00     4060.28        35.7      985.27     5497.27     367.96
    pg space    4160.00     2314.65

                   work        pers        clnt       other
    pin          846.77           0           0      138.50
    in use      3441.47           0      618.81
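As a cross-check on that figure, here is a minimal sketch that works out paging-space utilisation from the `pg space` line of `svmon -G`. The helper name `pgsp_pct` is made up for illustration, and it assumes the column order shown above (size, then inuse).

```shell
# pgsp_pct: read `svmon -G` output on stdin and print paging-space
# utilisation as a percentage (inuse / size on the "pg space" line).
# pgsp_pct is a hypothetical helper name, not an AIX command.
pgsp_pct() {
    awk '$1 == "pg" && $2 == "space" && $3 > 0 {
        printf "%.1f\n", 100 * $4 / $3
    }'
}

# Usage on an AIX host (not run here):
#   svmon -G -O unit=MB | pgsp_pct
```

With the numbers above (2314.65 of 4160.00 MB) this prints 55.6, which lines up with the 56 %Used that lsps reports.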
    


    ... but I've got nearly 2GB occupancy in pgsp (7x256MB) due to kernel segments that show a minimal if not zero count of pages in use (~110MB). Over time, we'll observe the number of segments that look like this increase - until eventually the LPAR becomes unresponsive and ultimately crashes.

    
    > svmon -S -t 10 -O unit=MB,filtercat=kernel,sortseg=pgsp
    Unit: MB

        Vsid      Esid Type Description              PSize  Inuse   Pin   Pgsp Virtual
       46008         - work kernel heap                  m      0     0 256.00  256.00
       5600a         - work kernel heap                  m      0     0 256.00  256.00
       3e007         - work kernel heap                  m      0     0 256.00  256.00
       4e009         - work kernel heap                  m      0     0 256.00  256.00
       36006         - work kernel heap                  m      0     0 256.00  256.00
       2e005         - work kernel heap                  m   15.0     0 241.00  256.00
       5e00b         - work kernel heap                  m 100.88  0.06 155.12  256.00
        6a00         - work kernel heap                  m 106.94  8.88 106.25  115.25
        4000         - work page table area              s   9.45  0.17   23.7    23.9
       28005  9ffffffd work shared library              sm   0.28     0   7.66    7.66
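To put a number on that, here is a rough sketch that totals the Pgsp column for the kernel heap segments in a listing like the one above. `idle_heap_pgsp` is a made-up helper name, and it assumes unit=MB output in which Inuse, Pin, Pgsp and Virtual are the last four fields of each row.

```shell
# idle_heap_pgsp: read `svmon -S` output on stdin and report how much
# paging space the "kernel heap" segments hold, and how much of that
# belongs to segments with nothing in use.
# Hypothetical helper; assumes the last four columns are
# Inuse, Pin, Pgsp, Virtual (the Description column contains spaces,
# so fields are counted from the right).
idle_heap_pgsp() {
    awk '/kernel heap/ {
        inuse = $(NF-3) + 0      # fourth field from the right
        pgsp  = $(NF-1) + 0      # second field from the right
        total += pgsp
        if (inuse == 0) { idle += pgsp; n++ }
    }
    END {
        printf "kernel heap pgsp: %.2f MB total, %.2f MB in %d segments with Inuse 0\n",
               total, idle, n
    }'
}

# Usage on an AIX host (not run here):
#   svmon -S -t 10 -O unit=MB,filtercat=kernel,sortseg=pgsp | idle_heap_pgsp
```

Fed the listing above, it reports 1782.37 MB total, of which 1280.00 MB sits in 5 segments with nothing in use.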
    


    VMO settings are as per recommendation, I believe:

    
    > vmo -F -L
    NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT              TYPE DEPENDENCIES
    ams_loan_policy           n/a    1      1      0      2      numeric           D
    force_relalias_lite       0      0      0      0      1      boolean           D
    kernel_heap_psize         64K    0      0      0      16M    bytes             B
    lgpg_regions              0      0      0      0      8E-1                     D    lgpg_size
    lgpg_size                 0      0      0      0      16M    bytes             D    lgpg_regions
    low_ps_handling           1      1      1      1      2                        D
    maxfree                   1088   1088   1088   16     838860 4KB pages         D    minfree memory_frames
    maxperm                   899409        899409                                 S
    maxpin                    845956        845956                                 S
    maxpin%                   80     80     80     1      100    % memory          D    pinnable_frames memory_frames
    memory_frames             1M            1M                   4KB pages         S
    memplace_data             2      2      2      1      2                        D    memory_affinity
    memplace_mapped_file      2      2      2      1      2                        D    memory_affinity
    memplace_shm_anonymous    2      2      2      1      2                        D    memory_affinity
    memplace_shm_named        2      2      2      1      2                        D    memory_affinity
    memplace_stack            2      2      2      1      2                        D    memory_affinity
    memplace_text             2      2      2      1      2                        D    memory_affinity
    memplace_unmapped_file    2      2      2      1      2                        D    memory_affinity
    minfree                   960    960    960    8      838860 4KB pages         D    maxfree memory_frames
    minperm                   29980         29980                                  S
    minperm%                  3      3      3      1      100    % memory          D    maxperm% maxclient%
    nokilluid                 0      0      0      0      4G-1   uid               D
    npskill                   8320   8320   8320   1      1M-1   4KB pages         D
    npswarn                   33280  33280  33280  1      1M-1   4KB pages         D
    numpsblks                 1040K         1040K                4KB blocks        S
    pinnable_frames           796362        796362               4KB pages         S
    relalias_percentage       0      0      0      0      32K-1                    D
    scrub                     0      0      0      0      1      boolean           D
    v_pinshm                  0      0      0      0      1      boolean           D
    vmm_default_pspa          0      0      0      -1     100    numeric           D
    wlm_memlimit_nonpg        1      1      1      0      1      boolean           D
    ##Restricted tunables
    --------------------------------------------------------------------------------
    cpu_scale_memp            8      8      8      4      64                       B
    data_stagger_interval     161    161    161    0      4K-1   4KB pages         D    lgpg_regions
    defps                     1      1      1      0      1      boolean           D
    framesets                 2      2      2      1      10                       B
    htabscale                 n/a    -1     -1     -4     0                        B
    kernel_psize              64K    0      0      0      16M    bytes             B
    large_page_heap_size      0      0      0      0      8E-1   bytes             B    lgpg_regions
    lru_file_repage           0      0      0      0      1      boolean           D
    lru_poll_interval         10     10     10     0      60000  milliseconds      D
    lrubucket                 128K   128K   128K   64K    1M     4KB pages         D
    maxclient%                90     90     90     1      100    % memory          D    maxperm% minperm%
    maxperm%                  90     90     90     1      100    % memory          D    minperm% maxclient%
    mbuf_heap_psize           64K    0      0      0      16M    bytes             B
    memory_affinity           1      1      1      0      1      boolean           B
    npsrpgmax                 65K    65K    65K    1      1M-1   4KB pages         D    npsrpgmin
    npsrpgmin                 49920  49920  49920  1      1M-1   4KB pages         D    npsrpgmax
    npsscrubmax               65K    65K    65K    1      1M-1   4KB pages         D    npsscrubmin
    npsscrubmin               49920  49920  49920  1      1M-1   4KB pages         D    npsscrubmax
    num_spec_dataseg          0      0      0      0      8E-1                     B
    page_steal_method         1      1      1      0      1      boolean           B
    psm_timeout_interval      20000  20000  20000  0      60000  milliseconds      D
    rpgclean                  0      0      0      0      1      boolean           D
    rpgcontrol                2      2      2      0      3                        D
    scrubclean                0      0      0      0      1      boolean           D
    soft_min_lgpgs_vmpool     0      0      0      0      90     %                 D    lgpg_regions
    spec_dataseg_int          512    512    512    0      8E-1                     B
    strict_maxclient          1      1      1      0      1      boolean           D    strict_maxperm
    strict_maxperm            0      0      0      0      1      boolean           D    strict_maxclient
    vm_modlist_threshold      -1     -1     -1     -2     2G-1                     D
    vmm_fork_policy           1      1      1      0      1      boolean           D
    vmm_mpsize_support        2      2      2      0      2      numeric           B
    vmm_vmap_policy           0      0      0      0      1      boolean           D
    

    and I have the most up to date fixes I can find for this TL/SP combination:

    
    > emgr -l

    ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
    === ===== ========== ================= ========== ======================================
    1    S    IZ60190    12/02/10 14:40:41            Ifix for an apar IZ60190
    2    S    z8703761F3 12/02/10 14:44:35            Ifix for apar IZ87037 at 61F SP03.
    

    #AIX-Forum


  • 2.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Wed March 02, 2011 01:11 PM

    Originally posted by: MatthewBourne


    Any thoughts anybody?


  • 3.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Tue March 08, 2011 12:30 PM

    Originally posted by: MatthewBourne


    Howdy folks - just wondering if anyone has seen anything similar?


  • 4.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Tue March 08, 2011 02:55 PM

    Originally posted by: dmj12031


    Are you running the audit subsystem?

    IZ87818: MEMORY INCREASING FOR EVENT MANAGEMENT IN AUDITING


  • 5.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Wed March 09, 2011 04:53 AM

    Originally posted by: MatthewBourne


    Funnily enough, we've had a hint that auditing might be something to look at - I'm expecting to disable it on a few LPARs today and see what happens - I'll keep the thread up to date. Thanks for your reply.


  • 6.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Thu March 17, 2011 07:32 AM

    Originally posted by: MatthewBourne


    Update:

    commented out "audit" from inittab, and rebooted. I found the following snippet handy for tracking VM/PGSP usage so I thought I'd share it:

    
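    svmon -G -O unit=MB,timestamp=on -i 24 600 | awk '
    BEGIN { "hostname | cut -c1-8" | getline y }   # y = first 8 characters of the hostname
    /^U/  { x = $4; next }                         # "Unit: ..." header line: remember field 4 (the timestamp)
    /size/ || /^m/ || /^pg/ {                      # column header, "memory" and "pg space" lines
        sub(/g s/, "g_s")                          # "pg space" -> "pg_space" so it stays one field
        print y " " x " " $0
    }'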
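If the output of that snippet is redirected to a file, a small follow-up like the one below can show whether paging-space use is creeping up between the first and last sample. This is only a sketch: `pgsp_growth` is a made-up name, and it assumes each logged line looks like `host timestamp pg_space size inuse`, with the timestamp as a single field.

```shell
# pgsp_growth: read lines logged by the svmon/awk snippet on stdin and
# compare the first and last "pg_space" samples.
# Hypothetical helper; assumed line layout:
#   <host> <timestamp> pg_space <size> <inuse>
pgsp_growth() {
    awk '$3 == "pg_space" {
        if (!seen) { first = $5; seen = 1 }   # first sample in the log
        last = $5                             # overwritten until the final sample
    }
    END {
        if (seen)
            printf "pg_space inuse: first=%.2f last=%.2f delta=%.2f\n",
                   first, last, last - first
    }'
}

# Usage (svmon.log is a hypothetical file name):
#   pgsp_growth < svmon.log
```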
    


    If it turns up anything interesting I'll let you know ...


  • 7.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Fri March 25, 2011 11:55 AM

    Originally posted by: MatthewBourne


    Update:
    audit disabled on a number of LPARs but kernel memory footprint still seems to be growing. Nothing noticeable in paging space yet.

    We've also been asked about the "filepath" kernel extension and why we're running it. Looks like it's part of a TSM client installation to support Journal-based backups. Anyone else running TSM client 6.1.0.0 with similar issues?

    M.


  • 8.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Mon March 28, 2011 11:16 AM

    Originally posted by: flodstrom


    Our systems are slightly older than yours (AIX 6.1 TL3 SP1) and we have a slightly newer TSM client (v6.2.2). I know that the TSM client can take up a surprising amount of memory, around 1.2G on each host. We don't really notice it since they have 32G RAM and more, but I can imagine it being a burden for a smaller host with only 4G RAM.

    I don't think we are using that filepath kernel extension. However, are there any quick ways of checking that?
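One possible quick check, offered as a sketch rather than a confirmed answer: AIX's `genkex` command lists the loaded kernel extensions, so its output can be searched for the name. The helper `kex_loaded` below is made up for illustration, and it assumes the extension's file name contains the string `filepath`.

```shell
# kex_loaded NAME: read a kernel-extension listing on stdin and succeed
# (exit status 0) if any line mentions NAME, ignoring case.
# Hypothetical helper; it only filters text and does not call genkex itself.
kex_loaded() {
    grep -i -- "$1" >/dev/null
}

# Usage on an AIX host (not run here):
#   if genkex | kex_loaded filepath; then
#       echo "a kernel extension matching 'filepath' is loaded"
#   fi
```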

    Overall we don't have any problems with growing kernel sizes on our hosts!


  • 9.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Thu March 31, 2011 03:22 AM

    Originally posted by: niella


    Hi there,

    I would strongly recommend that you (always) apply the latest SP when faced with such a problem, in this case AIX 6100-03-09. There are many references to leaks, including a pinned memory leak (IZ94045), and a HIPER relating to large pages.

    There is not much risk in applying the latest SP; these contain fixes and not new functionality.
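To see how far a host is from a given SP, the `oslevel -s` string can be split on its dashes (release, technology level, service pack, build). A minimal sketch follows; the function name `sp_behind` is invented for illustration.

```shell
# sp_behind CURRENT TARGET: both arguments are oslevel-style strings such
# as 6100-03-03-0943 or 6100-03-09 (release-TL-SP[-build]).
# Hypothetical helper; it only compares levels within the same TL.
sp_behind() {
    cur_tl=$(echo "$1" | cut -d- -f1,2)
    tgt_tl=$(echo "$2" | cut -d- -f1,2)
    if [ "$cur_tl" != "$tgt_tl" ]; then
        echo "different technology level: $cur_tl vs $tgt_tl"
        return 1
    fi
    cur_sp=$(echo "$1" | cut -d- -f3)
    tgt_sp=$(echo "$2" | cut -d- -f3)
    # expr reads 03 and 09 as decimal; $((09 - 03)) would be rejected
    # as invalid octal by some shells.
    echo "$(expr "$tgt_sp" - "$cur_sp") service pack(s) behind"
}

# Usage on an AIX host (not run here):
#   sp_behind "$(oslevel -s)" 6100-03-09
```

For the level quoted at the top of the thread, `sp_behind 6100-03-03-0943 6100-03-09` prints `6 service pack(s) behind`.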

    Hope this helps.

    Niel


  • 10.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Wed April 27, 2011 06:18 AM

    Originally posted by: MatthewBourne


    Thanks Niel

    Unfortunately, installation of service packs & maintenance levels is controlled very tightly here. If it were a small number of servers, I'd be suggesting the same as you - but I don't see how the customer could maintain a consistent estate and follow such a reactive approach.

    For interest, there are 2 updates:

    (1) the hosts on which the audit subsystem has been disabled are stable. No undue memory pressure or exceptionally high kernel memory usage

    (2) hosts on which the full kernel debug option has been enabled are also stable - despite still running the audit subsystem

    M.


  • 11.  Re: AIX61 TL03 SP03 Kernel Memory Leak?

    Posted Tue August 02, 2011 10:51 AM

    Originally posted by: MatthewBourne


    Hi

    For anyone interested, the cause of the issue has been located and resolved:

    https://www-304.ibm.com/support/docview.wss?uid=isg1IZ91406

    Thanks to you all for your feedback.

    M.