Automated Testing

Build an automated testing process to enable continuous integration of your hybrid cloud applications including z/OS

AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

  • 1.  AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Fri January 09, 2015 12:52 PM

    Hi,

    Currently, we have 3 RDT instances running on one Red Hat Linux server with 600 volumes per RDT instance. When the number of volumes was increased to 1000 per instance, we encountered these error messages.

    Reason - We were able to set up 1 set of subsystems and would like to set up 2 sets.

    Load parm: RW,  Devmap: /nas03/RDTGolden52/rdtgolden52prof1,  Port: 3272
    stopping previous instance
    AWSSTP002E 1090 is not running
    x3270: no process killed
    awsstart /nas03/RDTGolden52/rdtgolden52prof1 --clean
     
     
    IBM System z Personal Development Tool (zPDT)
      Licensed Materials - Property of IBM
      5799-ADE
      (C) Copyright IBM Corp. 2007,2013   All Rights Reserved.
     
    z1091, version 1-5.47.16, build date - 08/30/14 for Linux on RedHat 64bit
     
     
    AWSSTA014I Map file name specified: /nas03/RDTGolden52/rdtgolden52prof1
    AWSSTA090I All zPDT log files purged as requested
    AWSSTA204I zPDT started in directory '/nas03/RDTGolden52'.
    AWSSTA146I Starting independent 1090 instance 'ibmsys3'
    AWSEMI314I CPU 1 zPDTA License Obtained
    AWSEMI314I CPU 2 zPDTA License Obtained
    AWSEMI005I Waiting for 1090 license
    AWSEMI314I CPU 0 zPDTA License Obtained
    OSA code level = 0x4301
    AWSDSA010I AWSOSA is ready for chpid: 0xA2 device: 0x400
    AWSDSA010I AWSOSA is ready for chpid: 0xA2 device: 0x401
    AWSDSA010I AWSOSA is ready for chpid: 0xA2 device: 0x402
    AWSDSA010I AWSOSA is ready for chpid: 0xF4 device: 0x404
    AWSDSA010I AWSOSA is ready for chpid: 0xF4 device: 0x405
    AWSDSA010I AWSOSA is ready for chpid: 0xF4 device: 0x406
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    Unable to define TNPORTL RAS/FEDC memory, RC=-3
    AWSSTA059I System initialization complete
    AWSSTA012I All configured subsystems started
    AWSSTA082E Device manager 56167 (AWSCKD), device 4A31 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56168 (AWSCKD), device 4A32 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56169 (AWSCKD), device 4A33 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56170 (AWSCKD), device 4A34 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56174 (AWSCKD), device 4A38 has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56175 (AWSCKD), device 4A39 has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56176 (AWSCKD), device 4A3A has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    No completed startup message was found.
    ***************************************
    *** runzpdt terminated with errors. ***
    ***************************************
    AWSSTA082E Device manager 56294 (AWSCKD), device 4A38 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56296 (AWSCKD), device 4A39 has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56299 (AWSCKD), device 4A3A has terminated
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56301 (AWSCKD), device 4A3B has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA082E Device manager 56303 (AWSCKD), device 4A3C has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA082E Device manager 56305 (AWSCKD), device 4A3D has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA084I ... performing restart of process AWSCKD
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA082E Device manager 56352 (AWSCKD), device 4A32 has terminated
    AWSECH004S Unable to define RAS/FEDC memory, RC=-3.
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSSTA082E Device manager 56354 (AWSCKD), device 4A33 has terminated
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSSTA082E Device manager 56356 (AWSCKD), device 4A34 has terminated
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSSTA082E Device manager 56358 (AWSCKD), device 4A35 has terminated
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSSTA082E Device manager 56360 (AWSCKD), device 4A36 has terminated
    AWSSTA085E ... restart of process AWSCKD failed, RC=-5
    AWSSTA082E Device manager 56363 (AWSCKD), device 4A37 has terminated
     
    Please find attached the output of uname -a, ulimit -a (from the user running the instance), and sysctl -a (run as root).


    Please check and suggest.

    Regards,

    Sathya
     

    Acc_Rdz


  • 2.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Fri January 09, 2015 03:16 PM

    Hello,

    I have looked at the uname and sysctl data. Right now they look reasonable.

    Can you provide the following:

    • For each of your instances:
      • the system stanza from each devmap (the amount of z memory defined in the devmap could interact with your kernel settings)
      • the number of I/O devices in each devmap
    • the amount of memory on your system and the amount of your machine's swap space (the Linux "free" command can provide this information)

    Regards,

    Keith

     

    keithvb


  • 3.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Sat January 10, 2015 12:37 PM

    Hi Keith,

    Please find the details below

    1. System stanza: memory and processors are the same for all the instances. The details for one instance are below.

    [system]
    memory 12G                # define storage size for virtual host
    processors 3                # number of processors
    3270port 3271               # port number for non-SNA (coax) 3270

    2. I/O device details:

     Instance1           700 I/O devices

     Instance2           600 I/O devices

     Instance3           600 I/O devices

    We need to add more volumes to the instances. At this point we may go up to 900 I/O devices on each instance.

    3. Free command output:

                   total        used        free      shared     buffers      cached
    Mem:        65936984    64256964     1680020    35552148      176448    54431992
    -/+ buffers/cache:       9648524    56288460
    Swap:       33038332     7971404    25066928

     

    Please let us know if you need any other information.

     

    Regards,

    Sathya

    Acc_Rdz


  • 4.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Mon January 19, 2015 10:10 AM

    Hello,

    It is my understanding that this is being tracked by PMR 13428,7TD,000. Please let me know if that is not correct.

    We are trying to recreate the scenario by setting up a machine with the memory size, swap space, kernel parameters, ulimit settings, devmap definitions, and number of zPDT instances described above.

    Regards,

    Keithvb

     

     

    keithvb


  • 5.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Wed January 21, 2015 09:16 AM

    Yes, you are correct.

     

    -Sathya

    Acc_Rdz


  • 6.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Wed January 21, 2015 11:19 AM

    Customer issue: 3 zPDT instances running at the same time with more than 600 CKD devices per instance. Initialization failures occur when starting up the third instance.

    The customer's scenario was recreated using a virtual machine with 64 GB of real memory, 14 CPs, and 32 GB of swap space, with the kernel parameters and ulimits that the customer provided. Three identical zPDT instances were created with a higher number of CKD disks than used by the customer. The first instance started fine. The second instance failed initialization with the same failure seen by the customer (a number of "AWSECH004S Unable to define RAS/FEDC memory, RC=-3." messages followed by and interwoven with Device manager failures and Device manager restart failures). Please note that the first instance remained up and working.

    Ulimit and kernel parameters were reviewed per our past experience.

    The first experiment reduced the z memory of each instance from 12 GB to 8 GB. That had no obvious effect; initialization still failed. The next test raised the real memory of the machine from 64 GB to 80 GB and then to 100 GB. Again this had no obvious bearing and resulted in the same error.
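
    The arithmetic behind these memory experiments can be sketched in a few lines of Python. This is only an illustration using figures quoted in this thread (3 instances, 12G or 8G of z memory each, 64G of real memory). The function name is hypothetical, and the sketch ignores whatever Linux and zPDT themselves need beyond the z memory defined in the devmap.

```python
def guest_memory_budget(instances: int, gib_per_instance: int, host_gib: int) -> dict:
    """Compare the z memory requested by all instances with host real memory."""
    requested = instances * gib_per_instance
    return {
        "requested_gib": requested,
        "host_gib": host_gib,
        "fits_in_real_memory": requested <= host_gib,
    }


# Figures from this thread: 3 instances at 12G or 8G each on a 64G host.
for per_instance in (12, 8):
    print(guest_memory_budget(3, per_instance, 64))
```

    In both cases the requested z memory is well under the real memory of the machine, which is consistent with resizing memory making no difference to the failure.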

    At this point it was concluded that the issue was not the amount of shared memory but the number of shared virtual memory segments. This was demonstrated by attempting to start the third instance, which failed immediately when trying to get any shared memory.

    Now understanding what to look for, investigation found the kernel setting SHMMNI, which sets the system-wide maximum number of shared memory segments. It defaults to 4096.

    SHMMNI was arbitrarily doubled to 8192 by adding "kernel.shmmni = 8192" to /etc/sysctl.conf and activating it with the command sysctl -p. With this change, all 3 instances started successfully with a very large number of CKD devices per instance.
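
    For anyone who wants to see how close a host is to this limit before or after raising it, here is a minimal Python sketch. It was not part of the original investigation. It assumes a Linux host that exposes the standard procfs files /proc/sysvipc/shm (one header line, then one line per System V shared memory segment) and /proc/sys/kernel/shmmni. The function names are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def count_shm_segments(sysvipc_shm_text: str) -> int:
    """Count segments in text read from /proc/sysvipc/shm.

    The first non-empty line is a column header; each later line is one segment.
    """
    lines = [line for line in sysvipc_shm_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def shm_headroom(sysvipc_shm_text: str, shmmni_text: str) -> Tuple[int, int, int]:
    """Return (segments in use, kernel.shmmni limit, segments still available)."""
    used = count_shm_segments(sysvipc_shm_text)
    limit = int(shmmni_text.strip())
    return used, limit, limit - used


def report_from_proc() -> str:
    """Read the live values on a Linux host and format a one-line summary."""
    used, limit, free = shm_headroom(
        Path("/proc/sysvipc/shm").read_text(),
        Path("/proc/sys/kernel/shmmni").read_text(),
    )
    return f"shared memory segments in use: {used} of {limit} ({free} free)"
```

    Calling print(report_from_proc()) after each instance starts would show the segment count approaching the limit. How many segments a single zPDT instance or device uses is not stated in this thread, so the sketch only reports totals.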

    The ramifications of making this change on the Linux system as a whole are unknown. Consult a Linux tuning expert to learn how this affects your Linux system.

    Please let us know how your scenario works with this setting.

    Regards,

    Keith

     

     

     

    keithvb


  • 7.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Thu January 22, 2015 01:09 AM

    Thanks Keith.

    We have added the parameter in sysctl.conf and activated it. We will check and update.

    Regards,

    Sathya

    Acc_Rdz


  • 8.  Re: AWSECH004S Unable to define RAS/FEDC memory, RC=-3.

    Posted Mon February 23, 2015 01:50 PM

    What Keith said should work.

    tyrellqob92