Stopped receiving ESA emails after HMC rebuild

  • 1.  Stopped receiving ESA emails after HMC rebuild

    Posted Mon August 29, 2016 02:10 PM

    Originally posted by: c01362


    Hello.

     

    Recently one of the internal drives on our IBM 7042 CR7 HMC V8R8.3.0 SP2 failed.

     

    As luck would have it, the drives weren't configured as RAID 1 (they were RAID 0), so we had to restore from a recent bkconsdata image after the drive was replaced.

     

    The restore was successful and everything looks like it was before the disk failure.

     

    The exception is the "Schedule and Send Data" on the "Schedule Service Information" tab.

     

    Since the restore, we no longer receive emails indicating the "Performance Management Information" was successfully sent to IBM.

     

    I'm referring to the "Successfully transmitted performance management information." emails from the ESA on the HMC.

     

    If we manually run the "Performance Management Information" by using the "Send Now" button on the "Schedule Service Information" tab, we do get the emails.

     

    So why does it work manually but the automated nightly 01:00 AM run does not work?

     

    We've tried to remove and re-add the "Performance Management Information" checkbox on the "Schedule Service Information" tab but that did not fix it.

     

    We also confirmed the HMC call-home functions are working.

     

    Anyone have suggestions on how to fix the missing emails?

     

    Please advise.

     

    Thank you.

     

     

     

     



  • 2.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue August 30, 2016 05:12 PM

    Originally posted by: jslayton


    So there were a couple of known issues in this area that were introduced back in HMC V8R8.3.0; however, both have fixes that were put in for HMC V8R8.3.0 SP2.  We are going to try to recreate this in our lab. If we are not able to while running at the same level, we might need some logs to find out what's happening.



  • 3.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue August 30, 2016 05:40 PM

    Originally posted by: c01362


    Hello.

    Thanks for responding.

    Just some more information for you.

    Below is the output of "lshmc -V" so you can see we have the latest PTF installed.

    Thanks again and please let me know if you need any additional information.

     

    lshmc -V
    "version= Version: 8
     Release: 8.3.0
     Service Pack: 2
    HMC Build level 20160807.1
    MH01647: Fix for HMC V8R8.3.0 SP2 (08-07-2016)
    ","base_version=V8R8.3.0



  • 4.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 02, 2016 11:49 AM

    Originally posted by: jslayton


    So we were unable to recreate this issue in our lab.  Is it possible for us to collect some logs from the HMC? Like the callhome#.log?



  • 5.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 02, 2016 11:58 AM

    Originally posted by: jslayton


    Specifically, these logs would be beneficial after a failure: eccTrace0.log, eccTrace1.log, eccTrace2.log, CallHome1.log, CallHome2.log, iqyylog.log.



  • 6.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 02, 2016 01:39 PM

    Originally posted by: c01362


    Hello again.

     

    I'm not completely sure where those logs reside on our HMC.

     

    I've attached the console events log.

     

    Please provide explicit paths to any logs you need.

     

    Not sure you need to know this but our HMC internal drive failure occurred around August 12 and we did the restore from the bkconsdata image on August 17.

     

    Please let me know what else you need to troubleshoot.

     

    Thank you.

    Attachment(s)

    docx
    hmc1-console-event-log.docx   705 KB 1 version


  • 7.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 08, 2016 02:14 PM

    Originally posted by: jslayton


    Hi,

    Unfortunately, the log did not have sufficient information to find any issue.  The logs we need would be:

    /opt/ccfw/data/service/CallHome*
    /opt/ccfw/data/service/ESA*
    /var/hsc/log/ecc*
    /var/hsc/log/iqyylog.log

     

    If you could get the logs from the listed locations after a recreate it would be a great help.  Thank you.



  • 8.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 08, 2016 03:32 PM

    Originally posted by: c01362


    Hello again.

    I was only able to get the CallHome* and ecc* files.

    There were no /opt/ccfw/data/service/ESA* files and I kept getting a "Permission denied" message when I tried to scp the /var/hsc/log/iqyylog.log as hscroot.

    I uploaded the files I could get.

    Not sure what you mean by "...get the logs from the listed location after a recreate..."

    Please clarify.

    Also, if you're aware of a way to get /var/hsc/log/iqyylog.log and avoid the "Permission denied" message, please let me know.

    Thanks!

    Attachment(s)

    gz
    hmc1.tar.gz   2.17 MB 1 version


  • 9.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Mon September 12, 2016 12:02 PM

    Originally posted by: jslayton


    So we were able to recreate this failure in our lab.  We have opened a ticket to find the cause and a fix.  I will update with more info as it becomes available.



  • 10.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 13, 2016 01:09 PM

    Originally posted by: c01362


    Hello.

    Thanks for the info.

    We'll wait for your update.

    Thanks again.



  • 11.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 14, 2016 11:26 AM

    Originally posted by: jslayton


    When we recreated this in the lab, it was because the HMC was not the primary HMC managing the system. Can you verify that the HMC is the primary for the managed systems and, if not, set it as the primary and verify that the issue does not happen again?



  • 12.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 14, 2016 12:34 PM

    Originally posted by: c01362


    Hello.

     

    Please excuse what is probably an ignorant question but how exactly do I verify which HMC is the primary for our managed systems?

     

    We have two HMC's - hostnames hmc1 and hmc2.

     

    Both are running the same version V8R8.3.0 SP2.

     

    I was under the impression that hmc1 was our primary.

     

    Hostname hmc1 was the HMC that experienced the internal disk failure and had to be restored, and where we stopped receiving the "Successfully transmitted performance management information" emails from.

     

    The attached screen shot shows the "Call-home server consoles" page for hmc2 and hmc1.

     

    Based on the "Use discovered call-home server consoles" check box on the hmc2 screen shot, I was assuming hmc1 was the "primary".

     

    Please confirm whether or not my assumption is correct and, if it isn't, how exactly do I determine which of the HMC's is the primary for our managed systems.

     

    Thanks!



  • 13.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 11:09 AM

    Originally posted by: jslayton


    Not an ignorant question at all; the HMC does not make it very intuitive to determine which HMC is the primary. Unfortunately, the call-home server consoles page does not really give you much indication of which one is the primary.  The primary HMC is the one responsible for collecting information from the managed system and performing service.  The primary HMC does not have to be the HMC that actually notifies IBM.  There are two ways to determine if HMC1 is primary.

    1. Try to perform a service action on a concurrent hot-swap FRU like the HMC cable.  If the HMC is not the primary, it will prompt the user with instructions on how to make it the primary HMC.

    2. Log in as hscpe and check for the latest PA DOM PRM trace in the iqyylog.  Here is an example of the trace.

    Ex: 0B460000 09/08/2016 13:55:01.880[+0.0]   I    PA DOM PRM   domain=8284-22A/102B09V; primary=7042-CR8/21E3FDC
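A minimal Python sketch for pulling the managed system and the primary HMC out of such a trace line. The regular expression is inferred only from the example line above, so the real iqyylog format may include variations it does not handle, and the names `PrimaryRecord` and `parse_pa_dom_prm` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the single example trace line in this thread;
# it is an assumption, not a documented iqyylog format.
PA_DOM_PRM = re.compile(
    r"(?P<event_id>[0-9A-F]{8})\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\S*\s+"
    r"\S+\s+PA DOM PRM\s+"
    r"domain=(?P<domain>[^;\s]+);\s*primary=(?P<primary>[^,\s]+)"
)


class PrimaryRecord(NamedTuple):
    date: str
    time: str
    domain: str   # managed system, as type-model/serial
    primary: str  # HMC recorded as primary, as type-model/serial


def parse_pa_dom_prm(line: str) -> Optional[PrimaryRecord]:
    """Return the fields of a PA DOM PRM trace line, or None if it does not match."""
    m = PA_DOM_PRM.search(line)
    if m is None:
        return None
    return PrimaryRecord(m["date"], m["time"], m["domain"], m["primary"])


sample = (
    "0B460000 09/08/2016 13:55:01.880[+0.0]   I    PA DOM PRM   "
    "domain=8284-22A/102B09V; primary=7042-CR8/21E3FDC"
)
record = parse_pa_dom_prm(sample)
print(record.domain, "->", record.primary)
```

The `primary` value is then compared by hand against the machine type, model, and serial number of the HMC in question.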

     

     



  • 14.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 03:53 PM

    Originally posted by: c01362


    Hello.

    Thanks for the info on how to tell which HMC is the primary.

    Very useful.

    We have confirmed that hmc1 is the primary.

    We found the following entry in /var/hsc/log/iqyylog.log on hmc1:

    0B460000 09/04/2016 12:00:33.370[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/069E08C

    Something else I wanted to mention.

    As noted, we stopped receiving the "Successfully transmitted performance management information" emails from hmc1 after the internal disk failure and restore.

    However, a few weeks ago the voltage regulator module (VRM) on one of our IBM 8408, model E8D frames failed.

    This caused the frame (and all LPARS hosted on it of course) to crash.

    After our local CE replaced the VRM and got the frame back up, we began receiving the "Successfully transmitted performance management information" emails for the LPAR's hosted on that frame from hmc1.

    I want to stress that the emails started flowing ONLY for the LPARS on the frame that was rebooted as a result of the VRM replacement - no other LPARS.

    So long story short, hmc1 is the primary but we're ONLY receiving emails from the four LPAR's on a frame that was rebooted.

    Not sure when we'll have the opportunity to reboot the other frames in our environment but I hope that isn't necessary to fix this issue.

    Please let me know if you need any other data or clarification on what I've listed here.

    Thanks again for all your help so far.



  • 15.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 04:06 PM

    Originally posted by: c01362


    Hello again.

    In addition to my last post, I also wanted to provide you the /var/hsc/log/iqyylog.log from hmc1.

     

    Please see the attachment.

     

    Thanks.

     

    Attachment(s)

    txt
    HMC1 iqyylog.txt   714 KB 1 version


  • 16.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 04:57 PM

    Originally posted by: RoseBSundermeyer


    Hello

    For PM for Power data to be sent, there needs to be a configuration on the HMC as well as a configuration on the LPAR itself.  Since a reboot resolved the issue on one LPAR, I am wondering if the configuration on the LPAR itself somehow became corrupted.

     

    Please take a look at this link for the appropriate LPAR type (AIX / Linux) and try unconfiguring the function and then re-configuring it.

    http://www-03.ibm.com/systems/power/support/perfmgmt/getstarted.html 

     

    In the meantime, I will look at the ecc transmission logs (from the 8th).  This will allow us to determine if the failure is between the HMC and the LPAR (retrieving the logs to be transmitted) or between the HMC and IBM (failure to send the logs retrieved from the LPAR).




  • 17.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 05:25 PM

    Originally posted by: c01362


    Hello.

    As mentioned in my last post, the hmc1 emails started flowing from the four LPARS hosted on a frame after that frame was rebooted to address a hardware issue.

    Configuration of all the other LPAR's in our environment hosted on other frames has not changed.

    Also, if we manually run the "Schedule Service Information" -> "Performance Management Information" on hmc1, we get the "Performance Management Information" was successfully sent to IBM confirmation emails for all the LPAR's in our environment.

    Please let us know if you find anything interesting in the ecc logs.

    Thank you.



  • 18.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 15, 2016 07:07 PM

    Originally posted by: RoseBSundermeyer


    Hello

    I understand.  My suggestion was very basic.  Since the problem was cleared up by a reboot on one frame, there is the potential that turning the collection off and then back on on the frame(s) still in error might also resolve the problem.

     

    The ecc trace itself showed no errors, which raises the question of whether we have a problem with the transmission, or whether there is simply a problem with the code that notices the transmission is complete.  When investigating this error, have you confirmed that the data transmitted does not show up on the PM for Power website?  (I have no ability to look at your data on that website.)  If not, I can ask the PM for Power team to take a look.

     

    Looking at the ecc trace, I found where we transmitted a file 5 times on 9/8  (4 in the morning and once in the afternoon).  The morning timestamps do not appear to be what you are referencing (1 AM) - but the file names contain the serial number referred to earlier:  8408-E8D_107C01T_2.pm_stats.send_6526928474840820828.sent.1473055264224.file_0.  

     

    There does appear to be a difference in size between what we collect on the morning transmission (failing) and what we collect on the afternoon transmission (success), and I only see an acknowledgement of success on the afternoon transmission, so the lack of a completion email might be triggered by the extra processing needed to handle the larger file.  (Not sure how a frame failure or a reboot could be triggering that type of error, though.)

     

    I need to engage the ESA team to answer a few questions on:

    1.  What they touch on the LPAR that would / could be changed by an IPL

    2.  What is different between a Send Now and a Periodic transmission (the periodic transmission is split into 2 separate files due to file size, so apparently that file is larger than the 'Send now' transmission, which is not split)

     

     



  • 19.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 16, 2016 05:30 PM

    Originally posted by: jslayton


    Thank you for the additional information.  From what we have been able to piece together, when hmc1 failed, its role as primary was terminated, as one would expect.  When it was repaired, it did not automatically become primary again.  When the frame failed and was rebooted, that triggered a re-arbitration and hmc1 became the primary for that frame only, which is why you see the transmissions for those events.  The other systems still show hmc2 as the primary.

    0B460000 09/04/2016 12:00:33.370[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/069E08C
    ...
    0B460000 09/04/2016 11:40:28.230[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=null
    ...
    0B460000 08/24/2016 19:07:00.220[+0.0]   I    PA DOM PRM   domain=8231-E1D/069CA0T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:59.830[+0.0]   I    PA DOM PRM   domain=8408-E8D/107CB5T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:59.470[+0.0]   I    PA DOM PRM   domain=8286-42A/218C9EV; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:59.080[+0.0]   I    PA DOM PRM   domain=8202-E4C/063521T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:58.730[+0.0]   I    PA DOM PRM   domain=8231-E1C/063491T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:58.540[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C02T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:58.170[+0.0]   I    PA DOM PRM   domain=8286-42A/218C9FV; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:57.990[+0.0]   I    PA DOM PRM   domain=8247-21L/212208A; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:57.810[+0.0]   I    PA DOM PRM   domain=8286-42A/218CA2V; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:57.430[+0.0]   I    PA DOM PRM   domain=8231-E1C/06348FT; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:57.240[+0.0]   I    PA DOM PRM   domain=8286-42A/218CA1V; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:57.030[+0.0]   I    PA DOM PRM   domain=8286-42A/218C9DV; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:56.670[+0.0]   I    PA DOM PRM   domain=8231-E1C/063490T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:56.290[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:55.740[+0.0]   I    PA DOM PRM   domain=8286-42A/218CA0V; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:55.540[+0.0]   I    PA DOM PRM   domain=8202-E4D/06CA7ET; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:55.330[+0.0]   I    PA DOM PRM   domain=8231-E1C/06348ET; primary=7042-CR7/06AD8BC,/10.113.0....
    0B460000 08/24/2016 19:06:55.110[+0.0]   I    PA DOM PRM   domain=8205-E6D/067C00T; primary=7042-CR7/06AD8BC,/10.113.0....
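The listing above can be reduced to "current primary per managed system" with a short script. This is only a sketch under a few assumptions: the line layout is inferred from the entries quoted in this thread, timestamps are taken to be MM/DD/YYYY, a value of `null` is treated as "no primary recorded", and anything after a comma in the `primary=` field is ignored. The function name `latest_primaries` is hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Layout inferred from the PA DOM PRM entries quoted in this thread (an assumption).
LINE = re.compile(
    r"(?P<stamp>\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}\.\d+)\S*\s+"
    r"\S+\s+PA DOM PRM\s+"
    r"domain=(?P<domain>[^;\s]+);\s*primary=(?P<primary>[^,\s]+)"
)


def latest_primaries(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each managed system to the primary named in its newest PA DOM PRM entry."""
    newest: Dict[str, Tuple[datetime, Optional[str]]] = {}
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # not a PA DOM PRM entry
        # Normalise the whitespace between date and time before parsing.
        stamp = datetime.strptime(" ".join(m["stamp"].split()), "%m/%d/%Y %H:%M:%S.%f")
        primary = None if m["primary"] == "null" else m["primary"]
        domain = m["domain"]
        # Compare timestamps so the result does not depend on line order.
        if domain not in newest or stamp > newest[domain][0]:
            newest[domain] = (stamp, primary)
    return {domain: primary for domain, (_, primary) in newest.items()}


entries = [
    "0B460000 09/04/2016 12:00:33.370[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/069E08C",
    "0B460000 09/04/2016 11:40:28.230[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=null",
    "0B460000 08/24/2016 19:06:56.290[+0.0]   I    PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/06AD8BC,/10.113.0....",
    "0B460000 08/24/2016 19:06:55.110[+0.0]   I    PA DOM PRM   domain=8205-E6D/067C00T; primary=7042-CR7/06AD8BC,/10.113.0....",
]
result = latest_primaries(entries)
for domain, primary in sorted(result.items()):
    print(domain, "->", primary)
```

Run against the full iqyylog, any system whose newest entry still names the other HMC is one for which this HMC is not the primary.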



  • 20.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Mon September 19, 2016 02:50 PM

    Originally posted by: c01362


    Hello Rose and Jerry.

    I really do appreciate all the feedback you've been giving me.

    We did reconfigure ESA on the only two AIX LPAR's hosted on a particular frame in our environment.

    That frame is a 8205-E6D-067C00T if that matters.

    We'll see if that makes any difference as far as the missing "Successfully transmitted performance management information." emails are concerned.

    Regarding the data reviewed from September 8, we transmitted some log files to IBM from one of our frames on that day, 8408-E8D-107C02T.

    That was unrelated to this whole missing emails issue.

    I did manually run the "Performance Management Information" again today by using the "Send Now" button on the "Schedule Service Information" tab of both hmc1 and hmc2.

    For the record, hmc1 is configured to send "Performance Management Information" at 01:01:01 AM daily and hmc2 is configured for 05:58:54 AM daily.

    By daily I mean the "Interval (days)" drop down is set to 1.

    We did receive ALL the "Successfully transmitted performance management information." LPAR emails as a result of that manual run.

    Also, the "Last Data Recv'd" date on the "Performance Management for Power Systems" site did update and is current, 2016-09-19.

    We did notice on both hmc1 and hmc2 that a few frames had the "Performance Monitoring Data Collection for Managed Servers" turned off.

    We have turned it on, "All On", for all the frames on both HMC's.

    Not sure if that will have any impact on the missing emails.

    Not sure if you need it but I'll upload the latest /var/hsc/log/iqyylog.log from hmc1 so you have updated information.

    Jerry, you mentioned hmc2 still looks to be the primary for the majority of our frames and hmc1 is only the primary for the frame that crashed earlier this month.

    Is that a problem?

    If it is, please let me know what steps should be taken to address it.

    Please let me know if any other information is needed for your troubleshooting.

    As mentioned, I'm curious to see if either reconfiguring the ESA on those two LPAR's or turning on the performance monitoring data collection for all frames will make any difference as far as the missing emails are concerned.

    Thanks.

     

    Attachment(s)



  • 21.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 12:45 PM

    Originally posted by: c01362


    Hello again Rose and Jerry.

    Just wanted to let you know that the changes we did yesterday did not make any difference.

    Again those changes consisted of reconfiguring ESA on the two AIX LPAR's hosted on frame 8205-E6D-067C00T and turning "All On" via the HMC's "Performance Monitoring Data Collection for Managed Servers" setting for all frames.

    So we're still only getting the "Successfully transmitted performance management information." emails for the four LPAR's hosted on the frame that crashed/rebooted earlier this month, frame 8408-E8D-107C01T.

    As for the "Performance Management for Power Systems" site, the data for all the AIX LPAR's is updating.

    The "Last Data Recv'd" column has the current date 2016-09-20.

    The VIOS LPAR's however are NOT updating - with the exception of the two hosted on the frame that crashed/rebooted earlier this month - those two are updating.

    All others still have the "Last Data Recv'd" date of 2016-09-19 which is when I manually ran "Performance Management Information" via the "Send Now" button on the "Schedule Service Information" tab of both hmc1 and hmc2.

    My coworker indicated that in our environment all AIX LPAR's have ESA locally configured.

    The VIOS LPAR's however are relying on ESA from the HMC.  

    So ESA on the HMC's doesn't appear to be sending the data, only the locally configured AIX LPAR ESA's are updating the "Performance Management for Power Systems" site data.

    Please review this and my previous posting and let me know if you have any additional suggestions.

    Thanks. 



  • 22.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 04:54 PM

    Originally posted by: mariomds


    Hello,

    I took a look at the iqyylog and it looks like Jerry was right. We were able to recreate the issue and see this in the logs (from ESALogger so I couldn't verify it with the logs provided here before):

       2379 2016-09-13 18:26:01.101 UTC INFO: 22416: ESAPMServiceInfoImpl.collectPMData: [ESAPMServiceInfoImpl] Skipping CEC:8408-E8E-106788V because not primary or not scheduled

       2380 2016-09-13 18:26:01.101 UTC INFO: 22416: ESAPMServiceInfoImpl.collectPMData: [ESAPMServiceInfoImpl] PM data collection skipped,  hence Purge and Archive process skipped. Please look at the above logs for reasons
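For anyone who has the ESALogger output saved as text, a small Python sketch like the following could list the systems being skipped. The match string is copied from the two log lines quoted above; the function name `skipped_cecs` is hypothetical, and whether the message wording is identical on other HMC levels is an assumption.

```python
import re
from typing import Iterable, List

# Message text copied from the ESALogger lines quoted in this thread.
SKIP = re.compile(r"Skipping CEC:(?P<cec>\S+) because not primary or not scheduled")


def skipped_cecs(lines: Iterable[str]) -> List[str]:
    """Return each skipped CEC once, in order of first appearance."""
    seen: List[str] = []
    for line in lines:
        m = SKIP.search(line)
        if m and m["cec"] not in seen:
            seen.append(m["cec"])
    return seen


log_lines = [
    "2379 2016-09-13 18:26:01.101 UTC INFO: 22416: ESAPMServiceInfoImpl.collectPMData: "
    "[ESAPMServiceInfoImpl] Skipping CEC:8408-E8E-106788V because not primary or not scheduled",
    "2380 2016-09-13 18:26:01.101 UTC INFO: 22416: ESAPMServiceInfoImpl.collectPMData: "
    "[ESAPMServiceInfoImpl] PM data collection skipped,  hence Purge and Archive process skipped. "
    "Please look at the above logs for reasons",
]
skipped = skipped_cecs(log_lines)
print(skipped)
```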

     

    The data collection itself won't be started unless the HMC is the primary for the system, so emails will never get sent out. A simple way to test this is to force one of the systems to set hmc1 as the primary and see if emails start flowing for that one system again. To do this:

    1. Disconnect the system from the current primary so that re-arbitration triggers a fail over to the other HMC.

    2. Verify that the system has set hmc1 as the primary by looking at the iqyylog and seeing something similar to "PA DOM PRM   domain=8408-E8D/107C01T; primary=7042-CR7/069E08C." This line must be the latest PA DOM PRM entry for that system.

    3. Schedule the event and wait for an email.

    4. Reconnect the system to hmc2 and resume as usual. Reconnecting should not trigger re-arbitration unless another issue occurs, so it might be a good idea to double check that hmc1 is still the primary.

    If disconnecting from hmc2 is an issue you could also force hmc1 to become the primary by starting any FRU exchange. If this is done while hmc1 is not the primary, you'll see a panel asking if you would like to force it to become the primary. Once it does, you can delay the procedure and proceed with verifying that the emails are flowing again.

    Let me know if anything is unclear or if there are any more questions/issues that come up.



  • 23.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 05:18 PM

    Originally posted by: c01362


    Hello Mario.

    What you're saying makes sense but I'm unclear on how to perform step 1.

    1. Disconnect the system from the current primary so that re-arbitration triggers a fail over to the other HMC.

    Could you please provide details on how to accomplish this for a particular AIX LPAR?

    Thanks Mario.



  • 24.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 05:37 PM

    Originally posted by: mariomds


    You wouldn't disconnect from an LPAR, you would disconnect the entire system from the HMC by going to System Management > Servers > select 107C01T (or any machine) and from the menu go to Connections > Reset or Remove Connections > Remove Connections. This has to be done because the primary is set for the servers, not for each of the LPARS.



  • 25.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 06:00 PM

    Originally posted by: c01362


    Hello again Mario.

    Thanks for the explanation.

    We removed the connections for one of our frames hosting non-production AIX LPAR's.

    That frame is 8205-E6D-067C00T.

    Here's the entry written to iqyylog:

    Eventid  Date       Time                 Type Name         Data
    0B460000 09/20/2016 21:44:25.980[+0.0]   I    PA DOM PRM   domain=8205-E6D/067C00T; primary=7042-CR7/069E08C

    We're now going to wait and see if hmc1, which is configured to send "Performance Management Information" at 01:01:01 AM daily, generates the "Successfully transmitted performance management information." emails for the LPAR's hosted on that frame.

    Another question for you, you mentioned in step 4:

    4. Reconnect the system to hmc2 and resume as usual. Reconnecting should not trigger re-arbitration unless another issue occurs, so it might be a good idea to double check that hmc1 is still the primary.

    Is checking iqyylog the only way to determine if an HMC is the primary for a frame?

    Is there a way to do that from the command line of the HMC?

    Just wondering.

    I will let you know the results of this connection change tomorrow.

    Thanks Mario.



  • 26.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 20, 2016 06:31 PM

    Originally posted by: mariomds


    As far as I know, there's no practical way to tell what the primary is set to besides looking at logs. I'll ask around a bit more but I'm pretty confident that there is no other way, based on the responses I've gotten from asking before.

    I'll let you know as soon as I find otherwise though.



  • 27.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 21, 2016 01:08 PM

    Originally posted by: c01362


    Hello Mario.

    We have received the "Successfully transmitted performance management information." emails for the four LPAR's hosted on frame 8205-E6D-067C00T, so removing the connection from hmc2 so that hmc1 would become the primary worked.

    Also, data on the "Performance Management for Power Systems" site is current for those four LPAR's.

    8205.E6D.067C00T-1
    8205.E6D.067C00T-2
    8205.E6D.067C00T-3
    8205.E6D.067C00T-4

    I'm sorry but I have a few more questions for you on the best way to proceed.

    We still have about 15 frames where the hmc2 connection will have to be removed so hmc1 becomes the primary.

    What is the best way to do this?

    Select all the frames on hmc2 and then choose Connections > Reset or Remove Connections > Remove Connections or should we do it one frame at a time?

    Also, you suggested "Reconnect the system to hmc2 and resume as usual..." after we confirmed receiving the emails.

    How exactly is that done?

    I tried using the Servers -> Connections -> Add Managed System option on hmc2 to re-add frame 8205-E6D-067C00T but that only seems to add an LPAR and not a frame.

    Please provide details on those two items and if you find any other info on identifying the primary HMC.

    Thanks Mario.



  • 28.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 21, 2016 04:31 PM

    Originally posted by: mariomds


    I'm glad it worked. There should be no problem with selecting all of the systems and then removing the connections. I think the HMC will still ask you to do it one by one, so it won't be much different anyway.

    I still haven't found another way to determine the primary.

    I'm not sure how it would be possible to add only an LPAR instead of a system. Once the system is added you will see the partitions though, so maybe the server got selected and it's only showing the LPARs on it? Whatever the case, if you're able to see the partitions then the system they belong to should be managed by the HMC. You could try backing out to System Management > Servers and looking at the list there.

    You could take a screenshot of the Systems Management > Servers panel if you want to let us know exactly what's happening in the HMC, since I don't think I've ever heard of adding only an LPAR and wouldn't expect that to be possible.

    Sorry if I'm not being clear enough, but do let me know if I can clarify anything else.



  • 29.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 21, 2016 06:27 PM

    Originally posted by: c01362


    Hi Mario.

    Thanks for all the information you've provided so far.

    Regarding the question about reconnecting frame 8205-E6D-067C00T (system name lawsontest) to hmc2, I attached a screen shot that shows the system listing of hmc1 and hmc2.

    The frame 8205-E6D-067C00T (system name lawsontest) vanished from the system listing on hmc2 yesterday when we removed the connection via Connections > Reset or Remove Connections > Remove Connections.

    It still appears in the system listing on hmc1 as you can see in the attached screen shot.

    When we try to re-add lawsontest to the system listing on hmc2 using Servers > Connections > Add Managed System, we get an "IP Address/Host name:The IP address or Host name is invalid." message.

    To clarify, in our environment, some frames host multiple LPARS.

    In those cases, the frame itself is not in DNS and doesn't have an IP address.

    The multiple LPAR's it hosts are in DNS and do have IP addresses.

    Other frames in our environment consist of a single LPAR.

    Those frames are in DNS and do have an IP address.

    With lawsontest, since it hosts multiple LPAR's, that system name is not in DNS and doesn't have an IP address - just the LPAR's it hosts are in DNS and have addresses.

    So, the question is, how do we re-add lawsontest back to the hmc2 system list given it doesn't have an IP address and isn't in DNS?

    I hope this all makes sense.

    Please respond when you get a chance.

    Thanks Mario.

     

    Attachment(s)



  • 30.  RE: Re: Stopped receiving ESA emails after HMC rebuild

    Posted 27 days ago

    Good day

    I have the same issue with the ESA data not being transmitted from one of our HMC's, but I'm rather curious to know how you managed to allow the two HMC's to see the same Managed Systems?

    It is something we have been trying to do.

    We have the networking set up as per IBM best practice for Dual HMC's and it should be working with DHCP; however, no luck thus far.



    ------------------------------
    Regards

    Anwar Williams
    ------------------------------



  • 31.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 22, 2016 10:17 AM

    Originally posted by: mariomds


    I see, in that case I'm wondering how the system is connected to hmc1 if it doesn't have an IP address. Could you run this command on hmc1 and find the line for 067C00T?

    lssysconn -r all

    That might give us more information.
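
    If the output is long, one way to pull out just that frame's record is to filter on the serial number. This is only a sketch: `filter_serial` is a hypothetical helper name (not an HMC command), and the usage line assumes a workstation that can SSH to hmc1 as hscroot.

```shell
# Hypothetical helper (not an HMC command): keep only the lines that
# contain the given serial number. Reads lssysconn-style text on stdin.
filter_serial() {
    # -F treats the serial as a fixed string rather than a regular expression.
    grep -F -- "$1"
}

# Assumed usage from a workstation that can SSH to the HMC:
#   ssh hscroot@hmc1 'lssysconn -r all' | filter_serial 067C00T
```

    Reading from stdin means the same helper also works on output that was saved to a file earlier.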



  • 32.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 22, 2016 05:16 PM

    Originally posted by: c01362


    Hello Mario.

     

    The attachment contains the "lssysconn -r all" output from both hmc1 and hmc2.

     

    I apologize, but I should have discussed this issue with my co-worker who built our HMCs before bothering you with it.

     

    In our environment, hmc1 is the HMC at our main data center and hmc2 is the HMC at our DR data center.

     

    They are on different VLANs (which is why you see different IP addresses in the lssysconn output) and each frame can communicate with both VLANs and both HMCs.

     

    Once I entered the correct 172.17.0.10 address on hmc2 using Servers > Connections > Add Managed System, it connected to the frame 8205-E6D-067C00T (system name lawsontest), and it is now back in the hmc2 server listing.

     

    My co-worker indicated that the HMCs should be communicating with each other and should know which one is the primary.

     

    If this is true, he didn't quite understand why it would be necessary to remove the connection to a frame from hmc2 in order for hmc1 to become the primary for that frame and get the "Successfully transmitted performance management information." emails flowing again.

     

    He figured they would flow from whichever HMC was the primary for a frame.

     

    He thought perhaps that the HMCs weren't fully communicating with one another because the "Use discovered call-home server consoles" option wasn't enabled on hmc1, as shown in the "Manage Outbound Connectivity" page screenshots at the end of the attached document.

     

    We have now enabled that option on hmc1 to see if it makes any difference as far as the flow of the "Successfully transmitted performance management information." emails is concerned.

     

    That is, if we'll receive those emails from whichever HMC is the primary for a particular frame.

     

    We're going to wait and see if that change makes any difference.

     

    If it doesn't, we'll remove the frame connections from hmc2 as we did with frame 8205-E6D-067C00T (system name lawsontest).

     

    Either way, I'll let you know the outcome.

     

    Please comment on my co-worker's question about the HMCs communicating with one another and each one knowing which is the primary.

     

    Thanks Mario.

    Attachment(s)

    docx
    hmc lssysconn output.docx   151 KB 1 version


  • 33.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 22, 2016 05:56 PM

    Originally posted by: mariomds


    He is correct on all counts. The HMCs will communicate with each other and they do know what the primary is for each specific system. The assumption here was that only hmc1 was configured for Call Home and Customer Notification, and there was no desire to have hmc2 act as a call home server as well. If hmc2 is also configured to do the calls in the same way that hmc1 is, then there should be no problem with just enabling the Discover Call Home server option on hmc1. As he said, if both are allowed to Call Home, then primary status becomes irrelevant, as the emails will just be sent from whichever HMC is the primary. Make sure hmc2 has all the Call Home information required though, in case hmc2 was never set up to do the calls. You can do that by using the test button on Outbound Connectivity or by creating a test problem.

    Sorry, I should have asked if enabling hmc2 as a Call Home server was something that you wouldn't mind doing, instead of assuming it was out of the question.



  • 34.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 23, 2016 05:06 PM

    Originally posted by: c01362


    Hello Mario.

     

    My coworker confirmed that both hmc1 and hmc2 are, and apparently always have been, configured for call home and customer notification.

     

    With that in mind, why isn't performance data for the frames hmc2 is primary for being uploaded to the "Performance Management for Power Systems" site and why aren't we receiving daily "Successfully transmitted performance management information." emails from hmc2 for the frames it is the primary for?

     

    I believe I mentioned this before, but all AIX LPARs in our environment have ESA locally configured.

     

    The VIOS LPARs, however, are relying on ESA from the HMC.

     

    So ESA on hmc2 doesn't appear to be sending the data; only the locally configured ESA on the AIX LPARs is updating the "Performance Management for Power Systems" site data.

     

    ESA on hmc1, on the other hand, is updating the AIX and VIOS LPARs for which it is primary.

     

    By the way, enabling the "Use discovered call-home server consoles" option on hmc1 didn't make any difference.

     

    That is, we're still only receiving the "Successfully transmitted performance management information." emails from the frames that hmc1 is the primary for and nothing from the frames hmc2 is the primary for.

     

    So in a nutshell, something is not right on hmc2, but we're not sure what the issue is.

     

    We'd like to get a handle on that instead of simply making hmc1 the primary for all frames to resolve it.

     

    Please let us know your thoughts on this when you get a chance.

     

    Thanks Mario.



  • 35.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 23, 2016 05:11 PM

    Originally posted by: c01362


    Mario, regarding my last post, I misspoke when I said "ESA on hmc1, on the other hand, is updating the AIX and VIOS LPAR's for which it is primary."

    As I mentioned, the VIOS LPARs are relying on ESA from the HMC.

    So what I meant to say is ESA on hmc1 is updating the VIOS LPARs for which it is primary.

    The AIX LPARs are using the locally configured ESA.

    Just wanted to avoid confusing things any worse than I may have already.

    Please review both posts and respond when you can.

    Thanks.

     



  • 36.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 27, 2016 10:55 AM

    Originally posted by: mariomds


    I would need the iqyylog/CallHome/ESA/ECC logs to verify what exactly is happening to the Call Home requests, since it could be failing for a completely different reason now. Are you receiving an email about transmission failure or is it still missing?

    Also, I just noticed you said hmc1 was the one that was marked with "allow discovered call home servers". The issue wasn't that hmc1 was failing to call home through hmc2; that will never happen, as there's no failover function. In other words, only one Call will be attempted, and since hmc1 is configured to do local call home, it will only try to do it through itself even if there are other HMCs in the list.

    The real issue is that hmc2 isn't doing the same (doing a Call for the systems it's primary for), so what you should be looking at is hmc2's Call Home settings. Is hmc2 allowed to do local Call Home? If it is and the performance collection email is still missing, we should take a look at the logs for hmc2 and see why its Call is failing.

    Basically, make sure both HMCs are configured for 'local call home' by going to the outbound connectivity panel and looking at the entries in the list.

    Then check to see if both are using the same email configuration.

    If they are, and both HMCs have no problem doing a Connection Test and a Test Problem Creation, then we should look at the logs from hmc2 to check for any exceptions.

    Just to rule out a couple of things, what happens when you do a manual transmission from hmc2? Does it behave the same as hmc1? We should expect that emails are sent correctly, and data transmission succeeds.



  • 37.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 27, 2016 04:07 PM

    Originally posted by: c01362


    Hello Mario.

     

    Please see my responses to your questions/comments below.

     

    Thanks.

     

    I would need the iqyylog/CallHome/ESA/ECC logs to verify what exactly is happening to the Call Home requests, since it could be failing for a completely different reason now. Are you receiving an email about transmission failure or is it still missing?

    <Tom Wolf> The "Successfully transmitted performance management information." emails are not being received from hmc2 as part of the daily 05:58:54 AM run. We do not receive a failure email from hmc2 either. From hmc1, we do receive those emails for the two frames hmc1 is the primary for. For the record, hmc1 is configured to send "Performance Management Information" at 01:01:01 AM daily and hmc2 is configured for 05:58:54 AM daily. I've attached the iqyylog, CallHome, and ecc files I could find from both HMCs. I could not locate any /opt/ccfw/data/service/ESA* files on either HMC.

     

    The real issue is that hmc2 isn't doing the same (doing a Call for the systems it's primary for), so what you should be looking at is hmc2's Call Home settings. Is hmc2 allowed to do local Call Home? If it is and the performance collection email is still missing, we should take a look at the logs for hmc2 and see why its Call is failing.

    Basically, make sure both HMCs are configured for 'local call home' by going to the outbound connectivity panel and looking at the entries in the list.

    Then check to see if both are using the same email configuration.

    If they are, and both HMCs have no problem doing a Connection Test and a Test Problem Creation, then we should look at the logs from hmc2 to check for any exceptions.

    <Tom Wolf> See the attachment for the outbound connection settings for hmc2 and hmc1 as well as the results of a connection test and test problem creation. Both of those worked, although the email received from hmc2 differed from the one from hmc1. Not sure if that matters.

     

    Just to rule out a couple of things, what happens when you do a manual transmission from hmc2? Does it behave the same as hmc1? We should expect that emails are sent correctly, and data transmission succeeds.

    <Tom Wolf> Yes, we just manually ran the "Performance Management Information" by using the "Send Now" button on the "Schedule Service Information" tab of both hmc1 and hmc2. We got all the expected "Successfully transmitted performance management information." emails from both HMCs. So it works when executed manually but not as part of the automated scheduled run.

     

     

     



  • 38.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 27, 2016 05:29 PM

    Originally posted by: mariomds


    Sorry, the Call Home logs had already wrapped by the time they were captured, due to the number of systems connected. We'll need to capture them as soon as the scheduled time is reached (or within 3-5 min to allow for the operation to finish). If you can, it might be better to change the time for just the Performance Management Information operation to something within the next 5 minutes; that way you won't need to wait a whole day. Doing so would also help to get rid of some of the noise from the other operations, making the logs easier to follow and less likely to wrap.

    The emails for the problems are different because FTP is enabled only on hmc1; hmc2 seems to be using ECC. I checked on this on our lab machines and it doesn't seem to impact the scheduled operations, so it's probably nothing to worry about.

    To get the iqyylog.log you'll need to have root access, then just copy it over as usual. There's more information in each of the events, so I can't see everything if it's just text.

    I guess the ESALogger isn't there because it wasn't added to 830, just >840.



  • 39.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 27, 2016 06:31 PM

    Originally posted by: c01362


    Mario, I may have mentioned this before, but I can't scp /var/hsc/log/iqyylog.log from the HMCs even as hscroot.

     

    I keep getting a "Permission denied" error as shown below.

     

    The permission bits and ownership don't seem to allow scp using hscroot.

     

    Please let me know if you're aware of another way to get that file.

     

    What I've been sending you in place of that log is the Service Management > View Management Console Logs but it doesn't sound like it has the details you need.

     

    I also temporarily changed the runtime of the "Performance Management Information" on hmc1 and hmc2 so we could get current /opt/ccfw/data/service/CallHome* data.

     

    I'll upload that to this post.

     

    What's really odd is we got all the expected "Successfully transmitted performance management information." emails from both HMCs when I temporarily changed the runtime. I've now reset them to their original runtimes. That is, hmc1 is configured to send "Performance Management Information" at 01:01:01 AM daily and hmc2 is configured for 05:58:54 AM daily.

     

    Why did it work with a 5:01 PM runtime on hmc1 and 5:10 PM runtime on hmc2 but it doesn't work with our early morning runtimes?

     

    Also, please let me know if you have any thoughts on how to get the iqyylog.log.

     

    Thanks.

     

     

    # scp hscroot@hmc1:/var/hsc/log/iqyylog.log /home/c01362/hmc1

    scp: /var/hsc/log/iqyylog.log: Permission denied

     

    hscroot@hmc1:~> ls -alt /var/hsc/log/iqyylog.log

    -rw------- 1 ccfw ccfw 28650260 Sep 27 17:01 /var/hsc/log/iqyylog.log

     

    # scp hscroot@hmc2:/var/hsc/log/iqyylog.log /home/c01362/hmc2

    scp: /var/hsc/log/iqyylog.log: Permission denied

     

    hscroot@hmc2:~> ls -alt /var/hsc/log/iqyylog.log

    -rw------- 1 ccfw ccfw 25949707 Sep 27 17:10 /var/hsc/log/iqyylog.log

    Attachment(s)

    gz
    hmc1-callhome-files.tar.gz   115 KB 1 version
    gz
    hmc2-callhome-files.tar.gz   143 KB 1 version


  • 40.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Tue September 27, 2016 06:54 PM

    Originally posted by: mariomds


    Sorry, I should have clarified, I didn't mean hscroot I meant the actual root user. I'm not sure if you have access to it, so it might not be possible.

    Another method to get the logs is to use the Send Problem Reports tab in the Transmit Service Information panel. You can select whatever files you need and a zip will end up on the HMC's HW directory in Ecurep.

    I can't really explain why only the latest transmissions succeeded, since I was never able to confirm exactly why hmc2 failed to transmit earlier. We'll have to wait until it fails again and look at the logs. All I can think of is that maybe running one of the tests allowed the HMC to download one of the required ECC files for Call Home. It's not very likely though, since I don't think they update those very often. Otherwise, maybe one of the old call home settings was cached somewhere, and it only updated recently. Or there could be other connection issues affecting only hmc2. These are just guesses, it could be anything.



  • 41.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Wed September 28, 2016 05:34 PM

    Originally posted by: c01362


    Hello Mario.

     

    We have some good news.

     

    After removing and re-adding the "Performance Management Information" send times yesterday, we began receiving the "Successfully transmitted performance management information." emails from hmc2 shortly after its 05:58:54 AM send time.

     

    Perhaps it is as you said and some settings were cached.

     

    Regardless, we're going to see if that was a fluke or if it works again on Thursday morning.

     

    I have an automated job set up to collect the /opt/ccfw/data/service/CallHome* files shortly after the Thursday morning send so I'll have something for you to analyze if it doesn't work.
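
    For anyone wanting to set up something similar, a job like this could be sketched roughly as follows. This is not the exact script in use here: the function name, the destination layout, and the `DRY_RUN` switch are all hypothetical, and it assumes key-based SSH access to the HMC as hscroot from the collecting workstation.

```shell
#!/bin/sh
# Hypothetical collection helper; function name, destination layout, and
# the DRY_RUN switch are assumptions made for illustration.
# Copies the Call Home logs from an HMC into a timestamped directory so
# they are captured before they wrap.
collect_callhome() {
    hmc="$1"                                   # e.g. hmc2
    dest_root="$2"                             # e.g. /home/c01362
    stamp="${3:-$(date +%Y%m%d_%H%M%S)}"       # optional fixed timestamp
    dest="$dest_root/$hmc/$stamp"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print what would run without touching the network or filesystem.
        echo "scp hscroot@$hmc:/opt/ccfw/data/service/CallHome* $dest/"
        return 0
    fi

    mkdir -p "$dest" || return 1
    # The remote glob is quoted so the local shell leaves it alone.
    scp "hscroot@$hmc:/opt/ccfw/data/service/CallHome*" "$dest/"
}
```

    Scheduling a call such as `collect_callhome hmc2 /home/c01362` from cron a few minutes after the 05:58:54 AM send should keep the capture inside the 3-5 minute window Mario mentioned.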

    Also, we don't have "root" access on the HMCs, so to get /var/hsc/log/iqyylog.log from each HMC I went to Transmit Service Information > Transmit Service Data tab, selected the Hardware management console log (iqyylog.log) checkbox, and clicked the Send button.

     

    I sent you two emails with the zipped logs attached.

     

    Not sure if you got them considering the zip files are 15 MB in size.

     

    Below are the emails generated by running the transmit service data.

     

    Not sure if you have a way to access the files on your end if you didn't receive those emails I sent.

     

    Anyway, I will let you know tomorrow whether or not "Performance Management Information" send worked two days in a row on hmc2.

     

    Just to clarify, it had been working on hmc1 and continues to, so we don't have a problem there.

     

    It was hmc2 where things weren't flowing.

     

    I'll keep you posted.

     

    Again, please let me know if you didn't receive the iqyylog files.

     

    Thanks Mario.

     

     


     

    Successfully transmitted problem information.

     

    EED: /opt/ccfw/data/p/sa/20160928152123_TSD.zip

     

    Details: Successfully transmitted problem information: DATA SUBMITTED SUCCESSFULLY  ReturnCode: 0

     

     

     

     

     

     



  • 42.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Thu September 29, 2016 01:56 PM

    Originally posted by: c01362


    Hello Mario.

     

    Just wanted to let you know that for two days in a row now we've received the "Successfully transmitted performance management information." emails from hmc2 shortly after its 05:58:54 AM send time.

     

    We're also receiving those emails from hmc1.

     

    In addition, the "Last Data Recv'd" date column on the "Performance Management for Power Systems" site is current, 2016-09-29, for all our LPARs.

     

    It seems removing and re-adding the "Performance Management Information" send times on hmc2 was the fix.

     

    A bit odd but I'm just glad it's functioning again.

     

    If you have any additional insight on this issue, please let me know.

     

    Otherwise, I think we can close this forum posting.

     

    Thanks Mario.



  • 43.  Re: Stopped receiving ESA emails after HMC rebuild

    Posted Fri September 30, 2016 10:26 AM

    Originally posted by: mariomds


    No problem, let us know if any other issues come up.