Maximo


Reliability issues with escalations

  • 1.  Reliability issues with escalations

    Posted Mon November 15, 2021 09:32 AM
    Edited by System Admin Wed March 22, 2023 11:49 AM
    MAM 7.6.1.2:

    I've had a lot of reliability issues with escalations I've made. Sometimes they run, and other times they don't: ESCALATION.LASTRUN remains at a date in the past, even though the escalation should have run a few times between then and now.

    It seems pretty random to me, and it is worse when I edit an existing escalation.

    I've gotten the impression that some people don't like or trust escalations, and prefer to use custom cron tasks instead.
    Is the reasoning behind that the reliability issues I mentioned?

    Has anyone had any similar experiences?
    #AssetandFacilitiesManagement
    #Maximo


  • 2.  RE: Reliability issues with escalations

    Posted Mon November 15, 2021 08:57 PM
    Unfortunately, I don't yet have a way to check the logs, so I'm running blind at the moment.


  • 3.  RE: Reliability issues with escalations

    Posted Mon November 15, 2021 10:16 PM
    Never had problems with Escalations, actually.  

    Note that each escalation is an instance of the ESCALATION cron task ... That might be a good troubleshooting point.  

    I did have a similar thing happen with the WOGEN cron task on one server.  It was scheduled to run at 8:45 pm, but ran at random times instead.  We moved Maximo onto a different server, and the problem disappeared. 

    Now the cron task is running on schedule, but the WOGEN email is being sent at random times - as much as 24 hours later.  I haven't found time to troubleshoot it yet, so I have no idea why.







  • 4.  RE: Reliability issues with escalations

    Posted Tue November 16, 2021 09:57 AM
    Hi Shannon,

    For what it's worth, I had some funky things happening with pmwogen too (back in May):

    Logic behind scheduled PMs?


  • 5.  RE: Reliability issues with escalations

    Posted Tue November 16, 2021 12:00 PM
    Huh. Did you track down the source of the problem? If so, what was it?





  • 6.  RE: Reliability issues with escalations

    Posted Wed November 17, 2021 08:12 AM
    Edited by System Admin Wed March 22, 2023 11:45 AM

    Hi Shannon,

    I didn't dive into it. It seemed like it would take too much time to figure it all out.

    For what it’s worth, I submitted this RFE: 
    If pmwogen gets interrupted, child WOs fail to generate (but we have no way of knowing)
    https://ibm-ai-apps.ideas.ibm.com/ideas/MASM-I-642


    Also, if you search for “PM” in the Ideas Portal, lots of interesting ideas pop up. Worth a read.




  • 7.  RE: Reliability issues with escalations

    Posted Wed November 17, 2021 09:59 AM
    Edited by System Admin Wed March 22, 2023 11:52 AM
    Dear 1971,

    I have been supporting Escalations since they appeared, probably in Maximo 4 (4.1.1?), and have not heard of reliability problems. Yes, sometimes they don't run or they stop running, but as far as I remember the logs told the tale.

    As Shannon and Steve have said, the Escalations set up in the Escalations application become instances of the Escalation Cron Task.

    To turn on logging, use the CRONTASK Root Logger and set up a Logger for ESCALATION in the Logger section in the bottom part of the window. Use New Row if one is not there already. Set it to DEBUG and make sure Active is checked.

    I would also set Root Loggers 'sql' to INFO and 'autoscript' to DEBUG (in case an Automation Script is somehow the cause).

    Make sure all the loggers you are using have Active checked. Save and Apply Settings.

    If it fails again, check the logs.

    If the Escalations run but do not process all the records you expected them to process, the Repeat setting on the Escalation Point may be the cause. With it unchecked, a record that has been processed by the Escalation will not be processed again. With it checked, records that have been processed by the Escalation can be processed again if they still meet the criteria of the Escalation's conditions.
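
    The Repeat behavior described above can be sketched in plain Python. This is only an illustration of the idea, not Maximo code: the function `run_escalation`, the `already_processed` tracking set, and the sample work orders are all hypothetical stand-ins for whatever Maximo does internally.

```python
def run_escalation(records, condition, already_processed, repeat):
    """Return the ids one run would act on and update the tracking set.

    Hypothetical model: 'already_processed' stands in for the escalation's
    internal record of which rows it has handled before.
    """
    acted_on = []
    for rec in records:
        if not condition(rec):
            continue  # record does not meet the escalation's condition
        if not repeat and rec["id"] in already_processed:
            continue  # Repeat unchecked: never handle the same record twice
        acted_on.append(rec["id"])
        already_processed.add(rec["id"])
    return acted_on


# Made-up sample data: two work orders meet the condition, one does not.
work_orders = [
    {"id": 1, "status": "WAPPR"},
    {"id": 2, "status": "WAPPR"},
    {"id": 3, "status": "COMP"},
]


def is_waiting(rec):
    return rec["status"] == "WAPPR"


# Repeat unchecked: the second run finds nothing new to process.
seen = set()
first_run = run_escalation(work_orders, is_waiting, seen, repeat=False)
second_run = run_escalation(work_orders, is_waiting, seen, repeat=False)

# Repeat checked: the same records are picked up again on the second run.
seen_repeat = set()
run_escalation(work_orders, is_waiting, seen_repeat, repeat=True)
second_run_repeat = run_escalation(work_orders, is_waiting, seen_repeat, repeat=True)
```

    Under this model, an escalation with Repeat unchecked that "runs but does nothing" may simply have handled every qualifying record already.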

    cheers                              ..................dick

    ------------------------------
    Dick Chertow
    L2 Maximo Support
    IBM
    Littleton MA
    ------------------------------



  • 8.  RE: Reliability issues with escalations

    Posted Wed November 17, 2021 10:48 AM
    Actually, I forgot about one scenario where I did have problems with an Escalation, and that's the scenario where the Escalation is running too frequently.  

    At one client, it was running every minute, and couldn't retrieve the data set and process it before the next instance started.  That caused a complete mess.

    Once the Escalation interval was increased, it was fine.
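
    As a rough illustration of why the interval matters, here is a small plain-Python sketch. It assumes, as suggested elsewhere in this thread, that a cron task won't start until the previous execution completes, and it further assumes that missed schedule ticks are simply dropped rather than queued. That second point is an assumption made for this sketch, not confirmed Maximo behavior, and the function name and numbers are hypothetical.

```python
def actual_start_times(interval_min, run_duration_min, horizon_min):
    """Start times (in minutes) for a task that may only begin on a
    schedule tick and only once the previous run has finished."""
    starts = []
    busy_until = 0
    for tick in range(0, horizon_min, interval_min):
        if tick >= busy_until:
            starts.append(tick)
            busy_until = tick + run_duration_min
        # otherwise the previous run is still going, so this tick is missed
    return starts


# Scheduled every minute, but each run takes three minutes.
every_minute = actual_start_times(interval_min=1, run_duration_min=3, horizon_min=10)

# Scheduled every five minutes with the same three-minute run time.
every_five = actual_start_times(interval_min=5, run_duration_min=3, horizon_min=10)
```

    In this model the one-minute schedule only manages a run every three minutes, while the five-minute schedule keeps up with every tick.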







  • 9.  RE: Reliability issues with escalations

    Posted Sat November 20, 2021 09:25 PM

    I think my issues were twofold:

    1) As Shannon mentioned, there were escalations that were running too frequently (every minute). Maximo couldn't handle it.
    2) What Steven mentioned:
    "I'm assuming these are calling automation script actions because core actions (such as set value) tend to be extremely reliable. My guess is either these are taking longer than you expect (a cron task won't start until the previous execution completes) or getting an unhandled exception that prevents it from completing properly."

    I've remedied those two things. Escalations seem to be working a bit better now.

    Thanks everyone.




  • 10.  RE: Reliability issues with escalations

    Posted Wed November 17, 2021 10:41 AM
    Edited by System Admin Wed March 22, 2023 11:51 AM
    Here's an On Demand autoscript (aka Library script that you can hit the Execute button on) that will print the WebSphere logs. Use at your own risk and expense.

    """
    #==============================================================================
    AUTOSCRIPT: READWASLOGS
    DESCRIPTION: Read logs/<mxe.name>/SystemOut.log and print() some
    #==============================================================================
    """

    from java.io import BufferedReader
    from java.io import FileReader
    from java.io import File
    from psdi.server import MXServer

    # A relative path is resolved from WebSphere/AppServer/profiles/<profile>
    logFile = File("logs/{}/SystemOut.log".format(MXServer.getMXServer().getProperty("mxe.name")))
    print("{}:\n\n".format(logFile.getAbsolutePath()))

    # Print the file line by line; readLine() returns None at end of file
    reader = BufferedReader(FileReader(logFile.getAbsolutePath()))
    try:
        line = reader.readLine()
        while line is not None:
            print(line)
            line = reader.readLine()
    finally:
        # Close the reader even if reading fails partway through
        reader.close()


    ------------------------------
    Blessings,
    Jason Uppenborn
    Sr. Technical Maximo Consultant
    Ontracks Consulting
    ------------------------------



  • 11.  RE: Reliability issues with escalations

    Posted Tue November 16, 2021 08:48 AM
    The problem I've had in the past with Escalations was before Automation Scripts were around. We wanted an Escalation to run just after a record change to catch a specific condition - e.g. a PR with a decimal quantity and a UOM of 'EA'. 

    Can you provide the SQL for Escalation and the trigger conditions?

    ------------------------------
    Jason Verly
    Reliability Engineering Manager
    Agropur US
    Le Sueur MN
    ------------------------------



  • 12.  RE: Reliability issues with escalations

    Posted Tue November 16, 2021 08:49 AM

    As Shannon mentioned, behind the scenes an escalation's schedule is managed entirely by a cron task, so an escalation that isn't running on schedule wouldn't be improved by using a cron task instead. I tend to use a cron task when I'm doing bulk processing that can be done more efficiently in a single execution than with the record-by-record approach that escalations use. I typically use escalations when I need one-time actions (such as a notification because a WO is due soon), because they have a built-in tracking mechanism that avoids spamming users.

    I'm assuming these are calling automation script actions because core actions (such as set value) tend to be extremely reliable. My guess is either these are taking longer than you expect (a cron task won't start until the previous execution completes) or getting an unhandled exception that prevents it from completing properly. Log access and database access are really needed to see if that's what's occurring. Querying for cron tasks that have a laststart after lastend (select * from taskscheduler where lastrun>lastend) will show you if Maximo thinks it's still running. Then reviewing the logs around that time will hopefully help show if an unhandled exception occurred. 

    If you wanted to, you could surface taskscheduler in the cron task application so you wouldn't necessarily need database access. But it's really hard without log access to triage. 
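
    For anyone who wants to see what the query above returns before running it against a real database, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table is a hypothetical stand-in with only three columns; the task names and timestamps are made up, and the real taskscheduler table lives in the Maximo database and has more columns than this.

```python
import sqlite3

# Hypothetical stand-in for the Maximo taskscheduler table (in memory only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table taskscheduler (taskname text, lastrun text, lastend text)"
)
conn.executemany(
    "insert into taskscheduler values (?, ?, ?)",
    [
        # Finished normally: lastend is after lastrun.
        ("ESCALATION.ESC_A", "2021-11-15 09:00:00", "2021-11-15 09:00:12"),
        # Looks stuck: started on the 15th, but the last recorded end is the 14th.
        ("ESCALATION.ESC_B", "2021-11-15 09:05:00", "2021-11-14 09:05:30"),
    ],
)

# Same query as in the post; ISO-style timestamp strings compare correctly as text.
stuck = [
    row[0]
    for row in conn.execute("select * from taskscheduler where lastrun>lastend")
]
conn.close()
```

    Any task name that ends up in `stuck` is one that appears to still be running, which is the point at which the logs around that timestamp are worth reviewing.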



    ------------------------------
    Steven Shull
    ------------------------------