Cognos Analytics

APAR 42408 - schedules fail intermittently with XML errors

  • 1.  APAR 42408 - schedules fail intermittently with XML errors

    Posted Tue April 05, 2022 09:28 AM
    Hello,

    We reported this issue in 2021 where schedules fail intermittently with XML errors in 11.2.1. We were told it affects all versions apart from 11.1.5 and the APAR status is not fixed. Therefore there is no version of Cognos that is Windows 2019-compatible that does not have this issue.
    Can anyone advise whether it is fixed in 11.2 please?

    https://www.ibm.com/support/pages/apar/PH42408

    Regards,
    Ceri Small

    ------------------------------
    Ceri Small
    ------------------------------

    #CognosAnalyticswithWatson


  • 2.  RE: APAR 42408 - schedules fail intermittently with XML errors

    IBM Champion
    Posted Tue April 05, 2022 11:33 AM
    Hi Ceri,

    There is no Closed date or Fix information in the following page:
    https://www.ibm.com/support/pages/apar/PH42408

    I think you need to contact IBM Support to get an official answer.

    Best regards,

    ------------------------------
    Patrick Neveu
    Positive Thinking Company
    ------------------------------



  • 3.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Tue April 05, 2022 12:59 PM
    Hi Patrick,
    I do contact IBM Support regularly, but they refer me to the APAR status!
    I just wondered if anyone had any 'inside' info. This is stopping all our customer upgrades now.

    regards,
    Ceri

    ------------------------------
    Ceri Small
    ------------------------------



  • 4.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Tue April 05, 2022 02:37 PM

    I have had a support ticket open with IBM since 10/2020 (yes, over a year and a half!). 

    We are stuck on 11.1.5 and have been unable to move to a newer version because not only do we get loads of those XML errors, but after about 12 hours our scheduled reports just stop running.  The only way to get the scheduler to "wake up" again is to log into the Cognos portal (or of course restart Cognos).  Unfortunately those schedules that were skipped are just "gone" - as if they were never scheduled in the first place.  The scheduler will only start running schedules that are next based on the "current" date/time.

    IBM has said we're the only company in the ENTIRE WORLD experiencing this issue (with the schedules stopping) and were unable to duplicate it in their own environment (though they only had access to Windows Server 2012) so a year ago we gave them direct access to one of our Dev servers.

    We first encountered this issue with 11.1.6 and it still existed in 11.2 (we haven't tried 11.2.1 or 11.2.2).

    We're still waiting for a solution.

    Just FYI, we're running on Windows Server 2016 (64-bit of course) with Windows Active Directory as our security.

    We're not sure if or when IBM is going to fix this, and our IT Director is starting to put pressure on us to resolve the Log4j issue.  We decided we're going to try the latest release of 11.1.7 (looks like IF008) and then use the API to write a Java program that will regularly log into the portal.  We're hoping this will keep the schedules running.  But if we continue to see too many of these XML failures we might have to start looking at other solutions, which will really suck because we have years invested in Cognos (started with DecisionStream 7!).
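
    The keep-alive idea described above (log into the portal on a timer so the scheduler never sits idle long enough to stall) can be sketched roughly as follows. This is only an illustration in Python rather than Java, and it deliberately does not call any real Cognos API: `login` is a hypothetical callable standing in for whatever portal login the actual program would perform, and `keep_alive`, `interval_s` and `max_attempts` are names invented for this sketch.

```python
import time
from typing import Callable


def keep_alive(
    login: Callable[[], bool],
    interval_s: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `login` repeatedly, waiting `interval_s` seconds between calls.

    `login` is a hypothetical placeholder: it should perform one portal
    login and return True on success. Returns the number of successful logins.
    """
    successes = 0
    for attempt in range(max_attempts):
        try:
            if login():
                successes += 1
        except Exception as exc:
            # A failed login should not stop the loop; report it and carry on.
            print(f"login attempt {attempt + 1} failed: {exc}")
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return successes
```

    Since the stall is reported after roughly 12 hours without a login, an interval well below that (for example one hour) would presumably be the safer choice. Whether a scripted login actually wakes the scheduler the way an interactive one does is untested here.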

    BTW, it's interesting that this APAR is from Nov 2021 and it still might be ours, but since IBM has never been able to figure out the cause of the schedules stopping they've never created an APAR for it.



    ------------------------------
    Wayne Westlake
    ------------------------------



  • 5.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 04:26 AM
    Wayne - that's interesting. Your experience mirrors ours. We first saw this issue on a customer site with version 11.0.12 between Oct 2019 and Feb 2020. IBM finally recognised the issue and recommended an upgrade to 11.1.5 in March 2020, which we did, and the issue was solved at that customer site. We subsequently tried to implement later versions at other customers, looking for a Windows 2019-compatible version, and the issue has re-appeared! I asked for this to be raised as an URGENT APAR in November 2021, and stated clearly that it is preventing all our customers from upgrading to a Windows 2019-compatible version (bearing in mind that Windows 2019 is now 3 years old!).
    Some of our customers are far from happy!

    regards,
    Ceri

    ------------------------------
    Ceri Small
    ------------------------------



  • 6.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:54 AM
    Wayne,

    I just scrolled through the case updates and found the following from December 1st 2021 - I also saw the issue you describe where schedules stop if the service account user is not logged in.

    Regards,
    Ceri

    Dec 01, 2021, 02:27

    Thank you Andzrej - we will do this.

    I just wanted to add one further issue that has been happening since the upgrade that I logged last week - schedules only seem to be running when the service account is logged in to Cognos. I attach screenshots and logs for this. They have just started running again since I logged in this morning. Hopefully I won't have to open another case as it's part of the same problem? I believe this may be covered by the following 11.2.1 fix list item:

    PH35545 SCHEDULES STOP EXECUTING UNTIL USER LOGS IN TO COGNOS ANALYTICS UI

    Just one further thing - as per your recommendations last week I updated the query Java heap settings (screenshot attached). Can you check these and do you think it may be worth increasing further?

    Regards,

    ceri



    ------------------------------
    Ceri Small
    ------------------------------



  • 7.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 07:36 AM
    Can you show a screenshot of these XML errors?

    ------------------------------
    brenda grossnickle
    BI Programmer Analyst
    FIS
    ------------------------------



  • 8.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 07:46 AM

    [Screenshots of the error messages; images not preserved.]

    ------------------------------
    Ceri Small
    ------------------------------



  • 9.  RE: APAR 42408 - schedules fail intermittently with XML errors

    IBM Champion
    Posted Wed April 06, 2022 08:25 AM
    Hi Ceri,

    I don't know if it is just my browser and me, but I can barely read the message on your screenshots!

    Best regards,

    ------------------------------
    Patrick Neveu
    Positive Thinking Company
    ------------------------------



  • 10.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:43 AM
    Hi Patrick,

    Apologies - I tried to upload the original Word document containing the screenshots that I sent with the case to IBM support, but the 'UPLOAD File' and 'Insert Image' buttons don't seem to work with any browser I use.
    Any suggestions?
    Incidentally, IBM have updated the case (TS007613375) today:

    "Dev is preparing solution, we have no ETA yet but should have soon.

    In the meantime, as 11.2.2 is available, I wanted to ask if you perhaps upgraded or intend to do so soon?"
    I have asked whether this APAR is resolved in 11.2.2, or will be in a future fix release.




    ------------------------------
    Ceri Small
    ------------------------------



  • 11.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:33 AM
    Edited by System Fri January 20, 2023 04:18 PM
    Can you try posting the full screenshots again? I cannot read any of the messages. Please still try to keep the full screenshot, if you can. I used to do support, and one of my frustrations was being sent a screenshot cropped down to just the error message. There was no context: no idea what screen they were on, what else was on the screen, what they were trying to do, etc.

    But I realize that you may be posting earlier screenshots from a Word doc. In that case, maybe you could type the error message into the reply. Thanks for the help.

    ------------------------------
    brenda grossnickle
    BI Programmer Analyst
    FIS
    ------------------------------


  • 12.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:47 AM

    [Screenshot; image not preserved.]

    ------------------------------
    Ceri Small
    ------------------------------



  • 13.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:48 AM

    [Screenshot; image not preserved.]

    ------------------------------
    Ceri Small
    ------------------------------



  • 14.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:48 AM

    [Screenshot; image not preserved.]

    ------------------------------
    Ceri Small
    ------------------------------



  • 15.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 08:49 AM

    [Screenshot; image not preserved.]

    ------------------------------
    Ceri Small
    ------------------------------



  • 16.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 09:05 AM
    So the errors seem to be the following three? I'm typing them out so that anyone who searches for them can find this thread.

    java.lang.NullPointerException

    CNC-ASV-0001 the following Agent Service general error occurred: java.lang.NullPointerException


    CNC-ASV-0001 the following Agent Service general error occurred: [java.lang.NullPointerException] null



    ------------------------------
    brenda grossnickle
    BI Programmer Analyst
    FIS
    ------------------------------



  • 17.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 09:11 AM
    Brenda - yes that's correct.
    There are many proposed technotes for resolving the issue, such as copying/saving the events etc. but none of the proposed solutions work. We ran through many of these with Tech Support who eventually agreed that it is a bug as the issue happens so randomly, but regularly - in a small environment with around 60 schedules running per day the issue happened on average 2-3 times a day, and often when only one schedule was running at a time.

    regards,
    Ceri.

    ------------------------------
    Ceri Small
    ------------------------------



  • 18.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 09:09 AM
    Edited by System Fri January 20, 2023 04:16 PM

    Here's a text list of all the various XML errors we get on anything past 11.1.5:


    ; nested exception is: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.

    org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.

    CNC-MES-1500 The message could not be created. org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.

    CNC-ASV-0025 Agent Condition invalid: CNC-MON-0035 Invalid request.

    java.lang.IllegalArgumentException

    java.lang.NullPointerException
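
    For anyone who wants to track how often each of these shows up, the error strings listed above can be tallied from any text export of the run history or logs. The sketch below is only an illustration: the `SIGNATURES` tuple is taken from the list in this post, while the function name `tally_errors` and the assumption that the input is one message per line are mine, not anything Cognos-specific.

```python
from collections import Counter
from typing import Iterable

# Error signatures copied from the list in this post.
SIGNATURES = (
    "org.xml.sax.SAXParseException",
    "CNC-MES-1500",
    "CNC-ASV-0025",
    "java.lang.IllegalArgumentException",
    "java.lang.NullPointerException",
)


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count, per signature, how many lines contain it.

    A line that contains two signatures (for example CNC-MES-1500 wrapping a
    SAXParseException) is counted once under each of them.
    """
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

    Usage would be something like `tally_errors(open("history_export.txt"))`, where the file name is hypothetical.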


    Ceri, we have tried everything under the sun.  Java heap, eliminating all but 1 scheduled report being a simple Great Outdoors sample report, etc.  Like I said, been working with IBM support for a year and a half and still....

    Our paths are very similar indeed!  We went from 10.2.2 to 11.0.12 and a few months later discovered 11.0.12 had a bad bug that corrupted any Event that we edited or created.  We tried GA 11.0.13 but it had a bad bug where all schedules went out as the default user (the one you set up in the SMTP section of the configuration) instead of the schedule owner.  The fix they implemented for that started this onslaught of XML errors and failures.  IBM eventually pushed us to 11.1.4, saying that many of the XML errors had been resolved in 11.1.2 and would most likely NOT be addressed in 11.0.13 (despite it being the long term release!).  By the time we were ready to go live 11.1.5 came out and we jumped on it.  It had a bad bug where SSO didn't work, so they had to release a patch, so officially we're on 11.1.5 IF1004.

    As a last note, we have about 800 scheduled reports with about 500 that run per day (several are Events that check for a condition as often as every 15 minutes).


    Wayne



    ------------------------------
    Wayne Westlake
    ------------------------------



  • 19.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 06, 2022 09:56 AM
    Seems like you've been through the same pain as us Wayne!
    Many of our customers have scheduled events that run off triggers from their ERP systems, along with time-based schedules that run every few minutes or so. The largest has probably 600 enabled schedules, and we can't afford a random 10% of those failing daily with the XML issue, so they are stuck on an 11.1.5 Windows 2016 environment at the moment, even though they would like to upgrade to a Windows 2019 version.
    Even with larger environments, where the obvious concern is more than one schedule running simultaneously, we proved pretty quickly in test environments that the issue regularly occurred when only one schedule was running.

    ------------------------------
    Ceri Small
    ------------------------------



  • 20.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 07, 2022 05:43 AM
    We are on 11.0.13 with the latest fix for that release. We started having schedules disappear for no reason. We get no errors; they just disappear. This has been happening for over 3 weeks.

    Interesting that you all say this is happening on newer versions. We don't get errors. Schedules just disappear. We had more than 300 schedules, enabled and disabled combined. About 100 were removed because a prolific user will be leaving and we consolidated a large number of those schedules into one job.

    ------------------------------
    Vic Nicholls
    ------------------------------



  • 21.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 07, 2022 12:45 PM
    Hey Vic,

    When you say disappear, you mean that the schedule gets removed from the report/job?

    So far I haven't seen that with our stuff on 11.1.5 or the testing we've tried lately with 11.1.7.  Just that, like I mentioned, after about 12 hours the scheduler just stops running anything - but after logging into the portal it automagically resumes, but only with those schedules that are actually due up next.  Everything that was supposed to run in the meantime just never gets run, but the schedules still exist and will run again at the appropriate time - if things haven't all stopped again.

    Naturally during the week on our production server it would be rare that 12 hours would pass without someone logging in, and early on it was suggested by IBM that we simply make sure someone logs in at least every 12 hrs on the weekends!  *facepalm*


    Wayne

    ------------------------------
    Wayne Westlake
    ------------------------------



  • 22.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 07, 2022 12:57 PM
    Disappear means that it might be scheduled to run in an upcoming job. When it is time for it to run, it never runs, it just disappears. There is no evidence it ran, no evidence it emailed anyone saying it ran, no error messages, nothing. It is in the queue and then all trace of it is removed.

    The schedule is still there, it shows in the upcoming jobs, but that's the end of it. 

    I am logged into the Cognos production instance as our admin user (and our personal admin accounts can do it also). 

    Vic

    ------------------------------
    Vic Nicholls
    ------------------------------



  • 23.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 07, 2022 01:55 PM
    Hey Vic,

    That is EXACTLY the behavior I encountered when first testing 11.1.6 and everything afterward - all the way up to 11.2.  Except, as I mentioned, after approximately 12 hours (sometimes after just a few hours, sometimes longer than 12 but never more than 20) ALL schedules stop running.  But yeah, no error, no trace, NOTHING in the auditing database, nothing in the logs, etc.  It's like the scheduler process just goes to sleep until you log into the portal and then boom, it wakes up and starts running things again.


    Wayne

    ------------------------------
    Wayne Westlake
    ------------------------------



  • 24.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 07, 2022 02:30 PM
    I wonder if this has to do with the Java fix that Cognos gave us. That's the only thing we've changed. 

    Well that makes no sense because I've been logged in and logging in our production instance, and it still doesn't work. 

    Vic

    ------------------------------
    Vic Nicholls
    ------------------------------



  • 25.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Fri April 08, 2022 06:06 AM
    We had the same issue too when upgrading to the IF8 patch that fixes the Log4j issue; it was working fine before that.

    ------------------------------
    Ash
    ------------------------------



  • 26.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 13, 2022 08:13 AM
    Hi All,

    Thanks to all who commented on this issue. I received notification on the support case today that a fix is available - see below.
    Once I have installed and tested I will report back.

    Regards,
    Ceri.

    From IBM support:
    Apr 13, 2022, 07:02

    Hi Ceri,

    I would like to inform you that development prepared Interim Fix 11.2.1.0 IF1011 that addresses defect PH42408.

    Can you please provide webID for which the IF access should be granted?



    ------------------------------
    Ceri Small
    ------------------------------



  • 27.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Wed April 13, 2022 01:25 PM
    Hi Ceri!

    Thanks for the update.  Yes, please let us know if that fix works.  It would be interesting if IBM can come out with a similar fix for 11.1.7 (and 11.0.13).  MAYBE it could be put in the 11.1.7 FixPack 5 due out "sometime" next month - though there might not be enough time to test.

    Regards,

    Wayne

    ------------------------------
    Wayne Westlake
    ------------------------------



  • 28.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Thu April 14, 2022 01:22 PM
    Just FYI but today we were notified by support that our 1.5 year old ticket has finally been resolved with a fix.  It sounds like this might be the same fix as Ceri's.  They said it's being released now in an 11.2.1 IF, OR, we can wait and it will be in both 11.1.7 FP5 and 11.2.3 - both I believe will be released "sometime" next month.

    We're trying to decide if we want to try the baby step of going to 11.1.7 (from 11.1.5) or try all the way to 11.2.3.  Hate to finally migrate to 11.1.7 and then get an EOL announcement, but from what I'm seeing with other threads is 11.2 is rather aggressive and each new feature release is breaking other things.  11.1.7 IS supposed to be the stable LTM but so was 11.0.13 and support pushed us off that to 11.1.5.


    Wayne

    ------------------------------
    Wayne Westlake
    ------------------------------



  • 29.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Tue June 28, 2022 04:45 AM
    Sorry for the delay in reporting back. It took a while for our tech support function to become available. We applied IF011 to 11.2.1 in our sandbox environment and, initially, set 2 schedules to run every 3 minutes. For one week - no issues. We then increased the frequency of the 2 schedules to run every minute. Within one day we saw the same XML error. We then reduced the frequency again to every 3 minutes and have seen no issue for the last week. So the issue still exists and definitely appears to be related to system usage constraints.
    I have reported back to IBM.
    The fix was also available in 11.1.7.
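
    To put the two test cadences in perspective, here is the daily run count for each, as a quick calculation. The helper name `runs_per_day` is just for this illustration, and it assumes the schedules fire around the clock at a fixed interval.

```python
def runs_per_day(schedules: int, interval_minutes: int) -> int:
    """Total runs in 24 hours for `schedules` schedules firing every `interval_minutes`."""
    return schedules * (24 * 60 // interval_minutes)


# 2 schedules every 3 minutes: the cadence that ran clean for a week.
print(runs_per_day(2, 3))  # 960
# 2 schedules every minute: the cadence that produced the XML error within a day.
print(runs_per_day(2, 1))  # 2880
```

    So the failing cadence is three times the load of the clean one, which fits the impression that the error is tied to how hard the system is being driven, though two data points are not proof.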

    regards,
    Ceri

    ------------------------------
    Ceri Small
    ------------------------------



  • 30.  RE: APAR 42408 - schedules fail intermittently with XML errors

    Posted Tue June 28, 2022 12:59 PM

    Hey Ceri!

    Thanks for replying to this thread, I'd completely forgotten about it!

    So we've been testing 11.1.7 FixPack 5 for the last 3 weeks.  It contains the "fix" for our year and a half old ticket where for us, all schedules would stop running after about 12 to 24 hrs if no one logged into the portal.

    So, that bug is "fixed" per se.  They never figured out the problem and claim we're the only Cognos user in the world experiencing it.  So what they implemented is watchdog code that catches the error and then resubmits the Event, Job or Report that was failing (and we don't see the error in the Past Activities).  We notice a few times a day things that run 3 minutes later than scheduled. Lazy and far from ideal, but if that's the best they can do...shrug.
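
    For readers unfamiliar with the pattern, a catch-and-resubmit wrapper of the kind described above looks roughly like the sketch below. This is not IBM's code, which we have never seen; it only illustrates the general technique. The name `run_with_resubmit` and its parameters are invented here, and the 180-second default delay is an assumption based on the "3 minutes later than scheduled" we observe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_resubmit(
    task: Callable[[], T],
    max_attempts: int = 3,
    delay_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`; if it raises, wait `delay_s` seconds and resubmit it.

    The error is swallowed on every attempt except the last, which is why a
    wrapper like this hides the failure from any history view.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_s)
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

    The trade-off is visible in the code: the caller gets a result, but the underlying error is never surfaced unless every attempt fails.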

    HOWEVER, now we have 6 new bugs that randomly cause events, jobs & reports to fail including the errors Brenda reported above:

    java.lang.NullPointerException
    QE-DEF-0285 The logon failed.
    CNC-SDS-0403 Unable to add history details in service [monitorService]
    CNC-ASV-0025 Agent Condition invalid: CNC-MON-0035 Invalid request.
    "Unavailable" in a job step that is a legitimate job/report (one was every single step in the job)
    Zombie processes that persist in the Current Activities (usually say Waiting) even after a reboot until we run the NC_DROP.sql on the Notifications database

    So since IBM now will ONLY address one issue/error at a time I have 6 new tickets to create!

    I picked the one happening the most frequently, java.lang.NullPointerException, and opened a ticket on it yesterday.  I will report back on this thread how things go.  They asked for more info yesterday - screenshots of View run history details, etc.

    My boss and I are wondering if there are many actual Cognos customers left.  How can these kinds of errors persist for 2 years (most started with 11.1.6) and STILL be unresolved?

    Regards,

    Wayne



    ------------------------------
    Wayne Westlake
    ------------------------------