AIOps


Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

  • 1.  Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 18 days ago
    I have Watson AIOps 2.0 Event Manager installed on OpenShift 4.7, and the healthcron pod cannot be created. 'oc get pods' shows it in a state of CreateContainerError. And running 'oc describe pod <pod_name>' shows this error:

    Events:
      Type     Reason  Age                   From     Message
      ----     ------  ----                  ----     -------
      Warning  Failed  64m (x533 over 3h6m)  kubelet  (combined from similar events): Error: container create failed: time="2021-04-02T17:06:12Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/netcool\") set in config.json failed: permission denied"

    Any recommended fix?


    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------


  • 2.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago

    Hi Frank

    I can't say for certain, but what you are seeing looks strikingly similar to https://bugzilla.redhat.com/show_bug.cgi?id=1934177, which is reported to be a regression in OCP.

    I know for sure that Event Manager works fine on 4.4 and 4.5, and we have a recent install on 4.6.17_1533. Do you have the option of trying one of those versions to see whether the issue you reported goes away?



    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 3.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    Thanks for that information, John. I can try one of those versions now that I know one works. I actually tried OpenShift 4.6.23 (latest 4.6), and it was too unstable to even install Event Manager. So I'll destroy this cluster and create a new one at 4.6.17_1533.

    Frank

    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 4.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    Alright. I installed OpenShift 4.6.17 (I don't know where you would find or see the "_1533" part of the version number - I downloaded the only 4.6.17 available). And now topology search works. Great.

    Unfortunately, a couple of other pods have issues:

    topology-status is in CrashLoopBackOff

    and 

    topology-nasm-net-disco-collector is in CreateContainerConfigError


    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 5.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    Oh, and the healthcron pod is in the same state as the original problem, with the same error message about permissions on /home/netcool.

    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 6.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago

    This is where I see the _1533 part



    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 7.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    I'm running OpenShift on-prem in vSphere, so my console is different than yours, and doesn't include that information. All I see anywhere is 4.6.17.

    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 8.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago

    I know it doesn't help directly, but my healthcron seems to be running every hour.



    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 9.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago


    topology-status is in CrashLoopBackOff

    Is there any additional information visible in the Events or Logs tab while it is in this state?
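
    If the console tabs don't show much, the same information can usually be pulled from the CLI. This is only a rough sketch: the pod name and namespace below are placeholders I made up (they are not from this thread), and the function just prints the commands so they can be reviewed before running. The --previous flag asks for the logs of the last terminated container, which is often where a crash-looping pod leaves its error.

```shell
# Placeholders (assumptions, not taken from this thread): replace with your own values.
POD="${POD:-topology-status-example}"
NS="${NS:-noi-example}"

# Print the diagnostic commands for a pod instead of executing them.
# $1 = pod name, $2 = namespace
diag_cmds() {
  # Pod details, including the Events section
  printf 'oc describe pod %s -n %s\n' "$1" "$2"
  # Logs from the previously terminated container instance
  printf 'oc logs %s -n %s --previous\n' "$1" "$2"
  # Namespace events filtered down to this pod
  printf 'oc get events -n %s --field-selector involvedObject.name=%s\n' "$2" "$1"
}

diag_cmds "$POD" "$NS"
```

    Copy the printed lines into a shell that is logged in to the cluster, one at a time, and compare the output with what the Events and Logs tabs show.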



    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 10.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    These are the only messages in Events (and no errors are seen in the Logs tab):



    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 11.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago

    topology-nasm-net-disco-collector is in CreateContainerConfigError

    I had something similar a few weeks back: I had set disco to true but forgot to assign a storage class.

    I don't know if you are planning to use that feature. If not, set it to false in the YAML file.



    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 12.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 15 days ago
    That makes sense. I did not assign a storage class.

    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------



  • 13.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 14 days ago
    Just checking back with you to see where you are with this.

    ------------------------------
    john postoyko
    IBM
    London
    ------------------------------



  • 14.  RE: Watson AIOps 2.0 - NOI healthcron pod CreateContainerError

    Posted 13 days ago
    OpenShift 4.6.17 seems to work the best out of the versions I've tried. Thanks for the help, John.

    ------------------------------
    Frank Tate
    Gulfsoft Consulting
    ------------------------------