MQ

MQ Operator: Error after upgrading the OpenShift cluster

    Posted 7 hours ago

    Hi, 

    I have two queue managers (QMs) with Native HA deployed in our test environment. Today I ran a minor upgrade of OpenShift, and afterwards one of the QMs did not start.

    2025-02-26T11:19:13.460Z Using queue manager name: TEST_MQ
    2025-02-26T11:19:13.460Z CPU architecture: amd64
    2025-02-26T11:19:13.460Z Linux kernel version: 5.14.0-427.50.1.el9_4.x86_64
    2025-02-26T11:19:13.461Z Base image: Red Hat Enterprise Linux 9.5 (Plow)
    2025-02-26T11:19:13.461Z Running as user ID 1000740000 with primary group 0, and supplementary groups 0,1000740000
    2025-02-26T11:19:13.461Z Capabilities: none
    2025-02-26T11:19:13.461Z seccomp enforcing mode: filtering
    2025-02-26T11:19:13.461Z Process security attributes: system_u:system_r:container_t:s0:c19,c27
    2025-02-26T11:19:13.462Z Detected 'ext4' volume mounted to /mnt/mqm
    2025-02-26T11:19:13.462Z Detected 'ext4' volume mounted to /mnt/mqm-data
    2025-02-26T11:19:13.462Z Detected 'ext4' volume mounted to /mnt/mqm-log
    2025-02-26T11:19:13.491Z Error creating directory structure: the 'crtmqdir' command returned with code: 20. Reason: The filesystem object
    '/mnt/mqm/data/web/installations/Installation1/servers/mqweb/mqwebuser.xml' is
    a symbolic link.
    AMQ6245E: Error executing system call 'open' on file
    '/mnt/mqm-data/qmgrs/TEST_MQ/qm.ini' error '0'.
    AMQ6245E: Error executing system call 'mkdir' on file
    '/mnt/mqm-data/qmgrs/TEST_MQ/autocfg' error '2'.
    AMQ6245E: Error executing system call 'mkdir' on file
    '/mnt/mqm-data/qmgrs/TEST_MQ/ssl' error '2'.
    AMQ6245E: Error executing system call 'mkdir' on file
    '/mnt/mqm-data/qmgrs/TEST_MQ/plugcomp' error '2'.
    
    2025-02-26T11:19:13.492Z /opt/mqm/bin/crtmqdir: exit status 20

    As a consequence, I have 2 pods failing with this error and I cannot start the queue manager (luckily this is only happening in the test environment).

    I guess it is a problem with the volumes, as the QM is trying to start again from its configuration. I have seen this issue before when I removed a QM but not its volumes.
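
    To check which objects on the volume are symbolic links (the log complains that mqwebuser.xml is one), something like the following could be run from a debug pod that mounts the same PVC. This is only a sketch: the helper name find_symlinks and the default path are my own choices for illustration, and it assumes Python is available in that pod, which I have not verified.

```python
import os
import sys


def find_symlinks(root):
    """Return sorted (path, target) pairs for every symbolic link under root."""
    found = []
    # followlinks=False: linked directories are reported but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)


if __name__ == "__main__":
    # Assumed default: the web directory named in the crtmqdir error message.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/mqm/data/web"
    for path, target in find_symlinks(root):
        print(f"{path} -> {target}")
```

    The output would only show where the links point; it does not by itself tell whether the link is the cause of the crtmqdir failure.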

    Here is my QM definition (I removed some fields for simplicity):

    spec:
      web:
        console:
          authentication:
            provider: manual
          authorization:
            provider: manual
        enabled: true
        manualConfig:
          configMap:
            name: mq-web-config
      version: 9.4.1.1-r1
      template:
        pod:
          containers:
            - env:
                - name: MQ_ENABLE_EMBEDDED_WEB_SERVER
                  value: 'true'
              name: qmgr
              resources: {}
      queueManager:
        route:
          enabled: true
        name: TEST_MQ
        mqsc:
          - configMap:
              items:
                - 91-startup.mqsc
              name: test-mq-mqsc-startup
          - secret:
              items:
                - 92-ldapauth.mqsc
              name: test-mq-mqsc-ldapauth
        logFormat: Basic
        availability:
          type: NativeHA
          updateStrategy: RollingUpdate
        storage:
          defaultClass: thin-csi
          persistedData:
            enabled: true
            size: 2Gi
            type: persistent-claim
          queueManager:
            class: thin-csi
            size: 20Gi
            type: persistent-claim
          recoveryLogs:
            enabled: true
            size: 2Gi
            type: persistent-claim


    ------------------------------
    Andres Colodrero
    ------------------------------