aix 7.2 strange behaviour

  • 1.  aix 7.2 strange behaviour

    Posted Wed May 18, 2022 08:36 AM
    Hi all,

    Since upgrading a server from 7200-03-03-1914 to 7200-03-04-1938, I have noticed strange things occurring
    every day, though not always at the same time of day:
    - the filesystems (at least /var) are remounted read-only
    - the system clock goes back three hours!
    - most OS commands fail with "permission denied" (even as root)

    Then, after a few minutes, everything goes back to normal.
    Nothing shows up in errlog, conslog, syslog (all facilities), etc.

    I locked users out and stopped all user jobs and cron jobs, but it didn't help.
    I suspect the upgrade, but downgrading is too expensive to do based on a hunch.

    Any advice?
    Jacob


    ------------------------------
    Yaacov Fried
    ------------------------------


  • 2.  RE: aix 7.2 strange behaviour

    Posted Wed May 18, 2022 08:47 AM
    Jacob,

    Try leaving a console window open and watch for messages.

    Are you booting from SAN? Remounting /var and OS commands failing are
    often a sign rootvg is not available.

    Do you set the clock forward, or does it fix itself?
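
    If watching the console by hand is impractical, a small watcher loop is one way to timestamp each incident. This is only a sketch: the probe file, log path and interval are hypothetical choices, not anything from the thread. It logs both UTC and local time, which might also show whether the three-hour shift is a real clock step or only a change in the displayed time zone.

```shell
#!/bin/sh
# Hypothetical watcher: records when /var stops being writable or when the
# clock steps backwards. All paths and the interval are illustrative assumptions.
PROBE="${PROBE:-/var/tmp/.rw_probe}"   # file we try to create on /var
LOG="${LOG:-/tmp/incident.log}"        # keep the log off /var
INTERVAL="${INTERVAL:-10}"             # seconds between samples

# Succeeds if timestamp $2 (YYYYmmddHHMMSS) is earlier than timestamp $1.
clock_went_back() {
    awk -v prev="$1" -v now="$2" 'BEGIN { exit !(now + 0 < prev + 0) }'
}

# Succeeds if the file $1 can be created; removes it again afterwards.
can_write() {
    ( : > "$1" ) 2>/dev/null && rm -f "$1"
}

# Takes the previous UTC sample as $1, logs anomalies, prints the new sample.
watch_once() {
    now=$(date -u +%Y%m%d%H%M%S)
    if [ -n "$1" ] && clock_went_back "$1" "$now"; then
        echo "$now UTC: clock stepped back (previous $1), local: $(date)" >> "$LOG"
    fi
    if ! can_write "$PROBE"; then
        echo "$now UTC: cannot write $PROBE, local: $(date)" >> "$LOG"
        mount >> "$LOG" 2>&1           # capture the mount table at that moment
    fi
    echo "$now"
}

# The loop only runs with WATCH=1, so the functions can be sourced and tested.
if [ "${WATCH:-0}" = 1 ]; then
    prev=""
    while :; do
        prev=$(watch_once "$prev")
        sleep "$INTERVAL"
    done
fi
```

    Started with something like `WATCH=1 nohup ./watcher.sh &` (the script name is made up), it leaves a timestamped trail in the log file even when nobody is looking at the console.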

    Thanks.

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    https://adamssystems.nl/




  • 3.  RE: aix 7.2 strange behaviour

    Posted Wed May 18, 2022 09:02 AM
    Hi,

    Thanks for replying.
    - A console window is open, but there are no messages (nor in conslog or any other system log).
    - rootvg is on a local disk (no SAN at all).
    - The clock returns to the correct time by itself after a short period (minutes).

    Jacob

    ------------------------------
    Yaacov Fried
    ------------------------------



  • 4.  RE: aix 7.2 strange behaviour

    Posted Wed May 18, 2022 11:00 AM
    Jacob,

    Are you recording performance information (e.g. with nmon)? Can you look at
    the history and see if there's anything unusual?

    Thanks.

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    https://adamssystems.nl/




  • 5.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 10:11 AM
    Hi,
    I have nmon recordings, but I couldn't find any abnormalities in them.
    I checked the time frame of the incident in particular, but could not see anything unusual.

    ------------------------------
    Yaacov Fried
    ------------------------------



  • 6.  RE: aix 7.2 strange behaviour

    IBM Champion
    Posted Thu May 19, 2022 05:12 AM

    The lack of messages in the error log can be caused by /var going read-only. Is it the only filesystem that goes read-only?

    If it's not there already, temporarily add to syslog.conf:

    *.debug /dev/console

    (Note that this may be excessively noisy)

    As an alternative, create a 1GB ramdisk, format it as JFS2, mount it on /ram, touch /ram/log, and add to syslog.conf:

    *.debug /ram/log rotate size 1m files 512

    That way, even if /var goes read-only, there's a way to keep writing the logs.
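
    The ramdisk steps above could be scripted roughly as follows. Treat it as a sketch under assumptions: the device name /dev/ramdisk0 is whatever mkramdisk actually reports on your system, the `size` keyword in the rotate clause follows the AIX syslog.conf syntax as I understand it, and by default the script only prints the commands (DRY_RUN=1) so you can review them first.

```shell
#!/bin/sh
# Sketch of the ramdisk logging setup. With DRY_RUN=1 (the default) the
# commands are only printed, nothing is changed on the system.
DRY_RUN="${DRY_RUN:-1}"
SYSLOG_CONF="${SYSLOG_CONF:-/etc/syslog.conf}"
RULE='*.debug /ram/log rotate size 1m files 512'

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_ram_log() {
    # mkramdisk prints the device it created; ramdisk0 is assumed below
    # and must be adjusted if a different name is reported.
    run mkramdisk 1G
    # mkfs asks for confirmation before destroying the device contents.
    run mkfs -V jfs2 /dev/ramdisk0
    run mkdir -p /ram
    # log=NULL: no JFS2 log device is needed for a throwaway ramdisk.
    run mount -V jfs2 -o log=NULL /dev/ramdisk0 /ram
    run touch /ram/log
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ append to $SYSLOG_CONF: $RULE"
    else
        echo "$RULE" >> "$SYSLOG_CONF"
    fi
    # Make syslogd re-read its configuration.
    run refresh -s syslogd
}

setup_ram_log
```

    Keep in mind that a ramdisk does not survive a reboot, so copy /ram/log somewhere persistent once the incident has been captured.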



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 7.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 05:42 AM
    Another idea is to forward errpt to syslog, and then log to a remote
    syslog server in your syslog.conf (@remote).

    You have to add the second stanza from the page below to the ODM; it forwards
    errpt entries to syslog by piping them to /usr/bin/logger:

    https://adamssystems.nl/posts/simple-error-reporting/

    Then you will get both syslogs and errpt at the remote syslog server.
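
    For illustration, here is roughly what that setup can look like. This is a sketch, not a copy of the stanza on the linked page: the errnotify stanza is the commonly documented form for sending errpt entries to logger, and the stanza name "syslog1", the file /tmp/syslog.add and the host name "loghost" are placeholders. The script only writes the stanza file and prints the remaining steps; it does not touch the ODM or syslog.conf.

```shell
#!/bin/sh
# Sketch: forward errpt entries to syslog, and syslog to a remote server.
# "syslog1", /tmp/syslog.add and "loghost" are placeholder names.
STANZA_FILE="${STANZA_FILE:-/tmp/syslog.add}"
REMOTE_HOST="${REMOTE_HOST:-loghost}"

# Write an errnotify stanza that calls logger for every new error log entry.
# Inside en_method, $1 is the sequence number of the entry being reported.
# The quoted heredoc keeps $1 and the backticks literal in the file.
write_stanza() {
    cat > "$STANZA_FILE" <<'EOF'
errnotify:
        en_name = "syslog1"
        en_persistenceflg = 1
        en_method = "/usr/bin/logger Msg from Error Log: `/usr/bin/errpt -l $1 | /usr/bin/grep -v 'ERROR_ID TIMESTAMP'`"
EOF
}

# Print the syslog.conf rule that sends all messages to the remote server.
remote_rule() {
    printf '*.debug\t@%s\n' "$REMOTE_HOST"
}

write_stanza
echo "Review $STANZA_FILE, then load it with: odmadd $STANZA_FILE"
echo "Add to /etc/syslog.conf: $(remote_rule)"
echo "Then reload syslogd with: refresh -s syslogd"
```

    The remote server has to be configured to accept syslog messages from this host, otherwise the forwarded entries are silently dropped.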

    Separately, I'd focus on how /var is going read-only. Can you fsck it
    online and see if it throws an error? Linux filesystems often remount
    read-only on error, but I don't recall AIX doing that.

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    https://adamssystems.nl/




  • 8.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 10:15 AM
    I am not sure that only /var goes read-only.
    I'll try your suggestion about syslog.
    Thanks.

    ------------------------------
    Yaacov Fried
    ------------------------------



  • 9.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 09:54 AM

    Hello Yaacov,

    Please check if the error daemon (errdemon) is running:

        ps -ef | grep errd

    If you do not see it running, then you need to restart it:

        nohup /usr/lib/errdemon &

    Then check your error log.

    If this does not show you what happened, I strongly urge you to open a support ticket. Then we can better determine what is happening.

    File systems normally go into read-only mode when we lose access to the disks, and that can also cause the errdemon to terminate. The errors are stored in persistent memory and will be processed into the error log when the errdemon is restarted.



  • 10.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 10:38 AM
    Hi,

    The error daemon is running (other issues are logged OK).
    I'll run fsck as José suggested (although I can't see how that explains the clock going back). If that doesn't solve the problem, I will open a ticket.

    Thanks.


    ------------------------------
    Yaacov Fried
    ------------------------------



  • 11.  RE: aix 7.2 strange behaviour

    IBM Champion
    Posted Thu May 19, 2022 05:05 AM

    It sure is strange behavior...

    If it's not done yet, configure NTP. Also check root's crontab to make sure you're not running ntpdate.
    (That back-and-forth sounds like cron + ntpdate + round-robin across TWO NTP servers, one of which is off by three hours.)

    As to the read-only filesystem: if it were a filesystem integrity issue I'd expect a panic and dump, but try to follow the procedure for corrupted rootvg filesystems or log volume (boot from NIM/DVD, vary on the rootvg, fsck all filesystems, reformat the log volume). Take a mksysb first.

    Also, even with internal disks, I've seen (infrequent) disk problems that don't get logged. Try setting the rootvg to critical; that way the machine will reboot if the rootvg disk goes offline.
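
    The NTP and critical-VG checks above might be run along these lines. It is a dry-run sketch: it assumes the NTP subsystem is named xntpd and that `chvg -r y` is the critical VG option on this level of AIX, so check the man pages before running anything for real. The helper names are made up for this example.

```shell
#!/bin/sh
# Dry-run sketch of the NTP and critical-VG checks. With DRY_RUN=1 (the
# default) the commands are only printed. Function names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reads a crontab on stdin; succeeds if a non-comment line mentions ntpdate.
has_ntpdate() {
    grep -v '^[[:space:]]*#' | grep -q ntpdate
}

check_time_sources() {
    run lssrc -s xntpd     # is the NTP daemon active? (subsystem name assumed)
    run ntpq -p            # list the peers and their offsets
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ crontab -l | has_ntpdate"
    elif crontab -l 2>/dev/null | has_ntpdate; then
        echo "root's crontab calls ntpdate"
    fi
}

make_rootvg_critical() {
    run chvg -r y rootvg   # enable the critical VG option (flag assumed)
    run lsvg rootvg        # confirm the setting in the VG details
}

check_time_sources
make_rootvg_critical
```

    Run it as root with DRY_RUN=0 only after reviewing the printed commands. With two NTP servers configured, a peer that is off by hours should stand out in the offset column of `ntpq -p`.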



    ------------------------------
    José Pina Coelho
    IT Specialist at Kyndryl
    ------------------------------



  • 12.  RE: aix 7.2 strange behaviour

    Posted Thu May 19, 2022 10:30 AM
    NTP is configured, and there is no ntpdate in cron (in fact, crond is currently stopped, to rule out a cron job being behind all this).
    Maybe you are right and there is corruption in rootvg. I will reboot into maintenance mode and run fsck.

    ------------------------------
    Yaacov Fried
    ------------------------------