AIX

  • 1.  Strange Alias related behaviour

    Posted Tue October 20, 2009 10:23 AM

    Originally posted by: Porktree


    I've got an Oracle database running on a p5 with 64 GB of RAM and 32 CPUs. I'm running AIX 5.3. Over the last couple of days the Oracle DB appears to hang: it takes minutes to log in, and su'ing to users other than root takes 1-2 minutes, if it completes at all. If I shut down the database and reboot the server, everything is fine for the next 18 or so hours, and then it happens again. Looking at resources, I'm not paging, I've got about 10-20% idle, and none of the disks or DACs are saturated, i.e. no issue is indicated.

    This morning, after I shut down the database, I tried to restart it without bouncing the server, and while it was starting I tried to 'tail -f $ALERT'. This command hung. I echo'd $ALERT and it immediately gave the path to the alert log, 'tail -f /u01/logs/alert.log' worked fine, so I tried the command again using the environment alias, and again it hung. What I notice then is the reason it's tough to su - to another user is because the .profile sources a lot of alias/environmental variables, and is hanging trying to access these.

    All resources are free: CPUs 99% idle, no disk access, memory not paging. But any command (except echo) that references a $VAR of any kind hangs.

    I can't find anything about this, or what it might be or how to fix it. At this point I'm taking the system down tonight to rebuild the kernel and apply a couple of lpar's that I had planned on doing next weekend.

    Anyone have any ideas? Thanks.


  • 2.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 11:16 AM

    Originally posted by: alethad


    Do you have anything in your errpt? And none of your filesystems are full? You know like /tmp?

    I've also had something similar happen to me with Oracle before, along with a few of my OS commands. Not sure if this is your case, but mine had very similar symptoms: everything seemed to take forever to execute. It turned out to be a DNS resolution issue. In my case the DNS server blew up and networking didn't tell me right away, so my resolv.conf wasn't correct. Gotta love those Bluecrap servers.

    Just 2 cents.
    Good luck.
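
The first checks suggested above (error report, full filesystems) can be sketched as a small script. This is a minimal sketch, not anything a poster ran: the function name `full_filesystems` and the 90% threshold are illustrative assumptions, and it relies on the POSIX `df -P` output layout (capacity in column 5, mount point in column 6).

```shell
#!/bin/sh
# Print mount points whose usage is at or above a threshold (percent).
# Reads `df -P` (POSIX format) output on stdin so it can be tried offline.
# full_filesystems and the default threshold are illustrative, not from the thread.
full_filesystems() {
    threshold=${1:-90}
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= t) print $6, $5
    }'
}

# Live check: list filesystems that are 90% full or more.
df -P | full_filesystems 90

# errpt is AIX-specific; skip it quietly on systems that do not have it.
if command -v errpt >/dev/null 2>&1; then
    errpt | head -20
fi
```

Mount points containing spaces would be split by awk here, which is acceptable for a quick look at /tmp and friends.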


  • 3.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 11:32 AM

    Originally posted by: Porktree


    That's the frustrating part: there's nothing in the errpt, all the file systems have space, and I can't see any issue. I'd thought it was an Oracle thing, until I saw all the issues that happen after the db has been shut down. Also, the mounts all seem fine and the network is all good, so it's something to do with AIX or the Korn shell and the way it manages $VARs. When I watched the system go down this morning, I started with response times from the command line at 10 seconds at 4am, and decided to reboot when response dropped to 60 seconds at 6am. It was a steady decline from 10 to 60.
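
A steady decline like the one described (10 seconds at 4am, 60 seconds at 6am) is easier to correlate with other activity if it is logged. The following is a minimal sketch under stated assumptions: `elapsed_seconds` and `log_sample` are hypothetical helper names, and it assumes `date +%s` (epoch seconds) works on the system in question; under ksh the built-in `SECONDS` variable would be an alternative.

```shell
#!/bin/sh
# Time one run of a command and print the elapsed whole seconds.
# Assumes `date +%s` is supported (an assumption for older AIX levels).
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Append one timestamped sample to a log file.
# First argument is the log file, the rest is the command to time.
log_sample() {
    logfile=$1
    shift
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(elapsed_seconds "$@")" >> "$logfile"
}

# Example (log path and probe command are placeholders): sample once a
# minute from cron or a loop, e.g.
#   log_sample /tmp/response.log sh -c 'echo "$HOME"'
```

Comparing the timestamps in such a log against scheduled jobs (backups, for instance) might show which one lines up with the start of the slowdown.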


  • 4.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 12:04 PM

    Originally posted by: alethad


    Which version of AIX5.3 and Oracle are you running?


  • 5.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 01:24 PM

    Originally posted by: Porktree


    I'm on 5300-10-01-0921, or 5.3 TL 10, and Oracle 10.2.0.5.


  • 6.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 02:20 PM

    Originally posted by: alethad


    I don't know if I wouldn't just call it in to support. You're on the latest OS and I don't see any new fixes.
    You don't have anything missing or broken right? Just throwing it out there. No known Oracle issues on this version OS that I know of either.

    It does look like a leak has sprung. :) It's hard to say with no evidence like good ol' error messages or logs to go by. I gave you my best shot in the dark.
    I know how you feel though. Sorry about that.
    Maybe someone else can give you some better info.
    Good luck.


  • 7.  Re: Strange Alias related behaviour

    Posted Tue October 20, 2009 03:00 PM

    Originally posted by: Porktree


    Thanks for trying, this is a real puzzler. If only I could get an error message or a log file or something that would indicate where to start unraveling the thread. I'm crossing my fingers on rebuilding tonight.


  • 8.  Re: Strange Alias related behaviour

    Posted Wed October 21, 2009 03:18 AM

    Originally posted by: Montecarlo


    > What I notice then is the reason it's tough to su - to another user is because the .profile sources a lot of alias/environmental variables, and is hanging trying to access these.

    Which user account are you using?
    Have you tested su behaviour with a clean user? One with no additional aliases and with default PATH. What is the PATH of the account you are testing from? Are non-default entries ahead of system defaults in the PATH? Are all PATH entries valid? What about LIBPATH - I've seen failure to resolve valid libraries when there was an invalid entry in LIBPATH.
    Are aliases sourced from /etc/profile or individual user profiles? I personally dislike setting aliases and changing environment variables for all users regardless of whether it is required or not.
    Have you tried 'truss su - whoever'?
    If there is a significant wait somewhere, this might help you track it down.
    Regards, Simon
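
The PATH and LIBPATH questions above can be checked mechanically. This is a minimal sketch, assuming a POSIX-style shell; `invalid_path_entries` is a hypothetical helper name, and the truss flags in the trailing comment are as I recall them from the AIX man page, so verify them before relying on them.

```shell
#!/bin/sh
# Print every entry of a colon-separated path list that is not an
# existing directory. Empty entries (which mean "current directory")
# are skipped rather than reported.
invalid_path_entries() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -n "$dir" ] && [ ! -d "$dir" ]; then
            echo "$dir"
        fi
    done
    IFS=$old_ifs
}

# Check the current account's search paths.
invalid_path_entries "$PATH"
invalid_path_entries "${LIBPATH:-}"

# To see where a slow login waits, truss can be run with timestamps, e.g.
#   truss -d -f -o /tmp/su.truss su - someuser
# (someuser and the output path are placeholders; -d for timestamps and
# -f to follow child processes are assumptions to confirm in the man page).
```

Running the check both as root and as one of the slow accounts would show whether the two environments differ.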


  • 9.  Re: Strange Alias related behaviour

    Posted Wed October 21, 2009 08:59 AM

    Originally posted by: Porktree


    Thanks for the input. I solved this last night. It turned out to be a bug in Tivoli 6.1: local journaling was turned on (instead of using the TSM database), and that apparently sucked up kernel memory. So every time this filesystem backup ran, after about 2 hours it had totally hosed things. The filesystem I'm backing up has about 6M files, and we'd gone to local journaling to speed things up. (If I remember right, this was introduced to AIX in 5.5 and had been a Windows client setting before.)