AIX

  • 1.  Warning: system problems due to DNS server failure

    Posted Mon April 19, 2010 11:43 PM

    Originally posted by: SystemAdmin


    Hi,

    I am writing this as a cautionary note to all AIX administrators who rely on other groups within their organisation to provide DNS.

    Two weeks ago, five of our production servers became temporarily inoperable. Telnet was working, ssh connections took roughly 45 seconds, but users with specialised connections such as forks and talkmans were unable to work.

    It turned out that one of our DNS servers was down; not the primary DNS server, but the secondary DNS server.

    One could see that this was a DNS problem, since "ssh IP_address" and "ssh hostname.domain" both worked within seconds; but "ssh hostname" took around 45 seconds to connect.
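
    The comparison above can be reproduced with a small timing helper. This is only a sketch: time_cmd is a made-up name, the hosts in the usage comments are placeholders, and it assumes your date command supports +%s and that your ssh client is OpenSSH (for BatchMode).

```shell
# time_cmd CMD [ARGS...]: run a command and print how many whole seconds it took.
# Hypothetical helper; assumes `date +%s` (seconds since the epoch) is available.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo "$((end - start))s: $*"
}

# Usage (address and hostnames are placeholders, not from the original post):
#   time_cmd ssh -o BatchMode=yes 192.0.2.10 true
#   time_cmd ssh -o BatchMode=yes apphost.example.com true
#   time_cmd ssh -o BatchMode=yes apphost true
```

    If only the short-hostname form is slow, that points at name resolution rather than at ssh or the network path.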

    The cause is reportedly that AIX tests for an IPv6 address (even though we are only running IPv4). The solution is to ensure that /etc/netsvc.conf contains the line:

    hosts=local,bind4
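
    To see which setting a server currently has, something along these lines may help. It is a rough sketch: check_netsvc is a made-up name, and taking the last hosts= line as the effective one is an assumption.

```shell
# check_netsvc [FILE]: report the hosts= line found in a netsvc.conf-style file.
# Hypothetical helper; if several hosts= lines exist, the last one is reported
# (which line AIX actually honours in that case is an assumption here).
check_netsvc() {
    file="${1:-/etc/netsvc.conf}"
    # Drop comment lines, strip spaces and tabs, keep the hosts= entry
    line=$(grep -v '^[[:space:]]*#' "$file" 2>/dev/null \
        | tr -d ' \t' | grep -i '^hosts=' | tail -1)
    case "$line" in
        "")      echo "no hosts= line found in $file" ;;
        *bind4*) echo "ok: $line" ;;
        *bind*)  echo "warning: $line (bind may also look up IPv6 addresses)" ;;
        *)       echo "note: $line (no DNS entry)" ;;
    esac
}

check_netsvc
```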

    I am adding this message as a warning to other AIX administrators. You can check whether this would affect your systems by changing the secondary nameserver in /etc/resolv.conf to a dummy IP address. But make sure you do this on a development server, obviously.
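
    One cautious way to prepare that test is to generate the modified file first and inspect it before copying it into place. The sketch below only prints to standard output and never edits /etc/resolv.conf; swap_second_ns is a made-up name, and 192.0.2.1 is used as the dummy because it sits in a range reserved for documentation.

```shell
# swap_second_ns DUMMY_IP [FILE]: print FILE with its second nameserver line
# replaced by DUMMY_IP. Hypothetical helper; the input file is left untouched.
swap_second_ns() {
    dummy="$1"
    file="${2:-/etc/resolv.conf}"
    awk -v ip="$dummy" '
        $1 == "nameserver" { n++; if (n == 2) { print "nameserver", ip; next } }
        { print }
    ' "$file"
}

# Usage on a development server (the output path is a placeholder):
#   swap_second_ns 192.0.2.1 > /tmp/resolv.conf.test
```

    Keep a backup of the original resolv.conf so the change can be reverted once the test is done.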
    #AIX-Forum


  • 2.  Re: Warning: system problems due to DNS server failure

    Posted Tue April 20, 2010 04:38 PM

    Originally posted by: Juredd1


    I hope you don't mind a little discussion on this subject.

    I am not sure why it was even trying to hit the second DNS server in your list unless it failed to get a reply from the first one. If I understood correctly, it was the second DNS server in the list, your secondary DNS server, that was down, right?

    I agree the line you suggested for /etc/netsvc.conf is important, but it will only help if the hostname you are trying to ssh to is listed in your /etc/hosts file. If it is not, I am not sure it would make any difference, unless I am wrong about how netsvc.conf works. Obviously it made a difference for you, but I am not sure why unless you had the hostname listed in /etc/hosts.

    If you did not have that host in /etc/hosts, then I would guess that either the box you were trying to ssh to was in the same domain as the box you were connecting from, or you had the "search" option in /etc/resolv.conf with the other box's domain added. Otherwise I would expect "ssh hostname" to fail altogether.

    I did test on a dev box. I added a bad IP for the second nameserver in my /etc/resolv.conf file and removed the hosts=local,bind4 entry I had in my /etc/netsvc.conf file. I saw no delay in either connection method.
    #AIX-Forum


  • 3.  Re: Warning: system problems due to DNS server failure

    Posted Tue April 20, 2010 11:57 PM

    Originally posted by: SystemAdmin


    Hi Juredd1,

    We initially had hosts=local,bind in our /etc/netsvc.conf and this appears to have been the problem. Although telnet uses /etc/hosts and was not affected, not all connections (ssh, sendmail, and our database) obey the order specified in netsvc.conf.

    I gather that with the "bind" setting AIX checks for an IPv6 address, does not obtain one from the first DNS server, and tries the second. Only when that times out does it use the IPv4 address.

    We found this behaviour most confusing, since an nslookup of a host returned instantly.

    Although we now have the "bind4" setting in /etc/netsvc.conf, at the time of the incident our solution was to comment out the second nameserver in /etc/resolv.conf. It was only later that we found out about the "bind4" setting. One of our production servers already had this setting, and was unaffected by the DNS outage.
    #AIX-Forum


  • 4.  Re: Warning: system problems due to DNS server failure

    Posted Tue April 20, 2010 05:05 PM

    Originally posted by: shargus


    Might also be worth looking into the RES_TIMEOUT and RES_RETRY environment variables. These affect how the system responds to a failed DNS server.

    Our systems have them set to RES_TIMEOUT=3 and RES_RETRY=1. With those values, the server will give the first DNS server 8 seconds to reply, then try the next one in the list.
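
    As a rough illustration, the variables can be set for a single shell session like this. The values are the ones quoted above; making them permanent (on AIX this is commonly done in /etc/environment) is left as an assumption to verify on your own systems.

```shell
# Values quoted in this post; tune them for your own environment.
RES_TIMEOUT=3
RES_RETRY=1
export RES_TIMEOUT RES_RETRY

# Show what this shell will now pass to child processes
echo "RES_TIMEOUT=${RES_TIMEOUT} RES_RETRY=${RES_RETRY}"
```

    Only processes started from this shell after the export inherit the values, so already-running daemons need a restart to pick them up.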
    #AIX-Forum