AIX Open Source

Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

Erich Wolz (Mon April 11, 2022 09:24 PM)
RESHMA KUMAR (Tue April 19, 2022 07:10 AM)
Erich Wolz (Tue April 19, 2022 12:06 PM)
Erich Wolz (Wed May 04, 2022 05:31 PM)
RESHMA KUMAR (Tue May 17, 2022 01:08 AM)
Erich Wolz (Tue May 17, 2022 04:11 PM)
Erich Wolz (Fri May 27, 2022 06:39 PM)
Ayappan P (Thu June 02, 2022 10:58 AM)
Erich Wolz (Thu June 02, 2022 11:48 AM)
Ayappan P (Fri June 03, 2022 03:12 AM)
Erich Wolz (Tue July 26, 2022 10:38 AM)
SANKET RATHI (Tue July 26, 2022 12:23 PM)
Erich Wolz (Thu September 08, 2022 10:32 PM)
Erich Wolz (Fri September 09, 2022 07:28 PM)
Ayappan P (Thu September 15, 2022 10:10 AM)
Erich Wolz (Thu September 15, 2022 10:44 AM)
RESHMA KUMAR (Thu September 15, 2022 01:34 PM)
Erich Wolz (Wed September 21, 2022 10:58 AM)
Erich Wolz (Tue September 27, 2022 04:34 PM)
Erich Wolz (Thu September 29, 2022 03:07 PM)
Erich Wolz (Wed October 19, 2022 02:24 PM)
Erich Wolz (Fri October 21, 2022 06:19 PM)
Stephen Ulmer (Fri October 21, 2022 11:44 PM)
Stephen Ulmer (Fri October 21, 2022 11:48 PM)
Erich Wolz (Fri October 28, 2022 01:16 PM)
Stephen Ulmer (Sat October 29, 2022 03:08 PM)
Erich Wolz (Mon October 31, 2022 04:13 PM)
SANKET RATHI (Sun October 30, 2022 12:14 PM)
Erich Wolz (Mon October 31, 2022 07:23 PM)
Stephen Ulmer (Wed November 02, 2022 09:17 AM)
SANKET RATHI (Wed November 02, 2022 03:13 PM)
SANKET RATHI (Wed November 02, 2022 03:14 PM)
SANKET RATHI (Wed November 02, 2022 03:20 PM)
Ayappan P (Thu November 03, 2022 09:29 AM)
Erich Wolz (Thu November 03, 2022 09:12 PM)
Ayappan P (Fri November 04, 2022 03:22 AM)
Erich Wolz (Fri November 04, 2022 09:53 AM)
Erich Wolz (Thu November 10, 2022 11:50 AM)
Erich Wolz (Tue December 13, 2022 04:00 AM)

  • 1.  Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu May 06, 2021 10:05 AM
    We have installed Nagios 4.4.6 using the following AIX Toolbox RPMs:

    bash-5.0# rpm -qa | grep -i nagios
    nagios-4.4.6-1.ppc
    nagios-plugins-2.3.3-1.ppc

    First question: has anybody on this forum managed to start Nagios 4.4.6 successfully on AIX? If so, some quick feedback would be much appreciated.

    When we start the Nagios server we see the following errors.

    bash-5.0# tail -f /var/log/nagios/nagios.log
    [1620308968] Successfully shutdown... (PID=12059090)
    [1620308993] Nagios 4.4.6 starting... (PID=13435208)
    [1620308993] Local time is Thu May 06 15:49:53 MEsT 2021
    [1620308993] LOG VERSION: 2.0
    [1620308993] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1620308993] qh: core query handler registered
    [1620308993] qh: echo service query handler registered
    [1620308993] qh: help for the query handler registered
    [1620308993] wproc: Successfully registered manager as @wproc with query handler
    [1620308996] Successfully launched command file worker with pid 14942638
    [1620309018] Unable to run check for service 'Swap Usage' on host 'localhost'
    [1620309033] Unable to run check for service 'SSH' on host 'localhost'

    bash-5.0# tail -f /var/nagios/nagios.log
    [1620308754] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620308754] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620308754] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!

    We can rule out a permission problem, because we also tried giving both the directory and the file rwx permissions for everyone. We also set all limits in /etc/security/limits to unlimited.

    It looks like Nagios is not able to spawn any worker processes. Worker processes were introduced in Nagios 4 (we also tried Nagios 3, which worked fine).

    We compared this with an x86-based installation, where we can see Nagios launching the worker processes with the following command:
    /opt/freeware/bin/nagios --worker /var/nagios/rw/nagios.qh

    If we run the same command on AIX, we get:
    /opt/freeware/bin/nagios: illegal option -- -
    /opt/freeware/bin/nagios: illegal option -- w
    /opt/freeware/bin/nagios: illegal option -- o
    /opt/freeware/bin/nagios: illegal option -- r
    /opt/freeware/bin/nagios: illegal option -- k
    /opt/freeware/bin/nagios: illegal option -- e
    /opt/freeware/bin/nagios: illegal option -- r

    According to the man page, the --worker option should be equivalent to the -W option. But if we launch the command like this:
    /opt/freeware/bin/nagios -W /var/nagios/rw/nagios.qh

    we see the same error that appears in the log file:
    Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
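One reading of these two symptoms, offered only as a hypothesis (nothing in this thread confirms it): if the AIX build parses options with plain getopt() and no long-option support, "--worker" is rejected one character at a time, as seen above; and if its option string does not declare -W as taking a value, "-W /var/nagios/rw/nagios.qh" leaves the socket path over as a positional argument, which is exactly where Nagios expects the main configuration file. Python's standard getopt module can illustrate the difference. The option strings "dvW" and "dvW:" below are made up for illustration and are not taken from the Nagios source.

```python
import getopt

SOCKET = "/var/nagios/rw/nagios.qh"


def parse(argv, shortopts, longopts=()):
    """Return (options, leftover positional arguments) for a getopt-style parse."""
    return getopt.getopt(argv, shortopts, list(longopts))


# Hypothetical optstring where -W takes NO argument: the socket path is left
# over as a positional argument, i.e. where a config-file path would be expected.
print(parse(["-W", SOCKET], "dvW"))
# -> ([('-W', '')], ['/var/nagios/rw/nagios.qh'])

# Hypothetical optstring where -W REQUIRES an argument and --worker is a known
# long option: the socket path is consumed as the option's value.
print(parse(["--worker", SOCKET], "dvW:", ["worker="]))
# -> ([('--worker', '/var/nagios/rw/nagios.qh')], [])

# Without long-option support, --worker is rejected outright.
try:
    parse(["--worker", SOCKET], "dvW")
except getopt.GetoptError as err:
    print("rejected:", err)
```

Python reports an unknown long option as a single GetoptError instead of one "illegal option" line per character as the AIX C library does, but the effect being illustrated is the same: the socket path never reaches the option meant for it.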



    ------------------------------
    Oliver Stadler
    ------------------------------


  • 2.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri May 07, 2021 04:13 PM

    I've not managed to successfully start Nagios 4.4.6 on AIX v7, but back in the day I was able to build/run Nagios 3.1.2 on AIX v6... and am hoping to be able to migrate to a Nagios instance based on these AIX Toolbox RPMs:
    nagios-4.4.6-1.ppc
    nagios-gui-4.4.6-1.ppc
    nagios-nrpe-4.0.3-1.ppc
    nagios-plugins-2.3.3-1.ppc

    I, too, am seeing behavior similar to the above:
    # cat /var/log/nagios/nagios.log
    [1620416941] Nagios 4.4.6 starting... (PID=11010436)
    [1620416941] Local time is Fri May 07 14:49:01 CDT 2021
    [1620416941] LOG VERSION: 2.0
    [1620416941] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1620416941] qh: core query handler registered
    [1620416941] qh: echo service query handler registered
    [1620416941] qh: help for the query handler registered
    [1620416941] wproc: Successfully registered manager as @wproc with query handler
    [1620416944] Successfully launched command file worker with pid 14483798
    [1620416944] Unable to send check for host 'localhost' to worker (ret=-2)
    [1620416981] Unable to run check for service 'Current Load' on host 'localhost'


    # cat /var/nagios/nagios.log
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    [1620416941] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!

    # ps -ef | grep nagios
    nagios 9306574 11010436 0 0:00 <defunct>
    nagios 9699828 11010436 0 0:00 <defunct>
    nagios 10027426 11010436 0 0:00 <defunct>
    nagios 11010436 1 0 14:49:01 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 11141526 11010436 0 0:00 <defunct>
    nagios 11469210 11010436 0 0:00 <defunct>
    nagios 11928062 11010436 0 0:00 <defunct>
    nagios 12386766 11010436 0 0:00 <defunct>
    nagios 13304260 11010436 0 0:00 <defunct>
    nagios 13369786 11010436 0 0:00 <defunct>
    nagios 13435340 11010436 0 0:00 <defunct>
    nagios 13500860 11010436 0 0:00 <defunct>
    nagios 14483798 11010436 0 14:49:04 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 14549382 11010436 0 0:00 <defunct>

    I had of course hoped to get Nagios 4.4.6 running and simply migrate my old config files... but I see it won't be that simple.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 3.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed May 12, 2021 08:59 AM
    Hi,

    Just an observation:

    In /var/log/nagios/nagios.log it shows:
    [1620308993] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized

    But in /var/nagios/nagios.log it shows:
    [1620308754] Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!



    ------------------------------
    Zaki Jääskeläinen
    ------------------------------



  • 4.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed May 12, 2021 12:08 PM

    re: "Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!" -- I saw that, too... but how can this be, if the Nagios config file was specified as follows:

    /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg > /dev/console 2>&1



    ------------------------------
    Erich Wolz
    ------------------------------



  • 5.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed September 22, 2021 11:14 PM

    Getting the Nagios 4.4.6 RPMs from the AIX Toolbox up and running on AIX 7.2.3.5 had to take a back seat for a while, but I have found time to look at this again. After copying my Nagios 3.x config files over and tweaking them until /opt/freeware/bin/nagios -v /etc/nagios/nagios.cfg reported no warnings, no errors, and no serious problems during the pre-flight check, I saw the following in the nagios.log:

    [1632363052] Error: failed to access() /opt/freeware/bin/nagios: Permission denied
    [1632363052] Error: Spawning workers will be impossible. Aborting.

    The nagios command was owned by root:system and its permissions were 0750; by changing the ownership to nagios:nagios and restarting, I got to the point of seeing the following in the nagios.log 

    [1632363795] Nagios 4.4.6 starting... (PID=17301766)
    [1632363795] Local time is Wed Sep 22 21:23:15 CDT 2021
    [1632363795] LOG VERSION: 2.0
    [1632363795] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1632363795] qh: core query handler registered
    [1632363795] qh: echo service query handler registered
    [1632363795] qh: help for the query handler registered
    [1632363795] wproc: Successfully registered manager as @wproc with query handler
    [1632363797] Successfully launched command file worker with pid 25690398

    However, the rest of the log file consists of "Unable to send check for host <hostname> to worker (ret=-2)" and "Unable to run check for service <servicename> on host <hostname>" messages.
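Nagios prefixes each log line with a Unix timestamp in seconds: 1620416941 in the second post's log is 19:49:01 UTC, which matches the 14:49:01 CDT on that log's "Local time is" line. When a log consists mostly of repeated failures like these, a short summary is easier to read than the raw file. The following is a minimal sketch, assuming the "[<epoch>] <message>" layout shown in this thread; it groups messages after replacing quoted host and service names, and reports the time range in UTC so the result does not depend on the local time zone.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Nagios log lines as shown in this thread: "[<epoch seconds>] <message>".
LOG_LINE = re.compile(r"^\[(\d+)\] (.*)$")
QUOTED = re.compile(r"'[^']*'")


def summarize(log_text: str):
    """Count messages by shape and return (counts, first, last) with UTC times."""
    counts = Counter()
    first = last = None
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank or non-Nagios lines
        stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        if first is None:
            first = stamp
        last = stamp
        # Replace quoted host/service names so similar failures group together.
        counts[QUOTED.sub("'<name>'", match.group(2))] += 1
    return counts, first, last


sample = """\
[1620416944] Successfully launched command file worker with pid 14483798
[1620416944] Unable to send check for host 'localhost' to worker (ret=-2)
[1620416981] Unable to run check for service 'Current Load' on host 'localhost'
"""

counts, first, last = summarize(sample)
for message, n in counts.most_common():
    print(n, message)
print(first.isoformat(), "to", last.isoformat())
```

Against a real file, the input would simply be open("/var/log/nagios/nagios.log").read().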

    I'm no longer seeing the "Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!" messages in the nagios.log like I did back in May... but unfortunately I'm still seeing them in the syslog. Ownership/permissions of this file are:

    $ ls -l /var/nagios/rw/nagios.qh
    srwxrwxrwx 1 nagios nagiscmd 0 Sep 22 21:53 /var/nagios/rw/nagios.qh

    A bit of googling turned up this URL (though that problem occurred on an SELinux-enabled Linux system, not AIX): https://stackoverflow.com/questions/42777659/nagios-unable-to-send-check-for-host-or-run-check-for-service?rq=1, specifically the following comment: "No, a bad policy is causing the issues, not selinux. The policy was apparently not updated in the EPEL package when updating to nagios-4.3.2"

    Any chance something similar can be going on with AIX?



    ------------------------------
    Erich Wolz
    ------------------------------



  • 6.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 23, 2021 11:30 AM

    We didn't encounter these issues in our environment.
    The permissions are properly set in the spec file: https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/SPECS/nagios-4.4.6-1.spec

    We don't see the "Error: Cannot open ..." message either. Possibly some older nagios processes are still running on the system.



    ------------------------------
    Ayappan P
    ------------------------------



  • 7.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 23, 2021 11:57 AM

    > The permissions are properly set in the spec file

    The only "chmod" commands I see in the above spec file (to set permissions) are the following (I do not see any chown commands):

    chmod 0755 config.status

    chmod 0755 ${RPM_BUILD_ROOT}/etc/rc.d/init.d/%{name}

    > Probably some older nagios processes are still running in the system. 

    Nope:

    [root@hl1axmon:/var/log/nagios] >ps -ef | grep nagios
    [root@hl1axmon:/var/log/nagios] >nohup /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg > /dev/console 2>&1 &
    [1] 24510746
    [1] + Done nohup /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg > /dev/console 2>&1 &
    [root@hl1axmon:/var/log/nagios] >ps -ef| grep nagios
    nagios 8782330 19726594 0 0:00 <defunct>
    nagios 9699664 19726594 0 0:00 <defunct>
    nagios 10027378 19726594 0 0:00 <defunct>
    nagios 11141512 19726594 0 0:00 <defunct>
    nagios 11338124 19726594 0 0:00 <defunct>
    nagios 13762998 19726594 0 0:00 <defunct>
    nagios 14483880 19726594 0 0:00 <defunct>
    nagios 16777658 19726594 0 0:00 <defunct>
    nagios 19726594 1 0 10:43:19 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 19923402 19726594 0 10:43:22 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 20906282 19726594 0 0:00 <defunct>
    nagios 21889462 19726594 0 0:00 <defunct>
    nagios 24314338 19726594 0 0:00 <defunct>
    nagios 26280370 19726594 0 0:00 <defunct>
    [root@hl1axmon:/var/log/nagios] >cat nagios.log
    [1632411799] Nagios 4.4.6 starting... (PID=19726594)
    [1632411799] Local time is Thu Sep 23 10:43:19 CDT 2021
    [1632411799] LOG VERSION: 2.0
    [1632411799] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1632411799] qh: core query handler registered
    [1632411799] qh: echo service query handler registered
    [1632411799] qh: help for the query handler registered
    [1632411799] wproc: Successfully registered manager as @wproc with query handler
    [1632411802] Successfully launched command file worker with pid 19923402
    [1632411802] Unable to send check for host <hostname1> to worker (ret=-2)
    [1632411803] Unable to send check for host <hostname2> to worker (ret=-2)
    [1632411804] Unable to send check for host <hostname3> to worker (ret=-2)
    (etc.)

    and in the syslog:

    Sep 23 10:43:19 hl1axmon user:info syslog: Nagios 4.4.6 starting... (PID=19726594)
    Sep 23 10:43:19 hl1axmon user:info syslog: Local time is Thu Sep 23 10:43:19 CDT 2021
    Sep 23 10:43:19 hl1axmon user:info syslog: LOG VERSION: 2.0
    Sep 23 10:43:19 hl1axmon user:info syslog: qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    Sep 23 10:43:19 hl1axmon user:info syslog: qh: core query handler registered
    Sep 23 10:43:19 hl1axmon user:info syslog: qh: echo service query handler registered
    Sep 23 10:43:19 hl1axmon user:info syslog: qh: help for the query handler registered
    Sep 23 10:43:19 hl1axmon user:info syslog: wproc: Successfully registered manager as @wproc with query handler
    Sep 23 10:43:19 hl1axmon user:info syslog: Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    Sep 23 10:43:19 hl1axmon user:info last message repeated 11 times
    Sep 23 10:43:22 hl1axmon user:info syslog: Successfully launched command file worker with pid 19923402
    Sep 23 10:43:22 hl1axmon user:info syslog: Unable to send check for host <hostname1> to worker (ret=-2)
    (etc.)

    On a working monitor still running Nagios 3.x there are no defunct processes.  Instead there are a bunch of spawned "check_ping" commands and the like.
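The ps listings in this thread show the same pattern each time: twelve <defunct> children under the main nagios process (in the second post and in the listing above), and twelve "Cannot open main configuration file" errors (twelve lines in the second post; one line plus "last message repeated 11 times" in the syslog above). That coincidence is consistent with, but does not prove, one error per worker that exited at startup. A minimal sketch for counting zombie children per parent PID from ps -ef output, assuming the column layout shown above (UID, PID, PPID first; zombie rows ending in <defunct>):

```python
from collections import Counter


def defunct_children(ps_output: str) -> Counter:
    """Count <defunct> (zombie) processes per parent PID in `ps -ef` output."""
    per_parent = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns start with UID PID PPID; zombie rows end with <defunct>.
        if len(fields) >= 3 and fields[-1] == "<defunct>":
            per_parent[int(fields[2])] += 1
    return per_parent


sample = """\
nagios 8782330 19726594 0 0:00 <defunct>
nagios 9699664 19726594 0 0:00 <defunct>
nagios 19726594 1 0 10:43:19 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
nagios 19923402 19726594 0 10:43:22 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
"""

print(defunct_children(sample))  # Counter({19726594: 2})
```

On a live system, the input could come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout.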



    ------------------------------
    Erich Wolz
    ------------------------------



  • 8.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue September 28, 2021 09:51 AM

    > We didn't encounter these issues in our environment. 

    Are you actually using Nagios to monitor your environment, or are you just building Nagios without actually trying to use it?

    If the former, then there must be some difference between your environment and mine, and it would help to know what that is :-)

    What is your oslevel -s?  Your lsdev | grep Available?  Your rpm -qa | sort?

    Up until last week, my older Nagios 3.x was running on IBM HTTP Server; I recently migrated to the AIX Toolbox version of apache to eliminate that as a variable (i.e. my older Nagios 3.x runs on Toolbox apache, but  my Toolbox Nagios does not... so it would appear that apache itself is not the source of the problem).  



    ------------------------------
    Erich Wolz
    ------------------------------



  • 9.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon October 04, 2021 09:48 AM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 10.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue October 05, 2021 10:27 AM
    # oslevel -s
    7200-04-02-2028

    # lsdev | grep Available
    L2cache0 Available L2 Cache
    cluster0 Available Cluster Node
    dac0 Available 17-T1-01 DS3/4/5K PCM User Interface
    dac1 Available 17-T1-01 DS3/4/5K PCM User Interface
    en0 Available Standard Ethernet Network Interface
    ent0 Available Virtual I/O Ethernet Adapter (l-lan)
    fcs0 Available 17-T1 Virtual Fibre Channel Client Adapter
    fscsi0 Available 17-T1-01 FC SCSI I/O Controller Protocol Device
    hdisk0 Available Virtual SCSI Disk Drive
    hdisk1 Available 17-T1-01 MPIO DS5100/5300 Disk
    hdisk2 Available 17-T1-01 MPIO DS5100/5300 Disk
    inet0 Available Internet Network Extension
    iscsi0 Available iSCSI Protocol Device
    lo0 Available Loopback Network Interface
    loop0 Available Loopback Device
    lvdd Available LVM Device Driver
    mem0 Available Memory
    pkcs11 Available ACF/PKCS#11 Device
    proc0 Available 00-00 Processor
    proc4 Available 00-04 Processor
    proc8 Available 00-08 Processor
    proc12 Available 00-12 Processor
    pty0 Available Asynchronous Pseudo-Terminal
    sfw0 Available Storage Framework Module
    sfwcomm0 Available 17-T1-01-FF Fibre Channel Storage Framework Comm
    sys0 Available System Object
    sysplanar0 Available System Planar
    vio0 Available Virtual I/O Bus
    vsa0 Available LPAR Virtual Serial Adapter
    vscsi0 Available Virtual SCSI Client Adapter
    vty0 Available Asynchronous Terminal

    Please find the attached document for the output of rpm -qa | sort

    ------------------------------
    RESHMA KUMAR
    ------------------------------

    Attachment(s): rpm-qa-output.txt (20 KB)


  • 11.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue October 05, 2021 12:03 PM

    Thanks for the above command outputs. However, I still can't tell whether you are actually using Nagios to monitor your environment, or just building Nagios without actually running it.

    First difference between our environments: my oslevel is 7200-03-05-2016.

    Devices you have that I don't have (not sure if any of these would account for the issues I'm seeing):

    < dac0 Available 17-T1-01 DS3/4/5K PCM User Interface
    < dac1 Available 17-T1-01 DS3/4/5K PCM User Interface
    < fcs0 Available 17-T1 Virtual Fibre Channel Client Adapter
    < fscsi0 Available 17-T1-01 FC SCSI I/O Controller Protocol Device
    < hdisk1 Available 17-T1-01 MPIO DS5100/5300 Disk
    < hdisk2 Available 17-T1-01 MPIO DS5100/5300 Disk
    < loop0 Available Loopback Device
    < proc4 Available 00-04 Processor
    < proc12 Available 00-12 Processor
    < sfwcomm0 Available 17-T1-01-FF Fibre Channel Storage Framework Comm

    Devices I have that you don't have (pretty sure none of these would account for the issues I'm seeing):

    > cd0 Available Virtual SCSI Optical Served by VIO Server
    > hdisk1 Available Virtual SCSI Disk Drive
    > vscsi1 Available Virtual SCSI Client Adapter

    RPMs you have that I don't have (see attached file "rpmdiff.txt")

    RPMs I have that you don't have:

    > curl-7.76.1-1.ppc
    > mod_perl-2.0.11-2.ppc
    > mod_ssl-2.4.48-1.ppc
    > ncurses-6.2-2.ppc
    > php-7.4.22-1.ppc
    > python-pycurl-7.43.0-1.ppc
    > python-urlgrabber-3.10.1-1.noarch
    > sudo-1.9.5p2-1.ppc
    > yum-3.4.3-8.noarch

    As I am not familiar with many (if not most) of the RPMs you have installed, could one of them perhaps account for the fact that I am seeing "Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!" and "Unable to send check for host <hostname> to worker (ret=-2)" messages in my syslog?  If so, it should probably be noted as a prereq.
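Comparing sorted rpm -qa listings line by line, as in the < / > lists above, reports a package that merely differs in version on both sides. A small sketch that compares by package name instead, assuming every entry has the name-version-release.arch form seen in this thread (for example nagios-4.4.6-1.ppc); entries that do not fit that form are compared as whole strings:

```python
def split_nvra(package: str) -> tuple[str, str]:
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    nvr, dot, arch = package.rpartition(".")
    parts = nvr.rsplit("-", 2) if dot else []
    if len(parts) != 3:
        return package, ""  # unexpected format: keep the whole string as the name
    name, version, release = parts
    return name, f"{version}-{release}.{arch}"


def compare(mine: list[str], theirs: list[str]) -> dict[str, list[str]]:
    """Compare two `rpm -qa` listings by package name."""
    a = dict(split_nvra(p) for p in mine)
    b = dict(split_nvra(p) for p in theirs)
    return {
        "only_mine": sorted(a.keys() - b.keys()),
        "only_theirs": sorted(b.keys() - a.keys()),
        "different_version": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }


mine = ["curl-7.76.1-1.ppc", "sudo-1.9.5p2-1.ppc", "nagios-4.4.6-1.ppc"]
theirs = ["nagios-4.4.6-1.ppc"]
print(compare(mine, theirs))
# -> {'only_mine': ['curl', 'sudo'], 'only_theirs': [], 'different_version': []}
```

For the full lists, each input could be the contents of one attachment split on whitespace, e.g. open("rpm-qa-output.txt").read().split(). A package installed in several versions at once collapses to a single entry in this sketch.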



    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s): rpmdiff.txt (19 KB)


  • 12.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed October 06, 2021 05:15 PM

    I am also still wondering why the syslog shows:

    syslog: Nagios 4.4.6 starting... (PID=13762976)
    syslog: Local time is Wed Oct 06 16:00:54 CDT 2021
    syslog: LOG VERSION: 2.0
    syslog: qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    syslog: qh: core query handler registered
    syslog: qh: echo service query handler registered
    syslog: qh: help for the query handler registered
    syslog: wproc: Successfully registered manager as @wproc with query handler
    syslog: Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!

    when the nagios config file is specified as follows in /etc/inittab:

    /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg > /dev/console 2>&1

    /etc/nagios/nagios.cfg is a file; /var/nagios/rw/nagios.qh is a socket.  I don't know where the problem is, but clearly something's not being parsed correctly.

    $ ls -l /etc/nagios/nagios.cfg /var/nagios/rw/nagios.qh
    -rw-rw-r-- 1 nagios nagios 45626 Sep 22 20:31 /etc/nagios/nagios.cfg
    srwxrwxrwx 1 nagios nagiscmd 0 Oct 06 16:00 /var/nagios/rw/nagios.qh



    ------------------------------
    Erich Wolz
    ------------------------------



  • 13.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu October 07, 2021 06:07 PM
    Is there a particular debug_level I should set, to see why the AIX Toolbox nagios 4.4.6 thinks the main configuration file is the socket '/var/nagios/rw/nagios.qh' and not the file '/etc/nagios/nagios.cfg' that is being specified on the command line?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 14.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon October 11, 2021 12:17 PM

    Ping.

    From the nagios log itself, it would appear that the config file is (correctly) found to be /etc/nagios/nagios.cfg -- given that the config file specifies the socket to use for the query handler interface, and this socket is successfully initialized:

    # grep query_socket /etc/nagios/nagios.cfg
    query_socket=/var/nagios/rw/nagios.qh

    # head -14 /var/log/nagios/nagios.log
    [1633650391] Nagios 4.4.6 starting... (PID=16777578)
    [1633650391] Local time is Thu Oct 07 18:46:31 CDT 2021
    [1633650391] LOG VERSION: 2.0
    [1633650391] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1633650391] qh: core query handler registered
    [1633650391] qh: echo service query handler registered
    [1633650391] qh: help for the query handler registered
    [1633650391] wproc: Successfully registered manager as @wproc with query handler
    [1633650394] Successfully launched command file worker with pid 13763004
    [1633650394] Unable to send check for host <hostname> to worker (ret=-2)

    However, once the query handler has been initialized/registered, the nagios server is trying to read from the socket -- and not the config file -- for further config info (I'm guessing this may be due to the config file handle not having been restored properly after the name of the socket was determined):

    # grep "Oct 7 18:46:3" /syslogs/syslog
    Oct 7 18:46:31 hl1axmon user:info syslog: Nagios 4.4.6 starting... (PID=16777578)
    Oct 7 18:46:31 hl1axmon user:info syslog: Local time is Thu Oct 07 18:46:31 CDT 2021
    Oct 7 18:46:31 hl1axmon user:info syslog: LOG VERSION: 2.0
    Oct 7 18:46:31 hl1axmon user:info syslog: qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    Oct 7 18:46:31 hl1axmon user:info syslog: qh: core query handler registered
    Oct 7 18:46:31 hl1axmon user:info syslog: qh: echo service query handler registered
    Oct 7 18:46:31 hl1axmon user:info syslog: qh: help for the query handler registered
    Oct 7 18:46:31 hl1axmon user:info syslog: wproc: Successfully registered manager as @wproc with query handler
    Oct 7 18:46:31 hl1axmon user:info syslog: Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!
    Oct 7 18:46:31 hl1axmon user:info last message repeated 11 times
    Oct 7 18:46:34 hl1axmon user:info syslog: Successfully launched command file worker with pid 13763004
    Oct 7 18:46:34 hl1axmon user:info syslog: Unable to send check for host <hostname> to worker (ret=-2)
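The grep above relies on nagios.cfg's simple key=value layout. A minimal sketch that collects every directive into a dictionary makes it easy to compare, say, query_socket with the path in the error message, or to see which nagios.log the configured log_file points to (the earlier posts found two different nagios.log files). It treats only lines starting with # as comments and keeps repeated keys such as cfg_file as lists; the log_file value in the sample is an assumption for illustration, not taken from this thread.

```python
def read_main_config(text: str) -> dict[str, list[str]]:
    """Collect key=value directives from nagios.cfg text; repeated keys keep every value."""
    directives: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


sample = """\
# Socket for the query handler interface
query_socket=/var/nagios/rw/nagios.qh
# Assumed value, for illustration only
log_file=/var/log/nagios/nagios.log
"""

config = read_main_config(sample)
print(config["query_socket"])  # ['/var/nagios/rw/nagios.qh']
print(config.get("log_file"))  # ['/var/log/nagios/nagios.log']
```

On the server itself, the input would be open("/etc/nagios/nagios.cfg").read().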



    ------------------------------
    Erich Wolz
    ------------------------------



  • 15.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu October 14, 2021 10:37 PM
    Ping.

    From https://stackoverflow.com/questions/42777659/nagios-unable-to-send-check-for-host-or-run-check-for-service

    ...With NagiosSupport's help, we found out that it was SELinux that was in the "enforcing" mode (and/or a bad policy that was not updated in the EPEL package when updating Nagios) that was causing the issues....

    Of course, this was on a Linux server not an AIX server... but sure enough, https://serverfault.com/questions/894349/nagios-on-rhel-6-epel-package-stopped-working-after-update seems to corroborate that a change to something in the EPEL package for Nagios on RedHat Enterprise Linux was to blame.

    Since (in addition to "Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!") I am also seeing "Unable to send check for host" and "Unable to run check for service" messages, might there be an AIX equivalent to the EPEL package for Nagios on RHEL that is preventing these checks from running and/or causing the nagios daemon to "reset" the value of the main configuration file?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 16.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon October 18, 2021 07:00 PM

    Ping.

    Also, for what it's worth... the Health Check policy we use wants each active inittab entry's file/command/script executable -- and all existing directories in its path -- to have settings for "group" and "other" of r-x or more stringent, if owned by a group not considered as privileged (which the "nagios" group is not, by default).  However, changing the ownership of /opt/freeware/bin/nagios to root:system (and the permissions to 0700, 0750, or 0755) results in the following messages in the syslog:

    Error: failed to access() /opt/freeware/bin/nagios: Permission denied
    Error: Spawning workers will be impossible. Aborting.

    My existing Nagios 3.1.2 executable (built on AIX 6.1 years ago) is owned by root:system, has permissions of 0700, and does not have this problem. Needless to say, I changed the ownership of /opt/freeware/bin/nagios back to nagios:nagios and the permissions back to 0774... but I am still left with the fact that something in the Nagios 4.4.6 source code is causing the name of the main config file to change from '/etc/nagios/nagios.cfg' to '/var/nagios/rw/nagios.qh' at some point after initialization, and this is (clearly) the more important issue.



    ------------------------------
    Erich Wolz
    ------------------------------



  • 17.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed October 27, 2021 09:54 AM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 18.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri October 29, 2021 01:30 AM
    Thanks for sharing the details. We are looking into the issue.

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 19.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed November 17, 2021 11:31 AM
    Anything to report?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 20.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed November 24, 2021 06:52 AM
    We are working on it

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 21.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri November 26, 2021 12:45 AM
    We were able to fix the following error with a patch.
    Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!

    We are now looking into the "Unable to run check for service" errors.

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 22.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon November 29, 2021 10:31 AM
    That is indeed welcome news!  Looking forward to the availability of the patch.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 23.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue December 14, 2021 02:13 AM
    We have fixed the issue and uploaded nagios-4.4.6-2.
    https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/nagios/

    Before starting the nagios server, please increase the nofiles ulimit (the maximum number of open files).

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 24.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue December 14, 2021 11:10 AM
    That is indeed good news!  What is the recommended value for nofiles?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 25.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed December 15, 2021 02:51 AM
    Try "ulimit -n 20000".
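    A minimal sketch of applying this before launch, assuming a POSIX-style shell (ksh on AIX). The helper name raise_nofiles is hypothetical; the daemon path is the one shown in the ps output later in this thread. The helper only raises the soft limit and does nothing if the current limit is already high enough.

```shell
#!/bin/sh
# Hypothetical helper: make sure the soft open-file limit (nofiles) of the
# current shell is at least $1. Child processes, such as the Nagios daemon
# started from this shell, inherit the limit.
raise_nofiles() {
    want=$1
    cur=$(ulimit -n)
    # Nothing to do if the limit is already unlimited or high enough.
    if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$want" ]; then
        return 0
    fi
    # Returns non-zero if $want exceeds the hard limit (e.g. when not root).
    ulimit -n "$want"
}

# Usage (not executed here):
#   raise_nofiles 20000 && /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
```

    Because ulimit only affects the current shell and its children, the helper has to run in the same shell or init script that launches the daemon. For a persistent per-user default on AIX, the nofiles attribute in /etc/security/limits can be set instead, for example with "chuser nofiles=20000 nagios".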

    ------------------------------
    Ayappan P
    ------------------------------



  • 26.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu December 16, 2021 11:30 AM

    After updating the nagios and nagios-gui packages and setting nofiles=20000, I no longer get the "Error: Cannot open main configuration file '/var/nagios/rw/nagios.qh' for reading!" messages in nagios.log, but I still get "Unable to send check for host" and "Unable to run check for service" messages:

    [root@hl1axmon:/var/log/nagios] >cat nagios.log
    [1639671588] Nagios 4.4.6 starting... (PID=14025042)
    [1639671588] Local time is Thu Dec 16 10:19:48 CST 2021
    [1639671588] LOG VERSION: 2.0
    [1639671588] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1639671588] qh: core query handler registered
    [1639671588] qh: echo service query handler registered
    [1639671588] qh: help for the query handler registered
    [1639671588] wproc: Successfully registered manager as @wproc with query handler
    [1639671588] wproc: Registry request: name=Core Worker 21168520;pid=21168520
    [1639671588] wproc: Registry request: name=Core Worker 13566242;pid=13566242
    [1639671588] wproc: Registry request: name=Core Worker 15663522;pid=15663522
    [1639671588] wproc: Registry request: name=Core Worker 15794538;pid=15794538
    [1639671588] wproc: Registry request: name=Core Worker 5964042;pid=5964042
    [1639671588] wproc: Registry request: name=Core Worker 19399124;pid=19399124
    [1639671588] wproc: Registry request: name=Core Worker 18350506;pid=18350506
    [1639671588] wproc: Registry request: name=Core Worker 19661182;pid=19661182
    [1639671588] wproc: Registry request: name=Core Worker 13828518;pid=13828518
    [1639671588] wproc: Registry request: name=Core Worker 18022714;pid=18022714
    [1639671588] wproc: Registry request: name=Core Worker 16843188;pid=16843188
    [1639671588] wproc: Registry request: name=Core Worker 18481458;pid=18481458
    [1639671588] Successfully launched command file worker with pid 22479158
    [1639671590] wproc: Socket to worker Core Worker 21168520 broken, removing
    [1639671590] wproc: 'Core Worker 15794538' seems to be choked. ret = -1; bufsize = 132: written = 0; errno = 32 (Broken pipe)
    [1639671590] wproc: Socket to worker Core Worker 13566242 broken, removing
    [1639671590] wproc: 'Core Worker 19399124' seems to be choked. ret = -1; bufsize = 132: written = 0; errno = 32 (Broken pipe)
    [1639671590] wproc: Socket to worker Core Worker 15663522 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 15794538 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 5964042 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 19399124 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 18350506 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 19661182 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 13828518 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 18022714 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 16843188 broken, removing
    [1639671590] wproc: Socket to worker Core Worker 18481458 broken, removing

    At this point, the rest of the log consists of "Unable to send check for host" and "Unable to run check for service" messages.
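    In the log above, every registered Core Worker is removed about two seconds after startup. A minimal sketch for counting this in any nagios.log, based only on the message formats visible in this thread (worker_summary is a hypothetical helper name, not part of Nagios):

```shell
#!/bin/sh
# Hypothetical helper: count worker registrations, removals and "choked"
# messages in a nagios.log, using the message texts seen in this thread.
worker_summary() {
    log=$1
    registered=$(grep -c 'wproc: Registry request: name=Core Worker' "$log")
    broken=$(grep -c 'wproc: Socket to worker Core Worker .* broken, removing' "$log")
    choked=$(grep -c 'seems to be choked' "$log")
    echo "registered=$registered broken=$broken choked=$choked"
    # If every registered worker was removed, no checks can be dispatched.
    if [ "$registered" -gt 0 ] && [ "$broken" -ge "$registered" ]; then
        echo "all core workers removed"
    fi
}

# Usage (not executed here):
#   worker_summary /var/log/nagios/nagios.log
```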



    ------------------------------
    Erich Wolz
    ------------------------------



  • 27.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu December 16, 2021 12:06 PM
    Bumping nofiles to 65536 didn't help.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 28.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri December 17, 2021 02:02 AM
    For some reason, the workers on your machine are getting choked and removed. That is why the "Unable to send check"/"Unable to run check" errors appear in the log.
    Can you share the output of "ulimit -a" and "vmstat"?
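    A minimal sketch for gathering the outputs requested in this thread into a single file that can be attached. collect_diag is a hypothetical helper; commands that are not installed (oslevel and prtconf exist only on AIX) are skipped.

```shell
#!/bin/sh
# Hypothetical helper: write ulimit, OS level, hardware and vmstat output
# to the file named by $1, skipping commands that are not available.
collect_diag() {
    out=$1
    {
        echo "=== ulimit -a"
        ulimit -a
        for cmd in "oslevel -s" "prtconf" "vmstat"; do
            name=${cmd%% *}          # command name without its arguments
            if command -v "$name" >/dev/null 2>&1; then
                echo "=== $cmd"
                $cmd 2>&1            # unquoted on purpose: split into words
            fi
        done
    } > "$out"
}

# Usage (not executed here):
#   collect_diag /tmp/nagios_diag.txt
```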

    ------------------------------
    Ayappan P
    ------------------------------



  • 29.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue January 04, 2022 06:17 PM
    Sorry, was on vacation and only just saw this request.

    [root@hl1axmon:/] >ulimit -a
    time(seconds) unlimited
    file(blocks) 2097151
    data(kbytes) 131072
    stack(kbytes) 32768
    memory(kbytes) 32768
    coredump(blocks) 2097151
    nofiles(descriptors) 65536
    threads(per process) unlimited
    processes(per user) unlimited
    [root@hl1axmon:/] >prtconf 2>/dev/null | head -15
    System Model: IBM,8286-42A
    Machine Serial Number: 219BDDV
    Processor Type: PowerPC_POWER8
    Processor Implementation Mode: POWER 8
    Processor Version: PV_8_Compat
    Number Of Processors: 2
    Processor Clock Speed: 3525 MHz
    CPU Type: 64-bit
    Kernel Type: 64-bit
    LPAR Info: 13 hl1axmon
    Memory Size: 8192 MB
    Good Memory Size: 8192 MB
    Platform Firmware level: SV860_215
    Firmware Version: IBM,FW860.81 (SV860_215)
    Console Login: enable
    [root@hl1axmon:/] >vmstat

    System configuration: lcpu=8 mem=8192MB ent=0.20

    kthr memory page faults cpu
    ----- ----------- ------------------------ ------------ -----------------------
    r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
    2 1 593600 624946 0 0 0 0 0 0 2 1301 231 0 0 99 0 0.00 1.2

    ------------------------------
    Erich Wolz
    ------------------------------



  • 30.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri January 07, 2022 05:09 AM
    Can you try with two workers by setting "check_workers=2" in the nagios.cfg file?
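    A minimal sketch for changing a single directive such as check_workers without hand-editing the file, keeping a backup copy. set_cfg_option is a hypothetical helper; it assumes the directive is written as key=value at the start of a line and is meant for single-valued options only (cfg_file and cfg_dir may legitimately repeat).

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a Nagios main config file.
# Replaces an existing uncommented "key=" line, or appends one if absent.
set_cfg_option() {
    cfg=$1 key=$2 value=$3
    cp "$cfg" "$cfg.bak" || return 1
    if grep -q "^$key=" "$cfg"; then
        # Rewrite from the backup; redirecting keeps the file's permissions.
        sed "s|^$key=.*|$key=$value|" "$cfg.bak" > "$cfg"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Usage (not executed here):
#   set_cfg_option /etc/nagios/nagios.cfg check_workers 2
#   grep ^check_workers /etc/nagios/nagios.cfg
```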

    ------------------------------
    Ayappan P
    ------------------------------



  • 31.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon January 10, 2022 02:02 PM
    # grep ^check_workers /etc/nagios/nagios.cfg
    check_workers=2
    # cat nagios.log
    [1641841183] Nagios 4.4.6 starting... (PID=14746100)
    [1641841183] Local time is Mon Jan 10 12:59:43 CST 2022
    [1641841183] LOG VERSION: 2.0
    [1641841183] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1641841183] qh: core query handler registered
    [1641841183] qh: echo service query handler registered
    [1641841183] qh: help for the query handler registered
    [1641841183] wproc: Successfully registered manager as @wproc with query handler
    [1641841183] wproc: Registry request: name=Core Worker 20447494;pid=20447494
    [1641841183] wproc: Registry request: name=Core Worker 12583238;pid=12583238
    [1641841183] Successfully launched command file worker with pid 9765198
    [1641841185] wproc: Socket to worker Core Worker 20447494 broken, removing
    [1641841185] wproc: 'Core Worker 12583238' seems to be choked. ret = -1; bufsize = 132: written = 0; errno = 32 (Broken pipe)
    [1641841185] wproc: Socket to worker Core Worker 12583238 broken, removing
    [1641841185] wproc: Error: can't get_worker() in fo_reassign_wproc_job
    [1641841185] Unable to send check for host ...

    ------------------------------
    Erich Wolz
    ------------------------------



  • 32.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon January 10, 2022 03:32 PM
    I should add that this server is an LPAR.  I tried bumping the number of available virtual processors (up to 4) and also varying the number of check_workers (up to 6), with no difference in outcomes.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 33.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue January 11, 2022 04:06 AM
    I'm not sure what to check next. The code that emits this error is here: https://github.com/NagiosEnterprises/nagioscore/blob/eedbaf6cda184309d3f174e76327a88d68978349/base/workers.c#L1192
    It is a write buffer error.
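    In the "seems to be choked" lines, ret = -1 with errno = 32 (Broken pipe, i.e. EPIPE) means the write to the worker's socket failed because the worker end had already been closed. Judging from the workers.c source linked above, bufsize appears to be the length in bytes of the job request being written, not a nagios.cfg setting; that reading is an assumption. A minimal sketch for pulling those fields out of the log (choked_fields is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: read nagios.log lines on stdin and, for each
#   wproc: 'Core Worker N' seems to be choked. ret = R; bufsize = B: written = W; errno = E (...)
# line, print "N R B W E".
choked_fields() {
    sed -n "s/.*'Core Worker \([0-9]*\)' seems to be choked\. ret = \(-*[0-9]*\); bufsize = \([0-9]*\): written = \([0-9]*\); errno = \([0-9]*\).*/\1 \2 \3 \4 \5/p"
}

# Usage (not executed here):
#   grep 'seems to be choked' /var/log/nagios/nagios.log | choked_fields
```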

    ------------------------------
    Ayappan P
    ------------------------------



  • 34.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed January 12, 2022 03:42 PM
    With 4 (virtual) processors and 6 workers, I am no longer seeing the "Core Worker seems to be choked" messages, but I'm still seeing "Socket to worker Core Worker broken, removing" messages. Also, where is "bufsize = 132" being set? I don't see a "132" in my nagios.cfg.

    [root@hl1axmon:/var/log/nagios] >prtconf 2>/dev/null | head -15
    System Model: IBM,8286-42A
    Machine Serial Number: 219BDDV
    Processor Type: PowerPC_POWER8
    Processor Implementation Mode: POWER 8
    Processor Version: PV_8_Compat
    Number Of Processors: 4
    Processor Clock Speed: 3525 MHz
    CPU Type: 64-bit
    Kernel Type: 64-bit
    LPAR Info: 13 hl1axmon
    Memory Size: 8192 MB
    Good Memory Size: 8192 MB
    Platform Firmware level: SV860_236
    Firmware Version: IBM,FW860.A2 (SV860_236)
    Console Login: enable
    [root@hl1axmon:/var/log/nagios] >grep ^check_workers /etc/nagios/nagios.cfg
    check_workers=6
    [root@hl1axmon:/var/log/nagios] >cat nagios.log
    [1642019550] Nagios 4.4.6 starting... (PID=10813866)
    [1642019550] Local time is Wed Jan 12 14:32:30 CST 2022
    [1642019550] LOG VERSION: 2.0
    [1642019550] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1642019550] qh: core query handler registered
    [1642019550] qh: echo service query handler registered
    [1642019550] qh: help for the query handler registered
    [1642019550] wproc: Successfully registered manager as @wproc with query handler
    [1642019550] wproc: Registry request: name=Core Worker 16581058;pid=16581058
    [1642019550] wproc: Registry request: name=Core Worker 22282564;pid=22282564
    [1642019550] wproc: Registry request: name=Core Worker 11731282;pid=11731282
    [1642019550] wproc: Registry request: name=Core Worker 21103064;pid=21103064
    [1642019550] wproc: Registry request: name=Core Worker 15466998;pid=15466998
    [1642019550] wproc: Registry request: name=Core Worker 20775412;pid=20775412
    [1642019550] Successfully launched command file worker with pid 16187718
    [1642019552] wproc: Socket to worker Core Worker 16581058 broken, removing
    [1642019552] wproc: Socket to worker Core Worker 22282564 broken, removing
    [1642019552] wproc: Socket to worker Core Worker 11731282 broken, removing
    [1642019552] wproc: Socket to worker Core Worker 21103064 broken, removing
    [1642019552] wproc: Socket to worker Core Worker 15466998 broken, removing
    [1642019552] wproc: Socket to worker Core Worker 20775412 broken, removing
    [1642019552] Unable to send check for host ...

    ------------------------------
    Erich Wolz
    ------------------------------



  • 35.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri January 21, 2022 11:57 AM
    Any ideas?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 36.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue February 01, 2022 10:59 AM
    ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 37.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed February 02, 2022 03:23 AM
    We are unable to recreate this issue on our side, so it is difficult to debug.
    After increasing the virtual processors, did you set ulimit -n to 20000 (or more)?
    And did you try the default nagios.cfg file that comes with the RPM, without any customizations?

    ------------------------------
    Ayappan P
    ------------------------------



  • 38.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed February 02, 2022 12:40 PM

    [root@hl1axmon:/] >ulimit -a
    time(seconds) unlimited
    file(blocks) 2097151
    data(kbytes) 131072
    stack(kbytes) 32768
    memory(kbytes) 32768
    coredump(blocks) 2097151
    nofiles(descriptors) 262144
    threads(per process) unlimited
    processes(per user) unlimited

    I know I tried changing "use_large_installation_tweaks=1" (instead of the default "use_large_installation_tweaks=0") in nagios.cfg, but that did not help.  I do not have an unmodified original nagios.cfg so I can't verify for sure that I didn't make any other changes.

    I should also note that I am able to launch the Nagios 4.4.6 web GUI, but I see the following in /var/log/httpd/error_log (note that this is *after* I tried and failed to start the nagios daemon):

    [Wed Feb 02 11:29:46.178440 2022] [php7:notice] [pid 5505756] [client 9.160.7.167:50212] PHP Notice: Trying to access array offset on value of type bool in /var/www/htdocs/includes/utils.inc.php on line 217, referer: http://hl1axmon.houston.ibm.com/
    [Wed Feb 02 11:29:46.182443 2022] [php7:notice] [pid 5702308] [client 9.160.7.167:50210] PHP Notice: Trying to access array offset on value of type bool in /var/www/htdocs/includes/utils.inc.php on line 217, referer: http://hl1axmon.houston.ibm.com/
    [Wed Feb 02 11:29:46.182546 2022] [php7:notice] [pid 5702308] [client 9.160.7.167:50210] PHP Notice: Undefined index: REMOTE_USER in /var/www/htdocs/main.php on line 29, referer: http://hl1axmon.houston.ibm.com/
    [Wed Feb 02 11:29:46.183527 2022] [php7:notice] [pid 5702308] [client 9.160.7.167:50210] PHP Notice: Trying to access array offset on value of type bool in /var/www/htdocs/includes/utils.inc.php on line 154, referer: http://hl1axmon.houston.ibm.com/
    [Wed Feb 02 11:29:46.926869 2022] [cgi:error] [pid 5702308] [client 9.160.7.167:50210] End of script output before headers: statusjson.cgi, referer: http://hl1axmon.houston.ibm.com/main.php
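    A minimal sketch for tallying repeated PHP notices in an Apache error_log by script and line number, assuming the "PHP Notice: MESSAGE in FILE on line N" format shown above (php_notice_summary is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: count PHP notices in the error log named by $1,
# grouped as "FILE:LINE MESSAGE", most frequent first.
php_notice_summary() {
    sed -n 's/.*PHP Notice: *\(.*\) in \(\/[^ ]*\) on line \([0-9]*\).*/\2:\3 \1/p' "$1" |
        sort | uniq -c | sort -rn
}

# Usage (not executed here):
#   php_notice_summary /var/log/httpd/error_log
```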

    I'm pretty sure I did not make any changes to any of the /var/www/htdocs php files:

    [root@hl1axmon:/var/www/htdocs] >find . -name "*.php" -ls
    8527 1 -rw-rw-r-- 1 nagios nagios 536 Dec 9 2020 ./config.inc.php
    16538 11 -rw-rw-r-- 1 1000 1000 10509 Apr 28 2020 ./includes/utils.inc.php
    8538 2 -rw-rw-r-- 1 1000 1000 1943 Apr 28 2020 ./index.php
    8541 10 -rw-rw-r-- 1 1000 1000 9300 Apr 28 2020 ./main.php
    8549 5 -rw-rw-r-- 1 1000 1000 4383 Apr 28 2020 ./map.php
    8551 7 -rw-rw-r-- 1 1000 1000 6193 Apr 28 2020 ./side.php



    ------------------------------
    Erich Wolz
    ------------------------------



  • 39.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon February 07, 2022 04:27 PM
    Edited by Erich Wolz Mon February 07, 2022 04:29 PM

    Here are some other system settings which may or may not have any bearing on this issue:

    ------------------------------

    Erich Wolz
    ------------------------------



  • 40.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue February 08, 2022 07:32 AM
    Hi Erich,

    Have you tried it on a fresh AIX system? Are you seeing the issue everywhere or only on a specific system?
    Please try on a clean AIX system and see whether the issue persists.

    ------------------------------
    SANKET RATHI
    ------------------------------



  • 41.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue February 08, 2022 12:13 PM

    This *is* a fresh AIX system, built specifically for the purpose of hosting this Nagios 4.4.6 server.

    A co-worker is seeing similar (if not identical) behavior on his own freshly-built AIX system.



    ------------------------------
    Erich Wolz
    ------------------------------



  • 42.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu February 10, 2022 01:33 AM
    Please attach your nagios.cfg file here.

    ------------------------------
    Ayappan P
    ------------------------------



  • 43.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu February 10, 2022 05:50 PM
    File uploaded

    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    nagios.cfg (44 KB)


  • 44.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon February 21, 2022 01:18 PM
    Any ideas?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 45.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu March 03, 2022 09:17 PM
    Ping.

    Out of curiosity, do you have a working Nagios 4.4.6 server running?  If so, what (if any) differences are there between your system settings/nagios.cfg file and mine?
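    A minimal sketch for comparing two nagios.cfg files (for example an unmodified copy and a customized one) while ignoring comments, blank lines and ordering. It assumes the main config uses "#" comments; cfg_directives and cfg_compare are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the effective directives of a nagios.cfg,
# dropping comment and blank lines, sorted so two files line up.
cfg_directives() {
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$' | sort
}

# Hypothetical helper: diff the effective directives of two config files.
# "<" lines appear only in the first file, ">" lines only in the second.
cfg_compare() {
    a=$(mktemp) b=$(mktemp)
    cfg_directives "$1" > "$a"
    cfg_directives "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Usage (not executed here):
#   cfg_compare /path/to/original/nagios.cfg /etc/nagios/nagios.cfg
```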

    ------------------------------
    Erich Wolz
    ------------------------------



  • 46.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue March 08, 2022 12:01 AM
    Can you set debug_level=-1 in the nagios.cfg file and share the debug log (/var/log/nagios/nagios.debug)?

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 47.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue March 08, 2022 03:34 PM
    Edited by Erich Wolz Tue March 08, 2022 03:35 PM

    I launched Nagios after setting debug_level=-1 and, once I started seeing "Unable to send check for host" messages in nagios.log, killed the nagios daemon.

    Nagios logs attached
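    The capture procedure described above (start Nagios, wait for "Unable to send check for host", then stop the daemon) can be scripted. A minimal sketch, assuming the daemon PID is taken from the most recent "Nagios ... starting... (PID=...)" line in nagios.log; wait_for_pattern and nagios_pid_from_log are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: poll file $1 once per second for pattern $2,
# giving up after $3 seconds. Returns 0 as soon as the pattern appears.
wait_for_pattern() {
    file=$1 pattern=$2 timeout=$3
    i=0
    while [ "$i" -lt "$timeout" ]; do
        grep -q "$pattern" "$file" 2>/dev/null && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Hypothetical helper: print the PID from the last "starting..." line.
nagios_pid_from_log() {
    sed -n 's/.*Nagios .* starting\.\.\. (PID=\([0-9]*\)).*/\1/p' "$1" | tail -n 1
}

# Usage (not executed here):
#   log=/var/log/nagios/nagios.log
#   if wait_for_pattern "$log" "Unable to send check for host" 120; then
#       kill "$(nagios_pid_from_log "$log")"
#   fi
```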



    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    nagios_debug.tar (1.82 MB)


  • 48.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri March 18, 2022 01:07 PM
    Any ideas?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 49.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon March 28, 2022 05:36 PM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 50.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu March 31, 2022 01:04 AM
    We were not able to find out much information even from the debug logs.

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 51.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu March 31, 2022 11:24 AM

    Do you actually have a running Nagios 4.6 monitor? If so, what are its system specs (i.e. same outputs I was asked to provide for my system)?

    Or (as I suspect) does your team just build all the various open source code packages on AIX and hope they work?

    This thread has been ongoing since May 06, 2021; I am as hopeful as anyone that this can be brought to a successful conclusion sooner rather than later.  Is there anything else you'd like me to try?



    ------------------------------
    Erich Wolz
    ------------------------------



  • 52.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri April 01, 2022 05:45 AM

    We are able to launch nagios in our environment. We always test the packages before uploading them to AIX Toolbox.

    Here are the system specifications:

    # ulimit -a
    time(seconds) unlimited
    file(blocks) 2097151
    data(kbytes) 131072
    stack(kbytes) 32768
    memory(kbytes) 32768
    coredump(blocks) 2097151
    nofiles(descriptors) 20000
    threads(per process) unlimited
    processes(per user) 128

    # oslevel -s
    7200-00-01-1543

    # prtconf
    System Model: IBM,8284-22A
    Machine Serial Number: 1005E6V
    Processor Type: PowerPC_POWER8
    Processor Implementation Mode: POWER 8
    Processor Version: PV_8_Compat
    Number Of Processors: 8
    Processor Clock Speed: 3425 MHz
    CPU Type: 64-bit
    Kernel Type: 64-bit
    LPAR Info: 3 pokndd3_aix7.2
    Memory Size: 8192 MB
    Good Memory Size: 8192 MB
    Platform Firmware level: TV840_028
    Firmware Version: IBM,FW840.00 (TV840_028)
    Console Login: enable
    Auto Restart: true
    Full Core: true
    NX Crypto Acceleration: Capable and Enabled
    In-Core Crypto Acceleration: Capable, but not Enabled

    Network Information
    Host Name: pokndd3.pok.stglabs.ibm.com
    IP Address: 9.47.66.170
    Sub Netmask: 255.255.240.0
    Gateway: 9.47.79.254
    Name Server:
    Domain Name:

    Paging Space Information
    Total Paging Space: 1024MB
    Percent Used: 2%

    Volume Groups Information
    ==============================================================================
    Active VGs
    ==============================================================================
    rootvg:
    PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
    hdisk0 active 479 0 00..00..00..00..00
    hdisk1 active 799 381 160..00..00..61..160
    ==============================================================================

    INSTALLED RESOURCE LIST

    The following resources are installed on the machine.
    +/- = Added or deleted from Resource List.
    * = Diagnostic support not available.

    Model Architecture: chrp
    Model Implementation: Multiple Processor, PCI bus

    + sys0 System Object
    + sysplanar0 System Planar
    * vio0 Virtual I/O Bus
    * vscsi0 U8284.22A.1005E6V-V3-C3-T1 Virtual SCSI Client Adapter
    * hdisk1 U8284.22A.1005E6V-V3-C3-T1-L8200000000000000 Virtual SCSI Disk Drive
    * hdisk0 U8284.22A.1005E6V-V3-C3-T1-L8100000000000000 Virtual SCSI Disk Drive
    * ent0 U8284.22A.1005E6V-V3-C2-T1 Virtual I/O Ethernet Adapter (l-lan)
    * vsa0 U8284.22A.1005E6V-V3-C0 LPAR Virtual Serial Adapter
    * vty0 U8284.22A.1005E6V-V3-C0-L0 Asynchronous Terminal
    + L2cache0 L2 Cache
    + mem0 Memory
    + proc0 Processor
    + proc8 Processor
    + proc16 Processor
    + proc24 Processor
    + proc32 Processor
    + proc40 Processor
    + proc48 Processor
    + proc56 Processor

    # vmstat

    System configuration: lcpu=32 mem=8192MB ent=2.00

    kthr memory page faults cpu
    ----- ----------- ------------------------ ------------ -----------------------
    r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
    2 1 837850 93774 0 0 0 9 19 0 58 1269 508 30 0 70 0 1.01 50.5




    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 53.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon April 11, 2022 09:17 PM
    Sorry for the delay in responding, I was out sick last week.

    I reconfigured my LPAR as indicated below. The only major differences between our configs now appear to be:
    a) processes(per user) -- my "unlimited" to your "128"
    b) oslevel -- my "7200-05-03-2135" to your "7200-00-01-1543" (AIX 7.2 TL0 EoSPS was 31 Dec 2018, and ITSS requires us to maintain software and systems at a vendor-supported level, so I am not at liberty to revert to 7200-00-01-1543)
    c) platform firmware level -- my "SV860_236" to your "TV840_028" (which I don't even see in Fix Central, but which in any event appears to be downlevel judging from the fact that the most recent level of SV840 is 177 not 028)
    d) Full Core -- my "false" to your "true"
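    Differences like (a) above can be found mechanically. A minimal sketch that compares two saved "ulimit -a" listings and prints only the keys whose values differ, ignoring whitespace; it assumes the value is always the last field on each line, and compare_kv is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare two "key ... value" listings (e.g. saved
# `ulimit -a` output from two hosts). The key is everything before the
# last field; only keys present in both files with different values print.
compare_kv() {
    awk '
        NF == 0 { next }
        NR == FNR { v = $NF; $NF = ""; a[$0] = v; next }
        { v = $NF; $NF = ""; if (($0 in a) && a[$0] != v) print $0 a[$0] " vs " v }
    ' "$1" "$2"
}

# Usage (not executed here):
#   compare_kv ulimit_hl1axmon.txt ulimit_pokndd3.txt
```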

    I can launch nagios:

    [root@hl1axmon:/] >ps -ef |grep nag
    root 6619468 18153782 0 16:59:43 pts/0 0:00 grep nag
    root 7471612 1 0 16:33:09 - 0:00 /opt/freeware/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
    nagios 18088412 18481590 0 16:59:32 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 18481590 1 1 16:59:32 - 0:00 /opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg

    but I am still not able to perform any checks:

    [root@hl1axmon:/var/log/nagios] >cat nagios.log
    [1649714372] Nagios 4.4.6 starting... (PID=18481590)
    [1649714372] Local time is Mon Apr 11 16:59:32 CDT 2022
    [1649714372] LOG VERSION: 2.0
    [1649714372] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
    [1649714372] qh: core query handler registered
    [1649714372] qh: echo service query handler registered
    [1649714372] qh: help for the query handler registered
    [1649714372] wproc: Successfully registered manager as @wproc with query handler
    [1649714372] wproc: Registry request: name=Core Worker 16777648;pid=16777648
    [1649714372] wproc: Registry request: name=Core Worker 18022874;pid=18022874
    [1649714372] wproc: Registry request: name=Core Worker 6816066;pid=6816066
    [1649714372] wproc: Registry request: name=Core Worker 6619466;pid=6619466
    [1649714372] wproc: Registry request: name=Core Worker 18546972;pid=18546972
    [1649714372] wproc: Registry request: name=Core Worker 17760586;pid=17760586
    [1649714372] Successfully launched command file worker with pid 18088412
    [1649714374] wproc: Socket to worker Core Worker 16777648 broken, removing
    [1649714374] wproc: Socket to worker Core Worker 18022874 broken, removing
    [1649714374] wproc: Socket to worker Core Worker 6816066 broken, removing
    [1649714374] wproc: Socket to worker Core Worker 6619466 broken, removing
    [1649714374] wproc: Socket to worker Core Worker 18546972 broken, removing
    [1649714374] wproc: Socket to worker Core Worker 17760586 broken, removing
    [1649714374] Unable to send check for host ...

    I am attaching my latest logs for debug.

    [root@hl1axmon:/] >ulimit -a
    time(seconds)        unlimited
    file(blocks)         2097151
    data(kbytes)         131072
    stack(kbytes)        32768
    memory(kbytes)       32768
    coredump(blocks)     2097151
    nofiles(descriptors) 20000
    threads(per process) unlimited
    processes(per user)  unlimited
    
    [root@hl1axmon:/] >oslevel -s
    7200-05-03-2135
    
    [root@hl1axmon:/] >prtconf
    System Model: IBM,8286-42A
    Machine Serial Number: 219BDDV
    Processor Type: PowerPC_POWER8
    Processor Implementation Mode: POWER 8
    Processor Version: PV_8_Compat
    Number Of Processors: 8
    Processor Clock Speed: 3525 MHz
    CPU Type: 64-bit
    Kernel Type: 64-bit
    LPAR Info: 13 hl1axmon
    Memory Size: 8192 MB
    Good Memory Size: 8192 MB
    Platform Firmware level: SV860_236
    Firmware Version: IBM,FW860.A2 (SV860_236)
    Console Login: enable
    Auto Restart: true
    Full Core: false
    NX Crypto Acceleration: Capable and Enabled
    In-Core Crypto Acceleration: Capable, but not Enabled
     
    Network Information
            Host Name: hl1axmon
            IP Address: 9.35.40.27
            Sub Netmask: 255.255.255.0
            Gateway: 9.35.40.20
            Name Server: 9.35.40.128
            Domain Name: houston.ibm.com
     
    Paging Space Information
            Total Paging Space: 1024MB
            Percent Used: 1%
     
    Volume Groups Information
    ============================================================================== 
    Active VGs
    ============================================================================== 
    rootvg:
    PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
    hdisk0            active            639         433         127..30..20..128..128
    ============================================================================== 
     
    INSTALLED RESOURCE LIST
    
    The following resources are installed on the machine.
    +/- = Added or deleted from Resource List.
    *   = Diagnostic support not available.
            
      Model Architecture: chrp
      Model Implementation: Multiple Processor, PCI bus
            
    + sys0                                                             System Object
    + sysplanar0                                                       System Planar
    * vio0                                                             Virtual I/O Bus
    * vscsi1           U8286.42A.219BDDV-V13-C43-T1                    Virtual SCSI Client Adapter
    * vscsi0           U8286.42A.219BDDV-V13-C23-T1                    Virtual SCSI Client Adapter
    * hdisk1           U8286.42A.219BDDV-V13-C23-T1-L8200000000000000  Virtual SCSI Disk Drive
    * hdisk0           U8286.42A.219BDDV-V13-C23-T1-L8100000000000000  Virtual SCSI Disk Drive
    * ent0             U8286.42A.219BDDV-V13-C2-T1                     Virtual I/O Ethernet Adapter (l-lan)
    * vsa0             U8286.42A.219BDDV-V13-C0                        LPAR Virtual Serial Adapter
    * vty0             U8286.42A.219BDDV-V13-C0-L0                     Asynchronous Terminal
    + L2cache0                                                         L2 Cache
    + mem0                                                             Memory
    + proc0                                                            Processor
    + proc8                                                            Processor
    + proc16                                                           Processor
    + proc24                                                           Processor
    + proc32                                                           Processor
    + proc40                                                           Processor
    + proc48                                                           Processor
    + proc56                                                           Processor
    [root@hl1axmon:/] >vmstat 
    
    System configuration: lcpu=32 mem=8192MB ent=2.00
    
    kthr    memory              page              faults              cpu          
    ----- ----------- ------------------------ ------------ -----------------------
     r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
     2  1 461486 1565149   0   0   0   0    0   0  75 19534 438  0  0 99  0  0.00   0.0


    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    nagios_debug_20220411.tar (1.71 MB)


  • 54.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon April 11, 2022 09:24 PM

    Here is my lsdev output:

    L2cache0   Available       L2 Cache
    cache0     Defined         SSD Cache virtual device
    cd0        Defined         Virtual SCSI Optical Served by VIO Server
    cengine0   Defined         SSD Cache engine
    cluster0   Available       Cluster Node
    en0        Available       Standard Ethernet Network Interface
    ent0       Available       Virtual I/O Ethernet Adapter (l-lan)
    et0        Defined         IEEE 802.3 Ethernet Network Interface
    fslv00     Defined         Logical volume
    fslv01     Defined         Logical volume
    fslv02     Defined         Logical volume
    hd1        Defined         Logical volume
    hd2        Defined         Logical volume
    hd3        Defined         Logical volume
    hd4        Defined         Logical volume
    hd5        Defined         Logical volume
    hd6        Defined         Logical volume
    hd8        Defined         Logical volume
    hd10opt    Defined         Logical volume
    hd11admin  Defined         Logical volume
    hd9var     Defined         Logical volume
    hdisk0     Available       Virtual SCSI Disk Drive
    hdisk1     Available       Virtual SCSI Disk Drive
    inet0      Available       Internet Network Extension
    iocp0      Defined         I/O Completion Ports
    iscsi0     Available       iSCSI Protocol Device
    lg_dumplv  Defined         Logical volume
    livedump   Defined         Logical volume
    lo0        Available       Loopback Network Interface
    lvdd       Available       LVM Device Driver
    mem0       Available       Memory
    pkcs11     Available       ACF/PKCS#11 Device
    proc0      Available 00-00 Processor
    proc8      Available 00-08 Processor
    proc16     Available 00-16 Processor
    proc24     Available 00-24 Processor
    proc32     Available 00-32 Processor
    proc40     Available 00-40 Processor
    proc48     Available 00-48 Processor
    proc56     Available 00-56 Processor
    pty0       Available       Asynchronous Pseudo-Terminal
    rcm0       Defined         Rendering Context Manager Subsystem
    rootvg     Defined         Volume group
    sfw0       Available       Storage Framework Module
    sys0       Available       System Object
    sysplanar0 Available       System Planar
    vio0       Available       Virtual I/O Bus
    vsa0       Available       LPAR Virtual Serial Adapter
    vscsi0     Available       Virtual SCSI Client Adapter
    vscsi1     Available       Virtual SCSI Client Adapter
    vty0       Available       Asynchronous Terminal
    


    ------------------------------
    Erich Wolz
    ------------------------------



  • 55.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed April 13, 2022 03:59 PM
    Since you are able to launch nagios in your environment, can you please provide your nagios.cfg file?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 56.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue April 19, 2022 07:10 AM
    Attaching nagios.cfg file

    ------------------------------
    RESHMA KUMAR
    ------------------------------

    Attachment(s)

    cfg
    nagios.cfg   44 KB 1 version


  • 57.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue April 19, 2022 12:06 PM

    Thank you for that.  I have commented out the cfg_dir statement from my own nagios.cfg, so now the only difference between mine and yours is the debug_level:

    $ diff nagios.cfg nagios.cfg_toolbox
    55d54
    < #cfg_dir=/etc/nagios/projects
    58a58
    > 
    1278c1278
    < debug_level=-1
    ---
    > debug_level=0
    

    Unfortunately, even this does not work (I am seeing multiple "failed because job was null" messages in the nagios.debug log).  The corresponding nagios.log and nagios.debug files are attached. 



    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    tar
    nagios_debug_20220419.tar   100 KB 1 version


  • 58.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon April 25, 2022 03:26 PM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 59.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed May 04, 2022 05:31 PM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 60.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu May 05, 2022 05:19 PM

    I see from https://www.nagios.org/projects/nagios-core/history/4x/ that Nagios Core 4.4.7 has been available since 2022-04-14, and includes the following fixes: 

    • Fixed checkboxes in jsonquery.html (#778) (Rfferrao87)
    • Added SSL support for version update check (Sebastian Wolf)
    • Note: NEB modules using the priority/scheduling queues in libnagios may need to update headers due to symbol conflicts with OpenSSL.
    • Fixed XSS in homepage when displaying update check results (Sebastian Wolf)
    • Fixed allocation error in getcgi.c (#820) (Ariadne Conill)
    • Fixed Error: NULL variable for lines of spaces in resource.cfg (#814) (Ralf Herrmann)
    • Fixed crash when handling large check output (#825, #828) (Kilvador)
    • Update packaging instructions for RPM/EPEL (#850) (T.J. Yang)
    • Include packaging instructions for DEB (#842) (Catfriend1)
    • Fixed CGI object processing when names end in \ (#819) (Sebastian Wolf)
    • $SERVICEPROBLEMID$ now accessible when notifications are sent (#688) (Sebastian Wolf)

    I can't tell if any of these are related to the issues that have been documented in this thread... but in any event, 4.4.6 is over two years old and probably due for a refresh anyway.



    ------------------------------
    Erich Wolz
    ------------------------------



  • 61.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon May 16, 2022 04:40 PM
    I attached my most recent nagios.log and nagios.debug files to this thread four weeks ago, but have yet to receive an acknowledgement that they were even looked at.  At this point there are few (if any) substantive differences between our Nagios configs, yet I am still seeing multiple "failed because job was null" messages in the nagios.debug log. I suspect this may be because the Core Worker processes end up broken almost immediately after they are launched, before any checks have a chance to run.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 62.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue May 17, 2022 01:08 AM

    Thank you for sharing the information and sorry for the delay.

    Before we set ulimit -n 20000, we were getting the same "failed because job was null" error. After changing the ulimit value, we no longer see it.

    Attaching the debug log.



    ------------------------------
    RESHMA KUMAR
    ------------------------------

    Attachment(s)

    txt
    nagios.debug.txt   163 KB 1 version


  • 63.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue May 17, 2022 04:11 PM

    Hmm, my ulimit value for the number of file descriptors is already 20000:

    [root@hl1axmon:/] >ulimit -a
    time(seconds) unlimited
    file(blocks) 2097151
    data(kbytes) 131072
    stack(kbytes) 32768
    memory(kbytes) 32768
    coredump(blocks) 2097151
    nofiles(descriptors) 20000
    threads(per process) unlimited
    processes(per user) unlimited

    Is there a "rule of thumb" for what nofiles should be set to, based on the number of hosts and services being checked?
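    Before looking for a rule of thumb, it may be worth confirming which limit the Nagios process itself inherits, since a daemon started from an init script can see different limits than an interactive shell's `ulimit -a`. The sketch below is a minimal diagnostic, not part of Nagios; the function names `report_nofile_limit` and `print_limit` are hypothetical. It only uses the POSIX `getrlimit(RLIMIT_NOFILE, ...)` call.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print one limit, translating RLIM_INFINITY
 * (what "ulimit -n unlimited" sets) into the word "unlimited". */
static void print_limit(FILE *out, const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        fprintf(out, "%s: unlimited\n", label);
    else
        fprintf(out, "%s: %llu\n", label, (unsigned long long)value);
}

/* Hypothetical diagnostic: report the soft and hard RLIMIT_NOFILE values
 * inherited by the calling process.  Returns 0 on success, -1 if
 * getrlimit() fails. */
int report_nofile_limit(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit(RLIMIT_NOFILE)");
        return -1;
    }

    print_limit(out, "nofiles soft limit", rl.rlim_cur);
    print_limit(out, "nofiles hard limit", rl.rlim_max);
    return 0;
}
```

    Calling `report_nofile_limit(stdout)` from a small test program launched the same way as the daemon would show whether the 20000 setting actually reaches the process.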



    ------------------------------
    Erich Wolz
    ------------------------------



  • 64.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri May 27, 2022 06:39 PM
    Ping

    ------------------------------
    Erich Wolz
    ------------------------------



  • 65.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu June 02, 2022 10:58 AM
    As we are not able to reproduce this issue on our side, and the debug options are not giving any useful info, I would suggest opening an issue with the Nagios community: https://github.com/NagiosEnterprises/nagioscore/issues
    They might be able to provide some clues about the issue. Once the issue is opened, you can put the issue link here so that we can also look into it and assist with any details.

    ------------------------------
    Ayappan P
    ------------------------------



  • 66.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu June 02, 2022 11:48 AM

    I would be happy to open an issue with the Nagios community, but as the issue currently exists on Nagios Core 4.4.6 and Nagios Core 4.4.7 has been available since 2022-04-14, the first thing I would likely be asked to do is to step up to Nagios Core 4.4.7 to see if the issue even still exists. 

    Please update the AIX Open Source version of Nagios to 4.4.7, and if the issue does still exist (hopefully not!) I will open an issue with the Nagios community.  



    ------------------------------
    Erich Wolz
    ------------------------------



  • 67.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri June 03, 2022 03:12 AM
    Okay. We will update Nagios to 4.4.7 in Toolbox.

    ------------------------------
    Ayappan P
    ------------------------------



  • 68.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue July 26, 2022 10:38 AM
    What is the status of this?  I still see only 4.4.6 when I do a yum list:
    # yum list | grep nagios
    nagios.ppc                                   4.4.6-2          @AIX_Toolbox      
    nagios-gui.ppc                               4.4.6-2          @AIX_Toolbox      
    nagios-nrpe.ppc                              4.0.3-1          @AIX_Toolbox      
    nagios-plugins.ppc                           2.3.3-1          @AIX_Toolbox      
    nagios-devel.ppc                             4.4.6-2          AIX_Toolbox


    ------------------------------
    Erich Wolz
    ------------------------------



  • 69.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue July 26, 2022 12:23 PM
    Hi Erich, we have included the Nagios update in our plan and will work on it this quarter. We are planning to deliver it by the end of the quarter.

    ------------------------------
    SANKET RATHI
    ------------------------------



  • 70.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 08, 2022 10:32 PM

    I just updated to nagios-4.4.7-1 and nagios-gui-4.4.7-1, but I am still seeing a bunch of "wproc: Registry request: name=Core Worker" statements in nagios.log followed almost immediately by an equal number of "wproc: Socket to worker Core Worker broken, removing" statements.

    There are also a bunch of "Unable to run scheduled host check at this time" and "failed because job was null" messages in the nagios.debug.  

    As I am not familiar with the inner workings of the nagios code (though given all the "job was null" messages I suspect a memory management error in how the jobs are constructed), I am attaching my most recent logs.
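    The "Socket to worker Core Worker ... broken, removing" messages are consistent with the manager writing to a worker socket whose other end has already gone away. As a hedged illustration only (this is not Nagios code, and `demo_write_to_closed_peer` is a hypothetical name), the POSIX sketch below shows that a write to a UNIX-domain socket whose peer has been closed fails with `EPIPE`, i.e. errno 32 "Broken pipe" on typical systems, once `SIGPIPE` is ignored.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: close one end of a socketpair (the "worker"), then
 * write to the other end (the "manager") and return the resulting errno.
 * Expected result on POSIX systems: EPIPE. */
int demo_write_to_closed_peer(void)
{
    int sv[2];
    ssize_t n;
    int saved;
    /* Ignore SIGPIPE so write() reports EPIPE instead of killing us. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        saved = errno;
    } else {
        close(sv[1]);                  /* the worker end goes away */
        n = write(sv[0], "job", 3);    /* the manager keeps writing */
        saved = (n < 0) ? errno : 0;
        close(sv[0]);
    }

    if (old != SIG_ERR)
        signal(SIGPIPE, old);          /* restore the previous disposition */
    return saved;
}
```

    This only explains the shape of the messages; it does not say why the workers exit in the first place.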



    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    tar
    nagios_debug_20220908.tar   1.46 MB 1 version


  • 71.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri September 09, 2022 07:28 PM

    On a whim, I tried starting Nagios *not* as a daemon, but in the current command window, and got some additional output that may provide a clue as to what has been going on this entire time:

    # /opt/freeware/bin/nagios /etc/nagios/nagios.cfg

    Nagios Core 4.4.7
    Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 2022-04-14
    License: GPL

    Website: https://www.nagios.org
    Nagios 4.4.7 starting... (PID=10551764)
    Local time is Fri Sep 09 18:16:46 CDT 2022
    wproc: Successfully registered manager as @wproc with query handler
    wproc: Registry request: name=Core Worker 12059020;pid=12059020
    wproc: Registry request: name=Core Worker 10027334;pid=10027334
    wproc: Registry request: name=Core Worker 10420642;pid=10420642
    wproc: Registry request: name=Core Worker 10355020;pid=10355020
    wproc: Registry request: name=Core Worker 11010392;pid=11010392
    wproc: Registry request: name=Core Worker 9830736;pid=9830736
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 10813772;pid=10813772
    wproc: Registry request: name=Core Worker 10486022;pid=10486022
    wproc: Registry request: name=Core Worker 6684940;pid=6684940
    wproc: Registry request: name=Core Worker 10748288;pid=10748288
    wproc: Registry request: name=Core Worker 9437456;pid=9437456
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 10092872;pid=10092872
    wproc: Registry request: name=Core Worker 10289586;pid=10289586
    wproc: Registry request: name=Core Worker 10617166;pid=10617166
    wproc: Registry request: name=Core Worker 8913244;pid=8913244
    wproc: Registry request: name=Core Worker 8388956;pid=8388956
    wproc: Registry request: name=Core Worker 10158402;pid=10158402
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 10223938;pid=10223938
    wproc: Registry request: name=Core Worker 9961788;pid=9961788
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 10682712;pid=10682712
    wproc: Registry request: name=Core Worker 9634096;pid=9634096
    wproc: Registry request: name=Core Worker 9699634;pid=9699634
    wproc: Registry request: name=Core Worker 8651052;pid=8651052
    wproc: Registry request: name=Core Worker 9568558;pid=9568558
    wproc: Registry request: name=Core Worker 8978724;pid=8978724
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 9109796;pid=9109796
    wproc: Registry request: name=Core Worker 9044306;pid=9044306
    wproc: Registry request: name=Core Worker 8716606;pid=8716606
    wproc: Registry request: name=Core Worker 12190076;pid=12190076
    wproc: Registry request: name=Core Worker 12124542;pid=12124542
    wproc: Registry request: name=Core Worker 9896316;pid=9896316
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 12255624;pid=12255624
    wproc: Registry request: name=Core Worker 11600286;pid=11600286
    wproc: Registry request: name=Core Worker 11862386;pid=11862386
    wproc: Registry request: name=Core Worker 11141592;pid=11141592
    wproc: Registry request: name=Core Worker 11796850;pid=11796850
    wproc: Registry request: name=Core Worker 11731318;pid=11731318
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 11927924;pid=11927924
    wproc: Registry request: name=Core Worker 10945006;pid=10945006
    wproc: Registry request: name=Core Worker 11665770;pid=11665770
    wproc: Registry request: name=Core Worker 11272524;pid=11272524
    wproc: Registry request: name=Core Worker 11993356;pid=11993356
    wproc: Registry request: name=Core Worker 9175426;pid=9175426
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 11207016;pid=11207016
    wproc: Registry request: name=Core Worker 11403586;pid=11403586
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 11534714;pid=11534714
    wproc: Registry request: name=Core Worker 11338154;pid=11338154
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 8847852;pid=8847852
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Successfully launched command file worker with pid 11469234
    wproc: Socket to worker Core Worker 12059020 broken, removing
    wproc: 'Core Worker 10355020' seems to be choked. ret = -1; bufsize = 132: written = 0; errno = 32 (Broken pipe)
    wproc: Socket to worker Core Worker 10027334 broken, removing
    wproc: 'Core Worker 9830736' seems to be choked. ret = -1; bufsize = 132: written = 0; errno = 32 (Broken pipe)
    wproc: Socket to worker Core Worker 10420642 broken, removing
    wproc: Socket to worker Core Worker 10355020 broken, removing
    wproc: Socket to worker Core Worker 11010392 broken, removing
    wproc: Socket to worker Core Worker 9830736 broken, removing
    wproc: Socket to worker Core Worker 10813772 broken, removing
    wproc: Socket to worker Core Worker 10486022 broken, removing
    wproc: Socket to worker Core Worker 6684940 broken, removing
    wproc: Socket to worker Core Worker 10748288 broken, removing
    wproc: Socket to worker Core Worker 9437456 broken, removing
    wproc: Socket to worker Core Worker 10092872 broken, removing
    wproc: Socket to worker Core Worker 10289586 broken, removing
    wproc: Socket to worker Core Worker 10617166 broken, removing
    wproc: Socket to worker Core Worker 8913244 broken, removing
    wproc: Socket to worker Core Worker 8388956 broken, removing
    wproc: Socket to worker Core Worker 10158402 broken, removing
    wproc: Socket to worker Core Worker 10223938 broken, removing
    wproc: Socket to worker Core Worker 9961788 broken, removing
    wproc: Socket to worker Core Worker 10682712 broken, removing
    wproc: Socket to worker Core Worker 9634096 broken, removing
    wproc: Socket to worker Core Worker 9699634 broken, removing
    wproc: Socket to worker Core Worker 8651052 broken, removing
    wproc: Socket to worker Core Worker 9568558 broken, removing
    wproc: Socket to worker Core Worker 8978724 broken, removing
    wproc: Socket to worker Core Worker 9109796 broken, removing
    wproc: Socket to worker Core Worker 9044306 broken, removing
    wproc: Socket to worker Core Worker 8716606 broken, removing
    wproc: Socket to worker Core Worker 12190076 broken, removing
    wproc: Socket to worker Core Worker 12124542 broken, removing
    wproc: Socket to worker Core Worker 9896316 broken, removing
    wproc: Socket to worker Core Worker 12255624 broken, removing
    wproc: Socket to worker Core Worker 11600286 broken, removing
    wproc: Socket to worker Core Worker 11862386 broken, removing
    wproc: Socket to worker Core Worker 11141592 broken, removing
    wproc: Socket to worker Core Worker 11796850 broken, removing
    wproc: Socket to worker Core Worker 11731318 broken, removing
    wproc: Socket to worker Core Worker 11927924 broken, removing
    wproc: Socket to worker Core Worker 10945006 broken, removing
    wproc: Socket to worker Core Worker 11665770 broken, removing
    wproc: Socket to worker Core Worker 11272524 broken, removing
    wproc: Socket to worker Core Worker 11993356 broken, removing
    wproc: Socket to worker Core Worker 9175426 broken, removing
    wproc: Socket to worker Core Worker 11207016 broken, removing
    wproc: Socket to worker Core Worker 11403586 broken, removing
    wproc: Socket to worker Core Worker 11534714 broken, removing
    wproc: Socket to worker Core Worker 11338154 broken, removing
    wproc: Socket to worker Core Worker 8847852 broken, removing

    In other words, it would probably be helpful to know which file or directory is needed in order to create an io broker socket.
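    Note that "No such file or directory" is simply `strerror(ENOENT)`; it names a real missing file only if the call that failed actually set `errno`. If a failure path returns an error without touching `errno`, the message reports whatever an earlier, unrelated call left behind. The sketch below shows the defensive pattern under that assumption; `alloc_fd_table` is hypothetical and is not the libnagios io broker code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator for a per-descriptor table.  Every failure path
 * sets errno explicitly, so a caller's strerror(errno) describes *this*
 * failure rather than a stale value such as ENOENT. */
void **alloc_fd_table(int size)
{
    void **table;

    if (size <= 0) {
        errno = EINVAL;          /* bad size: say so explicitly */
        return NULL;
    }

    errno = 0;
    table = calloc((size_t)size, sizeof(*table));
    if (table == NULL && errno == 0)
        errno = ENOMEM;          /* C does not require calloc to set errno */
    return table;
}
```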



    ------------------------------
    Erich Wolz
    ------------------------------



  • 72.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 15, 2022 10:10 AM
    As discussed earlier, you can now open an issue with the Nagios community, including these details.
    https://github.com/NagiosEnterprises/nagioscore/issues

    ------------------------------
    Ayappan P
    ------------------------------



  • 73.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 15, 2022 10:44 AM
    Right.  I had intended to start out by saying something along the lines of "I'm using the Nagios 4.4.7 downloaded from the AIX Open Source Toolbox..." and provide a link, but the link I was going to provide (https://www.ibm.com/support/pages/aix-toolbox-open-source-software-downloads-alpha#N) only mentions 4.4.6.  Is that not the right link?  

    ------------------------------
    Erich Wolz
    ------------------------------



  • 74.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 15, 2022 01:34 PM
    We have updated the webpage now, please check it.

    ------------------------------
    RESHMA KUMAR
    ------------------------------



  • 75.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 15, 2022 06:05 PM
    Issue 882 opened: https://github.com/NagiosEnterprises/nagioscore/issues/882

    ------------------------------
    Erich Wolz
    ------------------------------



  • 76.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed September 21, 2022 10:58 AM
    Well, I opened the issue on github 6 days ago (as of this writing) and there have been 0 comments.  Looks like we're on our own with this :-(

    ------------------------------
    Erich Wolz
    ------------------------------



  • 77.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue September 27, 2022 04:34 PM

    12 days and still no comments :-(

    As noted back on 5/17, I already had "nofiles=20000". Thinking that perhaps the above "Worker failed to create io broker socket set" messages might be due to a file descriptor limit, I increased nofiles as follows:

    # ulimit -a
    time(seconds) unlimited
    file(blocks) 2097151
    data(kbytes) 131072
    stack(kbytes) 32768
    memory(kbytes) 32768
    coredump(blocks) 2097151
    nofiles(descriptors) unlimited
    threads(per process) unlimited
    processes(per user) unlimited

    #/opt/freeware/bin/nagios /etc/nagios/nagios.cfg

    Nagios Core 4.4.7
    Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 2022-04-14
    License: GPL

    Website: https://www.nagios.org
    Error: Failed to create IO broker set: Error 0

    This is the same error reported at https://community.ibm.com/community/user/power/communities/community-home/digestviewer/viewthread?MessageKey=79310f7f-3d7f-4d0f-9ddf-18c236451b0a&CommunityKey=10c1d831-47ee-4d92-a138-b03f7896f7c9&tab=digestviewer
    (and which also does not appear to have been resolved)
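    One unconfirmed hypothesis for why switching to "nofiles unlimited" changed the failure to "Error 0": if code sizes a per-descriptor table directly from the RLIMIT_NOFILE soft limit, `RLIM_INFINITY` does not fit in an `int` and, depending on the platform, can narrow to a negative value or one far too large to allocate, and a bare size check leaves `errno` at 0. This has not been checked against the Nagios source. The sketch below (hypothetical `clamp_fd_table_size`, with a caller-chosen cap) only shows the pattern of clamping the limit before using it.

```c
#include <sys/resource.h>

/* Hypothetical helper: convert an RLIMIT_NOFILE soft limit into a
 * positive int for sizing a per-descriptor table.  "Unlimited", zero,
 * and anything above the cap become the cap instead of being narrowed
 * blindly.  Returns -1 if the cap itself is not positive. */
int clamp_fd_table_size(rlim_t soft_limit, int cap)
{
    if (cap <= 0)
        return -1;               /* caller error: cap must be positive */

    if (soft_limit == RLIM_INFINITY || soft_limit == 0 ||
        soft_limit > (rlim_t)cap)
        return cap;

    return (int)soft_limit;      /* safe: 0 < soft_limit <= cap */
}
```

    The cap value is a local policy choice for illustration, not a Nagios default.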



    ------------------------------
    Erich Wolz
    ------------------------------



  • 78.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu September 29, 2022 03:07 PM

    Finally, a suggestion from someone on the github thread:

    Unfortunately, we don't have any AIX machines available in-house at the moment, so I'm not able to reproduce/debug this (and I haven't seen this issue in any other contexts). If I were you, I'd try reaching out to an AIX-specific support forum or users' group to see if anyone else has run into this.



    ------------------------------
    Erich Wolz
    ------------------------------



  • 79.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed October 19, 2022 02:24 PM
    So, is this it, then?  Neither the github folks nor the AIX-specific support forum folks can reproduce/debug this, so the rest of us are out of luck?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 80.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri October 21, 2022 06:19 PM
    The only other suggestion from the github folks was, "every time we've installed Nagios Core on AIX, we've compiled it from source. If that's an option for you, I'd go that route rather than rpms."

    While that certainly defeats the purpose of installing from RPMs, at this point I'm willing to give it a try.  What options did you specify for the "configure" command, prior to running "make all"?

    ------------------------------
    Erich Wolz
    ------------------------------



  • 81.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    IBM Champion
    Posted Fri October 21, 2022 11:44 PM
    You can grab the SRPM from the Toolbox site and look at the included SPEC (or even rebuild it on your own box), then you'd be able to share the differences if you got it to work. :)

    The entire build process is specified in the SRPM, including all of the source files, patches, no scripts.

    -- 
    Stephen L. Ulmer
    Enterprise Architect
    Mainline Information Systems
    (m) 352-870-8649






  • 82.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    IBM Champion
    Posted Fri October 21, 2022 11:48 PM
    Sigh, "and" scripts. It has everything in there...

    -- 
    Stephen L. Ulmer
    Enterprise Architect
    Mainline Information Systems
    (m) 352-870-8649






  • 83.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri October 28, 2022 01:16 PM
    Ok, I downloaded the SRPM from the Toolbox site... apparently it's a superset of whatever is available from the Nagios site itself, because when I tried to install it, it warned me that the userid "buildusr" does not exist:

    [root@dl1axmon.houston.ibm.com:/] >rpm -Uvh /tmp/nagios-4.4.7-1.src.rpm
    Updating / installing...
    1:nagios-4.4.7-1
    warning: user buildusr does not exist - using root
    warning: user buildusr does not exist - using root
    warning: user buildusr does not exist - using root
    warning: user buildusr does not exist - using root
    warning: user buildusr does not exist - using root
    warning: user buildusr does not exist - using root
    ################################# [100%]
    warning: user buildusr does not exist - using root
    [root@dl1axmon.houston.ibm.com:/] >

    I say "tried to install it" because I can't tell that anything was installed (i.e. I was expecting to see that the nagios-4.4.7-1 src package was installed), so I don't even know where to look for the spec, source, scripts, etc:

    [root@dl1axmon.houston.ibm.com:/] >rpm -qa | grep nagios
    [root@dl1axmon.houston.ibm.com:/] >

    So, I downloaded the nagios-4.4.8 tarball from the Nagios site and tried to build it (as I did for nagios-3.1.2 way back in the day, which of course is what I was trying to avoid by using the Toolbox RPMs in the first place). Running "./configure" seemed to go uneventfully (the config.log is attached), so I ran "make all" with the following results, including a lot of warnings I don't know how to fix:

    [root@dl1axmon.houston.ibm.com:/usr/local/nagios-4.4.8] >make all
    cd ./base && make
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o nagios.o nagios.c
    nagios.c: In function 'test_path_access':
    nagios.c:122:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&path, "%s/%s", p, program);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c broker.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c nebmods.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o ../common/shared.o ../common/shared.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c query-handler.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o workers.o workers.c
    workers.c: In function 'wproc_can_spawn':
    workers.c:173:7: warning: implicit declaration of function 'getloadavg'; did you mean 'ue_load'? [-Wimplicit-function-declaration]
    if (getloadavg(lc->load, 3) < 0) {
    ^~~~~~~~~~
    ue_load
    workers.c: In function 'handle_worker_check':
    workers.c:618:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&cr->output, "(No output on stdout) stderr: %s", wpres->outerr);
    ^~~~~~~~
    wsprintf
    workers.c: In function 'handle_worker_result':
    workers.c:806:13: warning: implicit declaration of function 'WCOREDUMP' [-Wimplicit-function-declaration]
    WCOREDUMP(wpres.wait_status) ? " (core dumped)" : "",
    ^~~~~~~~~
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c checks.c
    checks.c: In function 'get_service_check_return_code':
    checks.c:376:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&svc->plugin_output, "(Service check timed out after %.2lf seconds)", svc->execution_time);
    ^~~~~~~~
    wsprintf
    checks.c: In function 'check_for_orphaned_services':
    checks.c:2027:154: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'time_t' {aka 'int'} [-Wformat=]
    ning: The check of service '%s' on host '%s' looks like it was orphaned (results never came back; last_check=%lu; next_check=%lu). I'm scheduling an immediate check of the service...\n", temp_service->description, temp_service->host_name, temp_service->last_check, temp_service->next_check);
    ~~^ ~~~~~~~~~~~~~~~~~~~~~~~~
    %u
    checks.c:2027:170: warning: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'time_t' {aka 'int'} [-Wformat=]
    of service '%s' on host '%s' looks like it was orphaned (results never came back; last_check=%lu; next_check=%lu). I'm scheduling an immediate check of the service...\n", temp_service->description, temp_service->host_name, temp_service->last_check, temp_service->next_check);
    ~~^ ~~~~~~~~~~~~~~~~~~~~~~~~
    %u
    checks.c:2030:53: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'time_t' {aka 'int'} [-Wformat=]
    log_debug_info(DEBUGL_CHECKS, 1, " next_check=%lu (%s); last_check=%lu (%s);\n",
    ~~^
    %u
    temp_service->next_check, ctime(&temp_service->next_check),
    ~~~~~~~~~~~~~~~~~~~~~~~~
    checks.c:2030:74: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'time_t' {aka 'int'} [-Wformat=]
    log_debug_info(DEBUGL_CHECKS, 1, " next_check=%lu (%s); last_check=%lu (%s);\n",
    ~~^
    %u
    checks.c:2032:10:
    temp_service->last_check, ctime(&temp_service->last_check));
    ~~~~~~~~~~~~~~~~~~~~~~~~
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c config.c
    config.c: In function 'obsoleted_warning':
    config.c:81:2: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&buf, "Warning: %s is deprecated and will be removed.%s%s\n",
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c commands.c
    commands.c: In function 'process_external_command1':
    commands.c:904:2: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&temp_buffer, "EXTERNAL COMMAND: %s;%s\n", command_id, args);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c events.c
    events.c: In function 'init_timing_loop':
    events.c:356:30: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'int' [-Wformat=]
    " Fixing check time %lu secs too far away\n",
    ~~^
    %u
    events.c:508:66: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'int' [-Wformat=]
    log_debug_info(DEBUGL_EVENTS, 1, "Fixing check time (off by %lu)\n",
    ~~^
    %u
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c flapping.c
    flapping.c: In function 'set_service_flap':
    flapping.c:313:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&temp_buffer, "Notifications for this service are being suppressed because it was detected as having been flapping between different states (%2.1f%% change >= %2.1f%% threshold). When the service state stabilizes and the flapping stops, notifications will be re-enabled.", percent_change, high_threshold);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c logging.c
    logging.c: In function 'logit':
    logging.c:82:5: warning: implicit declaration of function 'vasprintf'; did you mean 'vwsprintf'? [-Wimplicit-function-declaration]
    if(vasprintf(&buffer, fmt, ap) > 0) {
    ^~~~~~~~~
    vwsprintf
    logging.c: In function 'log_service_event':
    logging.c:270:2: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&temp_buffer, "SERVICE ALERT: %s;%s;%s;%s;%d;%s\n",
    ^~~~~~~~
    wsprintf
    logging.c: In function 'log_debug_info':
    logging.c:536:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'time_t' {aka 'int'} [-Wformat=]
    fprintf(debug_file_fp, "[%lu.%06lu] [%03d.%d] [pid=%lu] ", current_time.tv_sec, current_time.tv_usec, level, verbosity, (unsigned long)getpid());
    ~~^ ~~~~~~~~~~~~~~~~~~~
    %u
    logging.c:536:35: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'suseconds_t' {aka 'int'} [-Wformat=]
    fprintf(debug_file_fp, "[%lu.%06lu] [%03d.%d] [pid=%lu] ", current_time.tv_sec, current_time.tv_usec, level, verbosity, (unsigned long)getpid());
    ~~~~^ ~~~~~~~~~~~~~~~~~~~~
    %06u
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o macros-base.o ../common/macros.c
    ../common/macros.c: In function 'grab_macrox_value_r':
    ../common/macros.c:1267:5: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&mac->x[MACRO_TOTALHOSTSUP], "%d", hosts_up);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c netutils.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c notifications.c
    notifications.c: In function 'service_notification':
    notifications.c:223:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&mac.x[MACRO_SERVICENOTIFICATIONNUMBER], "%d", svc->current_notification_number);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c sehandlers.c
    sehandlers.c: In function 'run_global_service_event_handler':
    sehandlers.c:263:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&raw_logentry, "GLOBAL SERVICE EVENT HANDLER: %s;%s;$SERVICESTATE$;$SERVICESTATETYPE$;$SERVICEATTEMPT$;%s\n", svc->host_name, svc->description, global_service_event_handler);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c utils.c
    utils.c: In function 'process_check_result_queue':
    utils.c:2247:4: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&temp_buffer, "%s.ok", file);
    ^~~~~~~~
    wsprintf
    utils.c: In function 'process_check_result_file':
    utils.c:2419:78: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'time_t' {aka 'int'} [-Wformat=]
    "Skipping check_result because file_time is %s and max cr file age is %lu",
    ~~^
    %u
    val, max_check_result_file_age);
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    utils.c: In function 'process_check_result_queue':
    utils.c:2212:38: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=]
    snprintf(file, sizeof(file), "%s/%s", dirname, dirfile->d_name);
    ^
    utils.c:2212:3: note: 'snprintf' output 2 or more bytes (assuming 257) into a destination of size 256
    snprintf(file, sizeof(file), "%s/%s", dirname, dirfile->d_name);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o retention-base.o sretention.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o xretention-base.o ../xdata/xrddefault.c
    ../xdata/xrddefault.c: In function 'xrddefault_save_state_information':
    ../xdata/xrddefault.c:112:2: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&tmp_file, "%sXXXXXX", temp_file);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o comments-base.o ../common/comments.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o xcomments-base.o ../xdata/xcddefault.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o objects-base.o ../common/objects.c
    ../common/objects.c: In function 'timerange2str.part.2':
    ../common/objects.c:2999:26: warning: '%02d' directive writing between 2 and 7 bytes into a region of size between 1 and 6 [-Wformat-overflow=]
    sprintf(str, "%02d:%02d-%02d:%02d", sh, sm, eh, em);
    ^~~~
    ../common/objects.c:2999:15: note: directive argument in the range [0, 1193046]
    sprintf(str, "%02d:%02d-%02d:%02d", sh, sm, eh, em);
    ^~~~~~~~~~~~~~~~~~~~~
    ../common/objects.c:2999:15: note: directive argument in the range [0, 59]
    ../common/objects.c:2999:2: note: 'sprintf' output between 12 and 22 bytes into a destination of size 12
    sprintf(str, "%02d:%02d-%02d:%02d", sh, sm, eh, em);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o xobjects-base.o ../xdata/xodtemplate.c
    ../xdata/xodtemplate.c: In function 'xodtemplate_read_config_data':
    ../xdata/xodtemplate.c:303:6: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&cfgfile, "%s/%s", config_base_dir, val);
    ^~~~~~~~
    wsprintf
    ../xdata/xodtemplate.c: In function 'xodtemplate_process_config_dir':
    ../xdata/xodtemplate.c:592:38: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=]
    snprintf(file, sizeof(file), "%s/%s", dirname, dirfile->d_name);
    ^
    ../xdata/xodtemplate.c:592:3: note: 'snprintf' output 2 or more bytes (assuming 257) into a destination of size 256
    snprintf(file, sizeof(file), "%s/%s", dirname, dirfile->d_name);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o statusdata-base.o ../common/statusdata.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o xstatusdata-base.o ../xdata/xsddefault.c
    ../xdata/xsddefault.c: In function 'xsddefault_save_status_data':
    ../xdata/xsddefault.c:134:2: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&tmp_log, "%sXXXXXX", temp_file);
    ^~~~~~~~
    wsprintf
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o perfdata-base.o perfdata.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o xperfdata-base.o ../xdata/xpddefault.c
    gcc -Wall -I.. -g -O2 -I/usr/include/openssl -DHAVE_CONFIG_H -DNSCORE -c -o downtime-base.o ../common/downtime.c
    ../common/downtime.c: In function 'schedule_downtime':
    ../common/downtime.c:242:56: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'time_t' {aka 'int'} [-Wformat=]
    log_debug_info(DEBUGL_DOWNTIME, 1, "Invalid start (%lu) or end (%lu) times\n",
    ~~^
    %u
    start_time, end_time);
    ~~~~~~~~~~
    ../common/downtime.c:242:69: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'time_t' {aka 'int'} [-Wformat=]
    log_debug_info(DEBUGL_DOWNTIME, 1, "Invalid start (%lu) or end (%lu) times\n",
    ~~^
    %u
    start_time, end_time);
    ~~~~~~~~
    ../common/downtime.c: In function 'register_downtime':
    ../common/downtime.c:439:3: warning: implicit declaration of function 'asprintf'; did you mean 'wsprintf'? [-Wimplicit-function-declaration]
    asprintf(&temp_buffer, "This %s has been scheduled for fixed downtime from %s to %s. Notifications for the %s will not be sent out during that time period.", type_string, start_time_string, end_time_string, type_string);
    ^~~~~~~~
    wsprintf
    make -C ../lib
    make: Not a recognized flag: C
    usage: make [-einqrst] [-k|-S] [-d[A|adg[1|2]mstv]] [-D variable] [-f makefile] [-j [jobs]] [variable=value ...] [target ...]
    make: 1254-004 The error code from the last command is 2.


    Stop.
    make: 1254-004 The error code from the last command is 2.


    Stop.
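    Most of the warnings above fall into three groups: functions the headers did not declare (asprintf, vasprintf, getloadavg, WCOREDUMP), printf formats that do not match their arguments ("%lu" given a time_t, which gcc reports as int in this 32-bit build), and fixed-size buffers that snprintf() or sprintf() may truncate or overflow. The sketch below shows common portable patterns for an undeclared asprintf(), a time_t format mismatch, and a truncation check. my_asprintf(), format_check_time() and join_path() are hypothetical helper names, not Nagios functions, and this is not necessarily how the Nagios or Toolbox maintainers deal with these warnings.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical fallback (not a Nagios function) for systems whose
 * headers do not declare asprintf(): measure the output with
 * vsnprintf(NULL, 0, ...), allocate, then format for real. */
int my_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf;

    assert(out != NULL);
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);   /* length excluding the NUL */
    va_end(ap);
    if (len < 0)
        return -1;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return -1;

    va_start(ap, fmt);                   /* a used va_list must be restarted */
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);

    *out = buf;                          /* caller frees */
    return len;
}

/* Casting to unsigned long makes the argument match "%lu" whether
 * time_t is a 32-bit int (as gcc reports above) or a 64-bit long. */
int format_check_time(char *dst, size_t size, time_t t)
{
    return snprintf(dst, size, "last_check=%lu", (unsigned long)t);
}

/* Build "dir/name" and report truncation (-1) instead of silently
 * cutting the path, which is what -Wformat-truncation warns about. */
int join_path(char *dst, size_t size, const char *dir, const char *name)
{
    int n = snprintf(dst, size, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

    For example, my_asprintf(&s, "%s/%s", "/etc/nagios", "nagios.cfg") returns 22 and sets s to "/etc/nagios/nagios.cfg", which the caller must free. Casting at the call site, rather than changing "%lu" to "%u" as gcc suggests, should keep the format correct in a 64-bit (-maix64) build as well, where time_t is expected to be a long.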

    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    config.log (79 KB)


  • 84.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    IBM Champion
    Posted Sat October 29, 2022 03:08 PM
    Okay, I'm just going to be pedantic. This is a reflection of my day, not necessarily of your question, but I don't want to leave you without a reply:

    What you're asking for now is for other people to repeat work they've already done and already recorded in that SPEC file. I'm not saying that the Toolbox package isn't broken, just that starting from where they left off is better for everyone.

    TL;DR

    You know that the SRPM has all of the information to build the package, because there EXIST binary packages built from that very file on the Toolbox site. The problems you are having compiling the code are clearly solved by SOMETHING inside the SRPM (maybe a bunch of patches, maybe configure directives - I haven't looked).

    Maybe you could look around and see where the contents of the SRPM went? It went somewhere, so look in /opt/freeware or /root or /usr/src or $HOME/rpm or somewhere to find out where AIX's RPM puts those things. Then you can examine the SPEC file and look at the actual instructions that built the existing binary packages. SRPMs are not recorded in the RPM database, as they aren't really "installable" units. Installing the SRPM is akin to un-tar-ing the sources into a stylized build area. So you have everything to get ahead on your research somewhere on your system.

    Liberty,

    -- 
    Stephen L. Ulmer
    Enterprise Architect
    Mainline Information Systems








  • 85.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon October 31, 2022 04:13 PM
    Thanks for this info. As can no doubt be inferred from my previous posts, I am not a developer, and I am (clearly) not familiar with SRPMs, SPEC files, etc. In fact, this is the first .rpm I've ever installed that did *not* record anything in the RPM database (a new-to-me behavior). It is also only the second package I've ever tried to build from source code. The first was Nagios 3.1 on AIX 6.1, however many years ago (for which, needless to say, there was no SRPM), and that build took me long enough to slog through that I was looking forward to using the already-built Nagios package that's now available from the AIX Toolbox. I still am, even though it's taken *way* longer than I ever thought it would.

    ------------------------------
    Erich Wolz
    ------------------------------



  • 86.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Sun October 30, 2022 12:14 PM
    This article may help you. It has a section on building packages from an SRPM.
    Sorry, the formatting of that section is messed up, but if you copy and paste it into an editor it will be readable.
    https://developer.ibm.com/articles/au-aix-build-open-source-rpm-packages/

    ------------------------------
    SANKET RATHI
    ------------------------------



  • 87.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Mon October 31, 2022 07:23 PM

    > When you install an SRPM package, the spec file gets copied to the SPECS directory (see directory structure shown in Figure 1). Change to the SPECS directory and run the `rpmbuild -ba <package-name.spec>` command to build the software. This command generates the binary RPM package in the appropriate directory and also creates a new SRPM file.

    OK, so the first time I ran "cd /opt/freeware/src/packages/SPECS; rpmbuild -ba ./nagios-4.4.7-1.spec", I was prompted to install several devel packages I didn't already have (namely httpd-devel, libgd-devel, libtool-ltdl-devel, patch, and their dependencies). The second time, I again got the "implicit declaration of function 'asprintf'; did you mean 'wsprintf'?" warnings I reported above (see the attached rpmbuild.out).

    That said, I do seem to have gotten slightly further: instead of getting an error trying to run "make -C", the build now runs "gmake -C" (at least until it hits "ERROR: Undefined symbol: .OPENSSL_init_ssl" and "ERROR: Undefined symbol: .TLS_method").

    The googles tell me, "The OPENSSL_init_ssl() function was added in OpenSSL 1.1.0" -- however I am well past that point (I obtained my openssl from https://www.ibm.com/resources/mrs/assets/DirectDownload?source=aixbp&lang=en_US):

    # lslpp -l openssl.*
      Fileset                      Level  State      Description
      ----------------------------------------------------------------------------
    Path: /usr/lib/objrepos
      openssl.base            1.1.2.1201  COMMITTED  Open Secure Socket Layer
      openssl.license         1.1.2.1201  COMMITTED  Open Secure Socket License
      openssl.man.en_US       1.1.2.1201  COMMITTED  Open Secure Socket Layer

    Path: /etc/objrepos
      openssl.base            1.1.2.1201  COMMITTED  Open Secure Socket Layer

    # ls -al /usr/lib/libcrypto.a /usr/lib/libssl.a
    -r-xr-xr-x 1 root system 43996983 Jun 17 02:22 /usr/lib/libcrypto.a
    -r-xr-xr-x 1 root system 10277633 Jun 17 02:23 /usr/lib/libssl.a
    # ar -tv /usr/lib/libcrypto.a
    rw-r--r-- 0/0 3072428 Jun 10 14:00 2022 libcrypto.so
    rw-r--r-- 0/0 2186744 Jun 10 14:00 2022 libcrypto.so.0.9.8
    rw-r--r-- 0/0 3072428 Jun 10 14:00 2022 libcrypto.so.1.0.0
    rw-r--r-- 0/0 3072428 Jun 10 14:08 2022 libcrypto.so.1.0.2
    rwxr-xr-x 0/0 4463281 Jun 10 14:08 2022 libcrypto.so.1.1
    # ar -tv /usr/lib/libssl.a
    rw-r--r-- 0/0 728674 Jun 10 14:00 2022 libssl.so
    rw-r--r-- 0/0 510766 Jun 10 14:00 2022 libssl.so.0.9.8
    rw-r--r-- 0/0 728674 Jun 10 14:00 2022 libssl.so.1.0.0
    rw-r--r-- 0/0 728674 Jun 10 14:08 2022 libssl.so.1.0.2
    rwxr-xr-x 0/0 1030429 Jun 10 14:08 2022 libssl.so.1.1
    #

    My next thought was that maybe the Nagios build process might want me to build *everything* it needs from source (something I am certainly hoping to *not* have to do), but "yum install openssl" says there's nothing to do.



    ------------------------------
    Erich Wolz
    ------------------------------

    Attachment(s)

    rpmbuild.out (28 KB)


  • 88.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    IBM Champion
    Posted Wed November 02, 2022 09:17 AM
    By installing the *-devel* RPMs you got all of the headers and any additional libraries (maybe ones that are only static) that would be needed from that particular package. So you shouldn't need to rebuild anything else from source unless *that* package is defective in some way and you have to change something. That is unlikely, but not impossible.

    See if you can tell from the SPEC file whether the build process is using the AIX-supplied OpenSSL or the Toolbox-supplied version. I think it was in another thread that the Toolbox maintainers said they intended to use the AIX-supplied version, but I may have misinterpreted that. In any case, figuring out which one you are trying to use now, and then trying the other one, might be helpful.

    Also, I'm trying not to lose sight of the fact that your original problem was one of Nagios not loading its config file. You might also want to look at the error messages you got from that, and see if just browsing the code near those messages helps you understand what happened. You might not need to rebuild anything, just change where the config file is, or its permissions, or something similar.

    I know you took a stab at that already, but sometimes a package as big as Nagios tries to do clever things to help maintain security, and they turn out to be non-obvious to someone who just wants it to work. Even if you don't program in (I'm assuming) C, you can probably learn more about how Nagios expects things to be setup. You wouldn't be looking for a *mistake* in the C code, just a general idea of how the Nagios config system works and locates files. You can also look for constructs that might shed permissions, etc. Also, now that you have the source code hanging out, you might be able to run the other Nagios binary in a debugger with source cross-references, so you can follow the problem live.

    You've now got at least three or four different ways to pursue the problem, all of which are valid (including trying to rebuild it from source like you are doing right now). My personal proclivity would be to try and match the error message inside the source code and see what is happening there, but that's just the way I tend to work on not-my-code. :) My second choice would be to rebuild it as you are doing, and see how it behaves. If you rebuild it and it just works, it's time for a conversation with the Toolbox maintainers about the differences in the build environment.

    Hey, Toolbox Maintainers, I know you're crazy-busy, but could you check if the current SRPM builds in the current Toolbox build environment? I'm not asking you to distribute a new package, but to see if the drifting build system has made the current Nagios package not work. If the build breaks, it is possible (likely?) that the Toolbox-based runtime collection of packages has also drifted enough that this warrants deeper tinkering. At least at that point we'll know how much effort Erich should put into his re-build, and how suspect his build environment is.

    -- 
    Stephen L. Ulmer
    Enterprise Architect
    Mainline Information Systems








  • 89.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Wed November 02, 2022 03:13 PM
    Sure, we can check that. Since the package was generated in the AIX Toolbox build environment, I doubt it will fail. I think some build packages are missing from Erich's environment.

    ------------------------------
    SANKET RATHI
    ------------------------------



  • 92.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu November 03, 2022 09:29 AM

    "ERROR: Undefined symbol: .OPENSSL_init_ssl" and "ERROR: Undefined symbol: .TLS_method"
    Can you check whether the machine has an openssl RPM installed? If so, remove it.
    Also check whether libssl.a (and libcrypto.a) are present in /opt/freeware/lib (or lib64). If so, remove them and try the build again.



    ------------------------------
    Ayappan P
    ------------------------------



  • 93.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu November 03, 2022 09:12 PM
    Yes, the original problem was one of Nagios not being able to open its main config file (recall that nagios was, and still is, started via "/opt/freeware/bin/nagios -d /etc/nagios/nagios.cfg > /dev/console 2>&1", but nagios was instead trying to read the socket specified for the query handler interface, "/var/nagios/rw/nagios.qh"; this issue was fixed in nagios-4.4.6-2). Since then, the problem has been that Nagios cannot run host/service checks because its workers are being removed (at least on my machine).

    Back in April or so, it was established that the only major differences between our configurations were:
    a) processes(per user) -- my "unlimited" to the toolbox team's "128"
    b) oslevel -- my "7200-05-03-2135" to the toolbox team's "7200-00-01-1543" (AIX 7.2 TL0 EoSPS was 31 Dec 2018, and ITSS requires us to maintain software and systems at a vendor-supported level, so I am not at liberty to revert to 7200-00-01-1543)
    c) platform firmware level -- my "SV860_236" to your "TV840_028" (which I don't even see in Fix Central, but which in any event appears to be downlevel, judging from the fact that the most recent level of SV840 is 177, not 028)
    d) Full Core -- my "false" to the toolbox team's "true"
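    Item a) compares per-user limits. Because a daemon inherits its limits from whatever started it, it can help to print the limits from inside a process started the same way as Nagios instead of relying on the login shell. Below is a minimal C sketch using POSIX getrlimit(). It reports the open-file limit (RLIMIT_NOFILE), plus the per-user process limit (RLIMIT_NPROC) where the system headers define that non-POSIX constant. clamp_limit() and print_limits() are hypothetical helpers; clamp_limit() only illustrates that an "unlimited" value (RLIM_INFINITY) does not fit in an int and needs explicit handling. Nothing in this thread establishes whether Nagios itself stores these limits that way.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: convert an rlim_t to an int, mapping
 * "unlimited" (RLIM_INFINITY) and anything too large to INT_MAX
 * rather than letting a narrowing conversion produce garbage. */
int clamp_limit(rlim_t value)
{
    if (value == RLIM_INFINITY || value > (rlim_t)INT_MAX)
        return INT_MAX;
    return (int)value;
}

/* Print the soft/hard limits this process actually inherited. */
int print_limits(FILE *fp)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    fprintf(fp, "nofile: soft=%d hard=%d\n",
            clamp_limit(rl.rlim_cur), clamp_limit(rl.rlim_max));
#ifdef RLIMIT_NPROC
    /* Not POSIX; compiled only where the system headers define it. */
    if (getrlimit(RLIMIT_NPROC, &rl) == 0)
        fprintf(fp, "nproc:  soft=%d hard=%d\n",
                clamp_limit(rl.rlim_cur), clamp_limit(rl.rlim_max));
#endif
    fprintf(fp, "(%d means unlimited or too large for an int)\n", INT_MAX);
    return 0;
}
```

    Calling print_limits(stdout) from a small program launched the same way as Nagios (same user, same startup script) would show whether the difference in item a) actually reaches the daemon.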

    Turns out, I did have an openssl rpm installed... but it's gone now:

    [root@dl1axmon.houston.ibm.com:/] >rpm -qa | grep -i openssl
    openssl-0.9.8o-1.ppc
    [root@dl1axmon.houston.ibm.com:/] >rpm -e openssl-0.9.8o-1
    warning: /var/ssl/openssl.cnf saved as /var/ssl/openssl.cnf.rpmsave
    warning: file /usr/linux/bin/c_rehash: remove failed: A file or directory in the path name does not exist.
    [root@dl1axmon.houston.ibm.com:/] >rpm -qa | grep -i openssl
    [root@dl1axmon.houston.ibm.com:/] >cd /opt/freeware
    [root@dl1axmon.houston.ibm.com:/opt/freeware] >find . -name libssl.a -ls
    [root@dl1axmon.houston.ibm.com:/opt/freeware] >find . -name libcrypto.a -ls
    [root@dl1axmon.houston.ibm.com:/opt/freeware] >
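    The removed package was OpenSSL 0.9.8o, while OPENSSL_init_ssl() and TLS_method() first appeared in OpenSSL 1.1.0, so a link that picked up the old 0.9.8 libraries would plausibly produce exactly the undefined-symbol errors seen earlier. The sketch below decodes an OPENSSL_VERSION_NUMBER-style value, assuming the pre-3.0 layout 0xMNNFFPPS (major, minor, fix, patch letter, status); OpenSSL 3.x uses a different layout. Both function names are hypothetical, and the sample values in the note below are illustrative, not read from this machine.

```c
#include <assert.h>
#include <stdio.h>

/* Decode a pre-3.0 OPENSSL_VERSION_NUMBER value, laid out as
 * 0xMNNFFPPS (major, minor, fix, patch, status), into text such as
 * "0.9.8o". The patch byte counts letters: 1 = 'a', 2 = 'b', ... */
int describe_openssl_version(unsigned long v, char *dst, size_t size)
{
    unsigned major = (unsigned)((v >> 28) & 0xFUL);
    unsigned minor = (unsigned)((v >> 20) & 0xFFUL);
    unsigned fix   = (unsigned)((v >> 12) & 0xFFUL);
    unsigned patch = (unsigned)((v >> 4) & 0xFFUL);

    if (patch >= 1 && patch <= 26)
        return snprintf(dst, size, "%u.%u.%u%c", major, minor, fix,
                        (char)('a' + patch - 1));
    return snprintf(dst, size, "%u.%u.%u", major, minor, fix);
}

/* OPENSSL_init_ssl() and TLS_method() first appeared in OpenSSL 1.1.0,
 * so libraries older than 0x10100000 cannot satisfy those symbols. */
int has_openssl_110_api(unsigned long v)
{
    return v >= 0x10100000UL;
}
```

    For example, 0x009080ffUL decodes to "0.9.8o" and fails has_openssl_110_api(), while 0x1010100fUL decodes to "1.1.1" and passes.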

    When I re-initiate the build, I still get all of the warnings that were reported earlier, but I no longer get any undefined symbol errors... those were replaced with:

    gcc -maix64 -O2 -I.. -g -O2 -I/usr/include/openssl -I/opt/freeware/include -DHAVE_CONFIG_H -DNSCORE -o nagios nagios.o broker.o nebmods.o ../common/shared.o query-handler.o workers.o checks.o config.o commands.o events.o flapping.o logging.o macros-base.o netutils.o notifications.o sehandlers.o utils.o retention-base.o xretention-base.o comments-base.o xcomments-base.o objects-base.o xobjects-base.o statusdata-base.o xstatusdata-base.o perfdata-base.o xperfdata-base.o downtime-base.o -Wl,-bexpall,-brtl -L/opt/freeware/lib64 -L/opt/freeware/lib -Wl,-blibpath:/opt/freeware/lib64:/opt/freeware/lib:/usr/lib:/lib -L/usr/lib -L/opt/freeware/lib -lm -lltdl -lssl -lcrypto ../lib/libnagios.a
    ld: 0711-738 ERROR: Input file /usr/lib/libssl.so:
    XCOFF32 object files are not allowed in 64-bit mode.
    collect2: error: ld returned 8 exit status
    gmake[1]: *** [Makefile:157: nagios] Error 1
    gmake[1]: Leaving directory '/opt/freeware/src/packages/BUILD/nagios-4.4.7/base'
    gmake: *** [Makefile:88: all] Error 2
    error: Bad exit status from /var/tmp/rpm-tmp.GyDqeb (%build)

    Again, not being a developer, I don't know what causes this (or how to fix it)... but here's what's in my /usr/lib:

    -r-xr-xr-x 1 root system 10277633 Jun 17 02:23 /usr/lib/libssl.a
    -rwxr-xr-x 1 root system 729439 Feb 02 2018 /usr/lib/libssl.so
    lrwxrwxrwx 1 root system 26 Jul 29 12:47 /usr/lib/libssl3.a -> /usr/opt/rpm/lib/libssl3.a
    lrwxrwxrwx 1 root system 27 Jul 29 12:47 /usr/lib/libssl3.so -> /usr/opt/rpm/lib/libssl3.so
    -r-xr-xr-x 1 root system 3400639 Feb 20 2019 /usr/lib/libssl_compat.a

    ------------------------------
    Erich Wolz
    ------------------------------



  • 94.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri November 04, 2022 03:22 AM
    The file "/usr/lib/libssl.so" should not be there. Neither the Toolbox packages nor the AIX OpenSSL fileset provides this file.
    Remove it. Also remove /usr/lib/libcrypto.so if it is present.
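    The linker rejected /usr/lib/libssl.so because it is a 32-bit XCOFF object and the rpmbuild link uses -maix64. To check a file before deleting it, its first bytes tell you what it is. Below is a minimal C sketch that assumes the XCOFF magic numbers as we understand them (0x01DF for 32-bit and 0x01F7 for 64-bit, big-endian in the first two bytes) and the AIX big-archive signature "<bigaf>\n" for .a files; verify these against <filehdr.h> and <ar.h> on your system. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum aix_obj_kind { OBJ_UNKNOWN, OBJ_XCOFF32, OBJ_XCOFF64, OBJ_BIG_ARCHIVE };

/* Classify a file from its first bytes. Assumed signatures: XCOFF
 * objects begin with a big-endian 16-bit magic (0x01DF = 32-bit,
 * 0x01F7 = 64-bit); AIX big-format archives begin with "<bigaf>\n". */
enum aix_obj_kind classify_header(const unsigned char *hdr, size_t n)
{
    if (n >= 8 && memcmp(hdr, "<bigaf>\n", 8) == 0)
        return OBJ_BIG_ARCHIVE;
    if (n >= 2) {
        unsigned magic = ((unsigned)hdr[0] << 8) | hdr[1];
        if (magic == 0x01DFu)
            return OBJ_XCOFF32;
        if (magic == 0x01F7u)
            return OBJ_XCOFF64;
    }
    return OBJ_UNKNOWN;
}

/* Read up to 8 bytes from path and classify them. */
enum aix_obj_kind classify_file(const char *path)
{
    unsigned char hdr[8];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return OBJ_UNKNOWN;
    n = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    return classify_header(hdr, n);
}
```

    If classify_file("/usr/lib/libssl.so") returned OBJ_XCOFF32, that would match the linker's complaint. An archive such as /usr/lib/libssl.a can hold several shared-object members, so for archives this check only tells you it is an archive, not the width of each member.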

    ------------------------------
    Ayappan P
    ------------------------------



  • 95.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Fri November 04, 2022 09:53 AM

    After removing /usr/lib/libssl.so (and /usr/lib/libcrypto.so which was also present) the build ran to completion and generated the following files in /opt/freeware/src/packages/RPMS/ppc:

    nagios-4.4.7-1.aix7.2.ppc.rpm 
    nagios-devel-4.4.7-1.aix7.2.ppc.rpm
    nagios-gui-4.4.7-1.aix7.2.ppc.rpm

    Refreshing the installed nagios and nagios-gui packages with these and launching nagios *not* as a daemon, but in the current command window, resulted in the same behavior reported in early September, namely:

    Nagios Core 4.4.7
    Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 2022-04-14
    License: GPL

    Website: https://www.nagios.org
    Nagios 4.4.7 starting... (PID=6619538)
    Local time is Fri Nov 04 07:45:40 CDT 2022
    wproc: Successfully registered manager as @wproc with query handler
    wproc: Registry request: name=Core Worker 10813718;pid=10813718
    wproc: Registry request: name=Core Worker 11534700;pid=11534700
    wproc: Registry request: name=Core Worker 6357318;pid=6357318
    wproc: Registry request: name=Core Worker 10682728;pid=10682728
    wproc: Registry request: name=Core Worker 11338206;pid=11338206
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    Worker failed to create io broker socket set: No such file or directory
    wproc: Registry request: name=Core Worker 11075852;pid=11075852
    wproc: Registry request: name=Core Worker 11010414;pid=11010414
    wproc: Registry request: name=Core Worker 11272550;pid=11272550
    wproc: Registry request: name=Core Worker 11141472;pid=11141472
    wproc: Registry request: name=Core Worker 8716646;pid=8716646
    (etc.)

    Unfortunately there is no indication as to what file or directory is being looked for.



    ------------------------------
    Erich Wolz
    ------------------------------



  • 96.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Thu November 10, 2022 11:50 AM
    > Unfortunately there is no indication as to what file or directory is being looked for.

    The "Worker failed to create io broker socket set: No such file or directory" messages are the same as the behavior I saw using the Toolbox-provided RPMs... so at least that much is consistent.  


    ------------------------------
    Erich Wolz
    ------------------------------



  • 97.  RE: Nagios 4.6 server on AIX / Cannot open main configuration file '/var/nagios/rw/nagios.qh

    Posted Tue December 13, 2022 04:00 AM

    As noted above, I have (unsuccessfully) tried running the version of Toolbox Nagios 4.4.7-1 downloaded as an RPM, and I have also (unsuccessfully) tried running the version of Toolbox Nagios 4.4.7-1 built from the SPEC file. Also as suggested above, I tried reaching out on GitHub, only to be redirected back here. Finally, I did search through the source code and found the "Worker failed to create io broker socket set" message (in lib/worker.c, on line 784), but I am not a developer, so the surrounding code doesn't tell me anything (other than that I'm not the only one who thinks this needs to be handled a bit better):

    fcntl(fileno(stdout), F_SETFD, FD_CLOEXEC);
    fcntl(fileno(stderr), F_SETFD, FD_CLOEXEC);
    fcntl(master_sd, F_SETFD, FD_CLOEXEC);
    iobs = iobroker_create();
    if (!iobs) {
        /* XXX: handle this a bit better */
        exit_worker(EXIT_FAILURE, "Worker failed to create io broker socket set");
    }

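    One possible reason the message names no file: the ": No such file or directory" suffix looks like strerror(errno) appended by exit_worker(), though we have not confirmed that in the source. errno is only meaningful immediately after a call that failed and set it. If iobroker_create() can return NULL without setting errno (also unverified), the reported ENOENT would be left over from some earlier, unrelated call. The sketch below demonstrates that effect and one way to report a failure more honestly. create_thing(), stale_errno_demo() and describe_failure() are hypothetical stand-ins, not Nagios code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a constructor that can return NULL
 * without touching errno (e.g. when an internal check fails). */
void *create_thing(int ok)
{
    static int thing;
    return ok ? &thing : NULL;
}

/* The misleading pattern: errno left over from an earlier failed call. */
int stale_errno_demo(void)
{
    FILE *fp = fopen("/nonexistent/dir/file", "r"); /* sets errno = ENOENT */
    if (fp != NULL)
        fclose(fp);
    (void)create_thing(0);   /* fails, but leaves errno untouched */
    return errno;            /* still ENOENT: "No such file or directory" */
}

/* A "handle this a bit better" style report: clear errno first and
 * only print strerror() if the failing call actually set it. */
int describe_failure(char *dst, size_t size, int ok)
{
    void *p;

    errno = 0;
    p = create_thing(ok);
    if (p != NULL)
        return snprintf(dst, size, "ok");
    if (errno != 0)
        return snprintf(dst, size, "create failed: %s", strerror(errno));
    return snprintf(dst, size, "create failed (errno not set)");
}
```

    If this is what happens in the worker, the "No such file or directory" text would say nothing about the real cause, and the underlying failure inside iobroker_create() would need to be found some other way, for example in a debugger as suggested earlier in the thread.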
    Yum doesn't tell me that my Nagios RPMs need to be updated, but I see that Nagios Core has been at 4.4.9 since mid-November.  Any plans to update the Toolbox version of Nagios?  Believe me, I would like to be able to put this behind me as much as anyone else who has posted in this thread!



    ------------------------------
    Erich Wolz
    ------------------------------