Nagios: Fixing "error: Could not stat() command file" (on Debian)
Nagios is a network monitoring tool. I use it to track web servers, mail servers, and whatever else I have running on the LAN and on the Internet.
One common configuration issue is getting the Service Commands menu to work. By default, it is visible in the UI but disabled on the server backend, and on Debian not all of the steps to enable it are evident from the docs. Often, you will receive the cryptic error "Could not stat() command file" pointing to /var/lib/nagios3/rw/nagios.cmd. This can be fixed without too much fuss.
I have run into this problem the last three times I have set up Nagios, and every time I fix it, I forget to document the process. So here's the documentation. The first few points are covered in the official Nagios documentation that comes with your package. The later points are not, and have more to do with OS configuration.
My configuration:
- Debian 5.3
- Nagios 3
Configure nagios.cfg
Make sure you have something like this in /etc/nagios3/nagios.cfg:
# EXTERNAL COMMAND OPTION
# Values: 0 = disable commands, 1 = enable commands
check_external_commands=1
# EXTERNAL COMMAND CHECK INTERVAL
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.
command_check_interval=15s
#command_check_interval=-1
# EXTERNAL COMMAND FILE
command_file=/var/lib/nagios3/rw/nagios.cmd
You will need to restart Nagios for these settings to be loaded.
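Before restarting, it can help to confirm that the settings are active rather than commented out. Here is a minimal sketch, assuming a POSIX shell. The function name check_external_commands_cfg is made up for illustration and is not part of Nagios.

```shell
# Hypothetical helper (not shipped with Nagios): report whether external
# commands are enabled in a nagios.cfg-style file, then print the
# command_file value(s) that are set on active lines.
check_external_commands_cfg() {
    cfg="$1"
    # Match only active (uncommented) lines, allowing surrounding whitespace.
    if grep -Eq '^[[:space:]]*check_external_commands=1[[:space:]]*$' "$cfg"; then
        echo "external commands: enabled"
    else
        echo "external commands: NOT enabled"
        return 1
    fi
    # Print the value of every active command_file line.
    sed -n 's/^[[:space:]]*command_file=//p' "$cfg"
}
```

On the setup described here you would run check_external_commands_cfg /etc/nagios3/nagios.cfg and then restart the daemon. On Debian that is typically /etc/init.d/nagios3 restart, but check the init script name on your system.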
Configure cgi.cfg
Next, edit the /etc/nagios3/cgi.cfg file and check the following:
# SYSTEM/PROCESS COMMAND ACCESS
authorized_for_system_commands=nagiosadmin
# ... Other stuff cut
# GLOBAL HOST/SERVICE COMMAND ACCESS
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
This enables the user nagiosadmin to submit commands. You can check out the in-file documentation for setting up multiple users.
Restart Apache2 after you have done this.
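If you later add more users, it is easy to miss one of the three directives. The sketch below lists the command-related directives that do not name a given user. It assumes a POSIX shell and comma-separated user lists. The function name missing_command_auth is hypothetical, not something Nagios provides.

```shell
# Hypothetical helper: print each command-related authorized_for_* directive
# in a cgi.cfg-style file whose active value does not include the given user.
missing_command_auth() {
    user="$1"
    cfg="$2"
    for key in authorized_for_system_commands \
               authorized_for_all_service_commands \
               authorized_for_all_host_commands; do
        # Extract the active directive's value, split it on commas,
        # trim whitespace around each name, and look for an exact match.
        if ! sed -n "s/^[[:space:]]*${key}=//p" "$cfg" \
                | tr ',' '\n' \
                | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
                | grep -Fxq "$user"; then
            echo "$key"
        fi
    done
}
```

Running missing_command_auth nagiosadmin /etc/nagios3/cgi.cfg should print nothing when all three directives include nagiosadmin.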
Add www-data to the Nagios Group
Next, make sure that the user www-data is in the group nagios. You can check on this in /etc/group.
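Instead of reading /etc/group by eye, the membership check can be scripted. This is a small sketch, assuming a POSIX shell and the standard id utility. The function name user_in_group is invented for this example.

```shell
# Hypothetical helper: succeed (exit 0) if the user named in $1 is a member
# of the group named in $2, fail otherwise.
user_in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one per line and look for an exact match.
    id -nG "$1" | tr ' ' '\n' | grep -Fxq "$2"
}
```

As root, something like user_in_group www-data nagios || usermod -a -G nagios www-data would add the user only when needed. Running processes keep their old group list, so restart Apache afterwards.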
Set the Permissions
This last item is the tricky one. You need to set the correct permissions for the command pipe file (this is the file set in nagios.cfg). Nagios uses this pipe to pass data from the CGI to the backend daemon.
This file is located at /var/lib/nagios3/rw/nagios.cmd.
By default, it should have the following permissions:
# ls -l /var/lib/nagios3/rw/nagios.cmd
prw-rw---- 1 nagios nagios 0 2010-01-13 10:17 /var/lib/nagios3/rw/nagios.cmd
Notice that the group ownership is nagios. That's why you added www-data to the nagios group above.
But even with these permissions, you may (will?) still get an error. The reason for this is that the parent directory, rw, does not allow any user but nagios to access its contents:
# ls -lh /var/lib/nagios3
total 92K
-rw-r--r-- 1 nagios nagios 33K 2010-01-04 17:06 objects.precache
-rw------- 1 nagios www-data 48K 2010-01-13 10:17 retention.dat
drwx------ 2 nagios www-data 4.0K 2010-01-13 10:17 rw
drwxr-x--- 3 nagios nagios 4.0K 2010-01-04 16:00 spool
All you need to do to fix this is add the group execute bit (x) to the rw directory:
# chmod g+x /var/lib/nagios3/rw
Now any member of the directory's group (www-data in the listing above, so the web server user qualifies) should be able to access the contents of the rw directory.
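To confirm the change took effect, you can check the directory's mode string. This is a rough sketch, assuming a POSIX shell and the usual ls -l mode format. The function name group_can_traverse is hypothetical.

```shell
# Hypothetical helper: succeed if the directory's group has the execute
# (search) bit set. In the ls mode string (e.g. drwx--x---) the group
# execute position is the 7th character; it is x, or s when setgid is also set.
group_can_traverse() {
    mode=$(ls -ld "$1" | cut -c7)
    [ "$mode" = "x" ] || [ "$mode" = "s" ]
}
```

For the setup in this post that would be group_can_traverse /var/lib/nagios3/rw. A more direct check, run as root, is su -s /bin/sh www-data -c 'test -w /var/lib/nagios3/rw/nagios.cmd', which tests whether the web server user can actually write to the pipe.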
At this point, you should be able to execute Nagios Service Commands through the web interface. (If not, try stopping and starting Nagios, then try again.)