Nagios: Fixing "error: Could not stat() command file" (on Debian)

Jan 13 2010

Nagios is a network monitoring tool. I use it to track web servers, mail servers, and whatever else I have running on the LAN and on the Internet.

One common configuration issue is getting the Service Commands menu to work correctly. By default, it is visible in the UI but disabled on the server backend, and on Debian not all of the steps to enable it are evident from the docs. Often you will receive the cryptic error "Could not stat() command file", pointing to /var/lib/nagios3/rw/nagios.cmd. This can be fixed without too much fuss.

I have run into this problem the last three times I have set up Nagios, and every time I fix it, I forget to document the process. So here's the documentation. The first few points are covered in the official Nagios documentation that comes with your package. The later points are not, and have more to do with OS configuration.

My configuration:

  • Debian 5.0.3 (Lenny)
  • Nagios 3

Configure nagios.cfg

Make sure you have something like this in /etc/nagios3/nagios.cfg:

# Values: 0 = disable commands, 1 = enable commands

check_external_commands=1

# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.

command_check_interval=-1

You will need to restart Nagios for these settings to be loaded.
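
To confirm what Nagios will actually read, it can help to print only the active (uncommented) external-command directives. This is a minimal sketch: the function name show_external_command_settings is my own invention, and the three directive names are the ones I believe the stock nagios.cfg uses, so adjust the pattern if your file differs.

```shell
# Hypothetical helper: print the active external-command directives
# from a nagios.cfg-style file. Commented-out lines are skipped because
# the pattern is anchored at the start of the line.
show_external_command_settings() {
    # $1: path to the config file, e.g. /etc/nagios3/nagios.cfg
    grep -E '^(check_external_commands|command_check_interval|command_file)=' "$1"
}
```

On the setup described here I would call it as show_external_command_settings /etc/nagios3/nagios.cfg and, once the values look right, restart with /etc/init.d/nagios3 restart.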

Configure cgi.cfg

Next, edit the /etc/nagios3/cgi.cfg file and check the following:


# ... Other stuff cut


This enables the user nagiosadmin to submit commands. You can check out the in-file documentation for setting up multiple users.

Restart Apache2 after you have done this.
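
A quick way to see which users are currently allowed to submit commands is to list the relevant directives. This is only a sketch: list_command_users is a made-up name, and I am assuming the three authorized_for_* directives matched below are the ones that matter for the command menus on your install.

```shell
# Hypothetical helper: list the active command-authorization directives
# from a cgi.cfg-style file. Each matching line has the form
# directive=user1,user2,...
list_command_users() {
    # $1: path to the CGI config, e.g. /etc/nagios3/cgi.cfg
    grep -E '^authorized_for_(system|all_service|all_host)_commands=' "$1"
}
```

Running list_command_users /etc/nagios3/cgi.cfg should show nagiosadmin on the right-hand side of each line it prints. After any change, /etc/init.d/apache2 restart reloads the web server.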

Add www-data to the Nagios Group

Next, make sure that the user www-data is in the group nagios. You can check on this in /etc/group.
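
If you would rather not read /etc/group by eye, the check can be scripted. The function below is a sketch with a name I made up (group_has_member); it only looks at the member list in the fourth field of a group-format file, so it will not notice a user whose primary group is the one being tested.

```shell
# Hypothetical helper: succeed if a user appears in a group's member
# list (the fourth colon-separated field of an /etc/group-style file).
group_has_member() {
    # $1: group file, $2: group name, $3: user name
    awk -F: -v g="$2" -v u="$3" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

As root, something like group_has_member /etc/group nagios www-data || usermod -a -G nagios www-data adds the user only when it is missing. Apache picks up the new group membership only after it has been restarted.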

Set the Permissions

This last item is the tricky one. You need to set the correct permissions for the command pipe file (this is the file set in nagios.cfg). Nagios uses this pipe to pass data from the CGI to the backend daemon.

This file is located in /var/lib/nagios3/rw/nagios.cmd.

By default, it should have the following permissions:

# ls -l /var/lib/nagios3/rw/nagios.cmd 
prw-rw---- 1 nagios nagios 0 2010-01-13 10:17 /var/lib/nagios3/rw/nagios.cmd

Notice that the group ownership is nagios. That's why you added www-data to the nagios group above.
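
The same check can be made a little more explicit by first confirming that the file really is a named pipe. This is a sketch under two assumptions: check_command_pipe is a name I invented, and stat is the GNU coreutils version (the -c option is not portable to the BSDs).

```shell
# Hypothetical helper: verify that a path is a named pipe (FIFO) and
# print its mode string, owner, and group on one line.
check_command_pipe() {
    # $1: path to the command file, e.g. /var/lib/nagios3/rw/nagios.cmd
    if [ ! -p "$1" ]; then
        echo "not a named pipe: $1" >&2
        return 1
    fi
    stat -c '%A %U %G' "$1"
}
```

For the pipe shown above I would expect the mode string to start with p and to include rw for both owner and group. If the function reports that the path is not a named pipe, Nagios is probably not running, since the daemon creates the pipe at startup.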

But even with these permissions, you may (will?) still get an error. The reason for this is that the parent directory, rw, does not allow any user but nagios to access its contents:

# ls -lh /var/lib/nagios3
total 92K
-rw-r--r-- 1 nagios nagios    33K 2010-01-04 17:06 objects.precache
-rw------- 1 nagios www-data  48K 2010-01-13 10:17 retention.dat
drwx------ 2 nagios www-data 4.0K 2010-01-13 10:17 rw
drwxr-x--- 3 nagios nagios   4.0K 2010-01-04 16:00 spool

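All you need to do to fix this is add the group execute bit (x) to the rw directory. On a directory, the execute bit is what grants permission to enter it and reach the files inside: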

# chmod g+x /var/lib/nagios3/rw

In the listing above, rw is group-owned by www-data, so the web server can now enter the directory. Its membership in the nagios group is what then lets it write to the pipe itself.
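
To verify that the change took effect (or to check whether it is needed in the first place), the directory's mode can be tested from a script. This is a sketch: group_can_traverse is a hypothetical name, and it again assumes GNU stat.

```shell
# Hypothetical helper: succeed if a directory has the group execute
# (search) bit set. In the mode string from stat, e.g. drwx--x---,
# the seventh character is the group execute position; "s" there
# means setgid with execute also set.
group_can_traverse() {
    # $1: directory to test, e.g. /var/lib/nagios3/rw
    perms=$(stat -c '%A' "$1") || return 2
    case "$perms" in
        d?????[xs]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, group_can_traverse /var/lib/nagios3/rw || chmod g+x /var/lib/nagios3/rw applies the fix only when the bit is missing.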

At this point, you should be able to execute Nagios Service Commands through the web interface. (If not, try stopping and starting Nagios, then try again.)
