Linux : Nagios Plugins

A collection of Nagios service check scripts, plus a Nagios check command wrapper that pushes the plugins to client machines and runs them over ssh.

For the uninitiated, Nagios is a host and service monitoring/notification system well suited for both small and large cluster use.

Motivation

These scripts were written to eliminate as much of the client installation process as possible. The Nagios Service Check Acceptor (NSCA) was initially used to push host status from clients to our central Nagios server. However, given our heterogeneous environment of several architectures, nearly every imaginable version of Red Hat, and a few Windows clients running Cygwin and sshd, the administrative cost of compiling and configuring each client quickly became too much. NSCA also seems to be scarcely maintained, and we ran into numerous difficulties with unpredictable delays in accepting the reported status. In addition, the NSCA model distributes service threshold configuration to the client end, requiring a change at the client for every adjustment. The Nagios Remote Plugin Executor (NRPE) solves some of the distributed configuration issues, but still requires the Nagios plugin to already exist on the client. This precludes easy updating of plugins across all monitored machines. Finally, given our security concerns, listening on another port at every client is sub-optimal.

Key Benefits

By rewriting the service plugins in Python to avoid platform and architecture specificity, and by executing them over ssh, we have achieved several goals:

Centralized distribution/upgrades of plugins
Centralized configuration of warn/crit thresholds for each service
Platform/architecture independence
Simplified client integration: A single user must be created with a home directory and given a ~/.ssh/authorized_keys2 file.
Use of existing network infrastructure: All of our machines listen for ssh already.

Potential Downfalls

Initially, I was concerned that the additional load imposed by initiating an ssh session for every service check would exceed the available Nagios server resources. The server is actually one of a few User Mode Linux machines running on top of a 2.5 GHz Celeron. A secondary concern was raising the load on the client machine with ssh key negotiation. In practice, checking about 40 hosts, each with 4 services, at 5-minute intervals has presented no performance issues. If performance were to suffer dramatically on a larger network, further improvements, such as combining service checks into a single ssh session, could be made.
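To put the figures above in perspective, the session rate can be worked out directly. This is only back-of-the-envelope arithmetic, and it assumes the checks are spread evenly across the interval, which Nagios does not guarantee:

```python
# Figures taken from the paragraph above.
hosts = 40
services_per_host = 4
interval_seconds = 5 * 60

# Every service check opens its own ssh session.
checks_per_interval = hosts * services_per_host

# Assumption: checks are spread evenly over the interval.
sessions_per_second = checks_per_interval / interval_seconds
seconds_between_sessions = interval_seconds / checks_per_interval

print("%d ssh sessions every %d seconds" % (checks_per_interval, interval_seconds))
print("about %.2f sessions per second" % sessions_per_second)
print("one session roughly every %.3f seconds" % seconds_between_sessions)
```

At this scale the server starts a new ssh session about once every two seconds on average, which is consistent with the lack of trouble observed in practice.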

Implementation

All plugins must be installed in your Nagios libexec directory. push_check.sh should be used as a wrapper when calling a plugin from services.cfg. When it is run by Nagios, an ssh session is initiated to the client machine, the client's copy of the plugin is compared against the latest version (by md5sum), and the plugin is updated if necessary. If everything checks out, the plugin is run and all of its output is forwarded back to the Nagios server, where push_check.sh reports it as its own output and return code. There is little checking for catastrophic failures within the ssh session. At the very worst, you'll see something along the lines of '(no output)' in the Nagios interface. For every check you wish to push via ssh, the check command in checkcommands.cfg should be modified from something like:

define command{
        command_name    check_vsz
        command_line    $USER1$/check_vsz.py -w $ARG1$ -c $ARG2$
        }

to:

define command{
        command_name    check_vsz
        command_line    $USER1$/push_check.sh /etc/nagios/id_rsa\
                        nagplug@$HOSTNAME$ $USER1$/check_vsz.py\
                        -w $ARG1$ -c $ARG2$
        }

Note the client machine username nagplug and the ssh private key /etc/nagios/id_rsa. You may easily change these to suit your needs. At the very least, you'll need to create your own private key with ssh-keygen and install it wherever this path points. Create an account on each client machine with a writable home directory and a ~/.ssh/authorized_keys2 file containing the corresponding public key.

The normal behavior of ssh is to add the client machine's host key to your ~/.ssh/known_hosts. I don't yet know a way of automatically accepting the key without some hack to type 'yes' to it. Even if that were possible, the Nagios server user is not able to write to its home directory, which greatly confuses ssh. You'll notice in the push script that I turn off StrictHostKeyChecking, which seems to work when the home directory is not writable. If the home directory is made writable, however, I believe this fails.
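The push sequence described in this section (compare checksums, copy if needed, run, pass back the result) can be sketched in Python. This is not push_check.sh itself, which is a shell script; it is a minimal illustration written for a current Python 3. The function names md5_hex, ssh_prefix, needs_update and push_and_run are hypothetical, the BatchMode option is my own assumption, and the plugin is assumed to live directly in the remote user's home directory:

```python
import hashlib
import os
import shlex
import subprocess

UNKNOWN = 3  # Nagios exit code for "status could not be determined"


def md5_hex(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def ssh_prefix(key_file, target):
    """Common ssh arguments: explicit key, no prompts, no host key check."""
    return [
        "ssh", "-i", key_file,
        "-o", "BatchMode=yes",              # assumption: never prompt
        "-o", "StrictHostKeyChecking=no",   # as the push script does
        target,
    ]


def needs_update(md5sum_output, local_sum):
    """True if the client has no copy or its checksum differs."""
    fields = md5sum_output.split()
    return not fields or fields[0] != local_sum


def push_and_run(key_file, target, plugin_path, plugin_args):
    """Update the client's copy if stale, run it, return its exit code."""
    name = shlex.quote(os.path.basename(plugin_path))
    prefix = ssh_prefix(key_file, target)
    with open(plugin_path, "rb") as handle:
        payload = handle.read()

    # Ask the client for the checksum of its current copy, if any.
    probe = subprocess.run(prefix + ["md5sum " + name],
                           stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    remote_output = probe.stdout.decode("ascii", "replace")

    if needs_update(remote_output, md5_hex(payload)):
        # Stream the plugin over the ssh session and make it executable.
        copy = subprocess.run(
            prefix + ["cat > %s && chmod 755 %s" % (name, name)],
            input=payload)
        if copy.returncode != 0:
            print("UNKNOWN - could not copy plugin to %s" % target)
            return UNKNOWN

    # Run the plugin; its output goes straight to our stdout.
    remote = " ".join(["./" + name] + [shlex.quote(a) for a in plugin_args])
    return subprocess.run(prefix + [remote]).returncode
```

A wrapper built on this sketch would end with something like sys.exit(push_and_run("/etc/nagios/id_rsa", "nagplug@host", plugin, args)), so that Nagios sees the plugin's own return code.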

As for the plugins themselves, I have largely conformed to the same syntax as the standard C nagios plugins, but have not implemented all options. It is possible there are slight variations, so double check if in doubt. Due to large differences in Python versions among our client base, I've written all of the plugins to conform to Python 1.5.2.
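For readers unfamiliar with the plugin conventions, the general shape of a threshold check looks roughly like the following. This is a simplified sketch and not one of the plugins distributed here: it targets a current Python 3 rather than 1.5.2, takes a single number for each of -w and -c, and the names classify and main are hypothetical. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin return values:

```python
import argparse
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(value, warn, crit):
    """Map a measured value onto a Nagios status code."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    """Check the 1-minute load average against -w and -c thresholds."""
    parser = argparse.ArgumentParser(description="Sketch of a load check")
    parser.add_argument("-w", type=float, required=True,
                        help="warning threshold")
    parser.add_argument("-c", type=float, required=True,
                        help="critical threshold")
    args = parser.parse_args(argv)

    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        # Platform without load averages, or the value is unavailable.
        print("LOAD UNKNOWN - load average unavailable")
        return UNKNOWN

    status = classify(load1, args.w, args.c)
    # Nagios displays the first line of output next to the service.
    print("LOAD %s - load average: %.2f" % (LABELS[status], load1))
    return status
```

A script built this way would finish with sys.exit(main(sys.argv[1:])), since Nagios reads the service state from the process exit code and the status text from standard output.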

Download

This software is licensed under the GNU General Public License, version 2 or later. See license.txt for details.

push_check.sh
check_disk.py
check_load.py
check_procs.py
check_vsz.py
nagios-push-ssh Gentoo ebuild
license.txt

A tarball of all files above: nagios-push-ssh-all.tar.gz.

Created: 07 Jul 2005
Last Modified: 17 Nov 2009