Monitoring VMware ESXi and vSphere with Nagios

This article describes how to monitor a VMware ESXi or vSphere host with Nagios using the OP5 Check ESX Plugin, which is written in Perl.

The plugin can monitor either a single ESXi/vSphere server or a VirtualCenter/vCenter Server, as well as individual virtual machines. This article covers monitoring an ESXi 4 host.

This tutorial was written on a CentOS server; you may have to adapt some paths on other distributions.

Installation

The plugin requires the VMware vSphere Perl SDK, which is available on the VMware website.
Download the archive to your server (for example into /root), untar it and run the installer:

# cd /root
# tar xvzf VMware-vSphere-Perl-SDK-4.1.0-254719.i386.tar.gz 
# cd vmware-vsphere-cli-distrib/
# ./vmware-install.pl

Follow the instructions given by the script. Depending on your setup, some Perl dependencies may have to be installed first for the SDK to work correctly. When that’s done, download the plugin, copy it to /usr/lib/nagios/plugins/ and make it executable:

# cd /usr/lib/nagios/plugins/
# chmod a+x check_esx
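
Missing Perl dependencies tend to show up only when the plugin first runs, so it can be worth confirming that the SDK’s modules load before going further. The following is a minimal sketch, assuming the SDK exposes its main module as VMware::VIRuntime; the helper names perl_module_ok and report_modules are illustrative only and are not part of the SDK or the plugin.

```shell
# Return 0 if the named Perl module can be loaded, non-zero otherwise.
# (Illustrative helper, not part of the SDK.)
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print one status line per module name passed as an argument.
report_modules() {
    for mod in "$@"; do
        if perl_module_ok "$mod"; then
            printf '%-8s %s\n' "OK" "$mod"
        else
            printf '%-8s %s\n' "MISSING" "$mod"
        fi
    done
}
```

For example, `report_modules VMware::VIRuntime` should print an OK line once the SDK is installed correctly. A MISSING line usually means the installer could not satisfy one of its dependencies.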



Configuration

Now we can start the actual Nagios configuration. We’ll need a username and password to access the ESXi host. Define them as Nagios user macros in /etc/nagios/resource.cfg, so that this information is hidden from the CGIs:

$USER11$=username
$USER12$=password
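
Because resource.cfg now holds credentials in clear text, it may be worth tightening its file permissions as well. The sketch below is one possible way to do that; the function name restrict_credentials_file is hypothetical, and the root:nagios owner in the usage note assumes Nagios runs under a group called nagios, which may differ on your system.

```shell
# Restrict a credentials file so that only its owner and group can read it.
# Optional second argument: owner[:group] to hand the file to (needs root).
restrict_credentials_file() {
    file=$1
    owner=${2:-}
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    if [ -n "$owner" ]; then
        chown "$owner" "$file" || return 1
    fi
    # rw for the owner, read-only for the group, nothing for others
    chmod 640 "$file"
}
```

A typical call, under the assumptions above, would be `restrict_credentials_file /etc/nagios/resource.cfg root:nagios`.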

In this tutorial, we’ll monitor these resources: CPU, memory usage, network usage, runtime status and I/O read/write. More checks are available; see the plugin’s reference documentation. Below are the ESXi-related commands to add to /etc/nagios/objects/command.cfg (these are the ESXi-related commands only, NOT the full command.cfg; you can append them at the end of the file):

# check vmware esxi machine
# check cpu
define command{
        command_name check_esx_cpu
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l cpu -s usage -w $ARG1$ -c $ARG2$
        }

# check memory usage
define command{
        command_name check_esx_mem
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l mem -s usage -w $ARG1$ -c $ARG2$
        }

# check net usage
define command{
        command_name check_esx_net
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l net -s usage -w $ARG1$ -c $ARG2$
        }

# check runtime status
define command{
        command_name check_esx_runtime
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l runtime -s status
        }

# check io read
define command{
        command_name check_esx_ioread
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s read -w $ARG1$ -c $ARG2$
        }

# check io write
define command{
        command_name check_esx_iowrite
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s write -w $ARG1$ -c $ARG2$
        }
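
Before wiring these commands into services, it can help to run one of them by hand and see how Nagios would interpret the result. Nagios derives the service state from the plugin’s exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); anything else is reported as out of bounds. The wrapper below is a small sketch of that mapping; nagios_state_name and run_check are illustrative names, not part of Nagios or the plugin.

```shell
# Map a plugin exit code to the Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;
    esac
}

# Run a plugin command line, show its output, and report the resulting state.
run_check() {
    if output=$("$@" 2>&1); then
        status=0
    else
        status=$?
    fi
    echo "$output"
    echo "exit code $status => $(nagios_state_name "$status")"
    return "$status"
}
```

For instance, with the macros replaced by the placeholder values used in this article: `run_check /usr/lib/nagios/plugins/check_esx -H 192.168.1.100 -u username -p password -l cpu -s usage -w 80 -c 90`. Running it as the user Nagios runs under gives the most representative result.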

Here is an example configuration for a Nagios host called esxi01, in /etc/nagios/hosts/esxi01.cfg:

# Host esxi01
define host{
        use                     linux-server
        host_name               esxi01
        alias                   VMware ESXi 01
        address                 192.168.1.100
        }

# Define a service to "ping" the ESXi host
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

# VMWare
# check cpu
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi CPU Load
        check_command                   check_esx_cpu!80!90
        }

# check memory usage
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Memory usage
        check_command                   check_esx_mem!80!90
        }

# check net
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Network usage
        check_command                   check_esx_net!102400!204800
        }

# check runtime status
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Runtime status
        check_command                   check_esx_runtime
        }

# check io read
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO read
        check_command                   check_esx_ioread!40!90
        }

# check io write
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO write
        check_command                   check_esx_iowrite!40!90
        }
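
In these service definitions, everything after the first "!" in check_command is passed to the command as $ARG1$, $ARG2$ and so on, so check_esx_cpu!80!90 sets the warning threshold to 80 and the critical threshold to 90. The sketch below mimics that substitution for a single command_line so you can preview what Nagios will execute. It is only an approximation of Nagios macro processing, expand_command_line is a hypothetical helper, and it handles just the four macros shown.

```shell
# Preview a command_line with $USER1$, $HOSTADDRESS$, $ARG1$ and $ARG2$ expanded.
# Values containing "|", "&" or "\" are not supported by this simple version.
expand_command_line() {
    template=$1       # command_line value from command.cfg
    check_command=$2  # check_command value from the service definition
    host_address=$3   # value for $HOSTADDRESS$
    plugin_dir=$4     # value for $USER1$

    # Fields after the first "!" become $ARG1$ and $ARG2$.
    arg1=$(printf '%s\n' "$check_command" | cut -s -d'!' -f2)
    arg2=$(printf '%s\n' "$check_command" | cut -s -d'!' -f3)

    # $USER11$ and $USER12$ are left untouched so credentials are never printed.
    printf '%s\n' "$template" | sed \
        -e 's|\$USER1\$|'"$plugin_dir"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$host_address"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g' \
        -e 's|\$ARG2\$|'"$arg2"'|g'
}
```

Calling it with the check_esx_cpu command_line, 'check_esx_cpu!80!90', 192.168.1.100 and /usr/lib/nagios/plugins prints the command with -w 80 -c 90 filled in. Remember to single-quote the template so the shell does not try to expand the dollar signs itself.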

That’s it. Restart Nagios and wait a while (or reschedule the checks) for the new services to be monitored.
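
A typo in any of the new definitions will prevent Nagios from starting, so it is usually safer to validate the configuration before restarting. Nagios can do this itself with its -v option. The wrapper below is a minimal sketch of that workflow; verify_then_restart is a hypothetical helper, and the binary path, config path and restart command in the usage note are assumptions that depend on how Nagios was installed.

```shell
# Validate the configuration and run the restart command only if it passes.
# Arguments: nagios binary, main config file, then the restart command.
verify_then_restart() {
    nagios_bin=$1
    main_cfg=$2
    shift 2
    if "$nagios_bin" -v "$main_cfg" >/dev/null 2>&1; then
        "$@"
    else
        echo "configuration check failed; not restarting" >&2
        echo "for details run: $nagios_bin -v $main_cfg" >&2
        return 1
    fi
}
```

On a packaged CentOS install this might be called as `verify_then_restart /usr/sbin/nagios /etc/nagios/nagios.cfg service nagios restart`.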


28 thoughts on “Monitoring VMware ESXi and vSphere with Nagios”

  1. Can’t install from OSSIM CD Live

    e2fsprogs
    openssl-devel
    complaints.

    Can’t install those from aptitude, nor compile e2fsprogs.

    Also, is op5 Monitor required?

    regards

  2. Thanks a lot for the plugin. Works like a charm on Debian Squeeze + ESXi4 and ESX5.
    Some tricks to install perl SDK:

    – error with http_proxy and ftp_proxy not set is solved by:
    export http_proxy=
    export ftp_proxy=

    – error with echo /etc/*-release not found solved by:
    echo ubuntu > /etc/*-release

    In the plugin I had to add a line:
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    to avoid problem with unsigned certificate.

    Hope this helps someone :)

  3. Anyone else get CHECK_VMWARE_API.PL UNKNOWN – plugin timed out (timeout 30s)? I had v5 of the SDK and backed it down to v4, but still no luck.

  4. I installed the VMware Perl SDK, and when I run the plugin from the command prompt it works fine. After configuring command.cfg and the service and host definitions, I am getting the following error messages in the Nagios console:
    Current Status: CRITICAL (for 0d 1h 39m 17s)
    Status Information: (Service check did not exit properly)

    1. Add /usr/bin/perl before $USER1$/check_esx in your command_line

      it gives: /usr/bin/perl $USER1$/check_esx -H …

  5. I’ve got the following error:
    CHECK_VMWARE_ESXI.PL CRITICAL – SOAP request error – possibly a protocol issue: … [xml skipped]

    What should I do? I’ve got VM SDK Perl ver 5.5.0 – should I downgrade to 5.1.0?

    1. Hi there.
      You have the error “Return code of 127 is out of bounds – plugin may be missing” because the plugin is missing. Please check your /usr/lib/nagios/plugins, or /usr/lib64/nagios/plugins (on an x64 distro), for this file: check_esx
      If this file is missing, you can rename/copy check_vmware_api.pl from the check_vmware_api directory

  6. Well not working on Nagios :

    I have this error :

    SERVICE ALERT: esxDemo;ESXi Network usage;CRITICAL;SOFT;1;(Service check did not exit properly)

    but when I launch the script I get a good result :

    user1@nagios:/usr/lib/nagios/plugins$ /usr/lib/nagios/plugins/check_vmware_api.pl -H 192.168.1.9 -u super -p super -l cpu -w 80 -c 90
    CHECK_VMWARE_API.PL OK – cpu usage=60.00 MHz (0.80%) | cpu_usagemhz=60.00MHz;80;90 cpu_usage=0.80%;80;90

    Does anyone know why Nagios doesn’t see the result?
    In syslog I have this :

    Jul 16 19:59:55 nagios nagios3: Warning: Check of service ‘ESXi CPU Load’ on host ‘esxDemo’ did not exit properly!
    Jul 16 19:59:55 nagios nagios3: SERVICE ALERT: esxDemo;ESXi CPU Load;CRITICAL;SOFT;3;(Service check did not exit properly)
    Jul 16 19:59:55 nagios nagios3: Warning: Check of service ‘ESXi Network usage’ on host ‘esxDemo’ did not exit properly!
    Jul 16 19:59:55 nagios nagios3: SERVICE ALERT: esxDemo;ESXi Network usage;CRITICAL;SOFT;3;(Service check did not exit properly)
    Jul 16 20:00:15 nagios nagios3: Warning: Check of service ‘ESXi IO read’ on host ‘esxDemo’ did not exit properly!
    Jul 16 20:00:15 nagios nagios3: SERVICE ALERT: esxDemo;ESXi IO read;CRITICAL;SOFT;2;(Service check did not exit properly)
    Jul 16 20:00:15 nagios nagios3: Warning: Check of service ‘ESXi Runtime status’ on host ‘esxDemo’ did not exit properly!
    Jul 16 20:00:15 nagios nagios3: SERVICE ALERT: esxDemo;ESXi Runtime status;CRITICAL;SOFT;2;(Service check did not exit properly)

  7. HI ,

    How do I monitor datastores using this plugin? When I tried the vmfs option, it always shows OK no matter what the size is.

    Please help

    Regards
    S. prabhu
