Physical servers can be converted into virtual machines with little effort today, and the overview is lost just as quickly. Monitoring adapted to virtual environments helps you keep track.
Virtualization has been used in the data center for many years and has its origins in host partitioning in the mainframe environment. More than 35 years ago, IBM laid a significant foundation here with its mainframe LPARs. Today, virtualization is an indispensable part of the data center, and many modern architectural concepts such as cloud computing would be inconceivable without this basic technology.
Because new virtual machines can be rolled out quickly and easily, the basic principles of good IT management sometimes fall by the wayside in day-to-day operations. For example, the necessary entries in the Configuration Management Database (CMDB) are forgotten, no documentation is created, or adding the machine to monitoring is postponed.
Requirements for IT service management
Monitoring the available resources and compliance with the agreed service levels in particular should be a core competence of IT operations, and it should adapt flexibly to the new requirements. For the free monitoring systems Nagios and Icinga, a variety of plug-ins are available to monitor the common virtualization solutions and to detect long-term capacity bottlenecks. This article introduces solutions for monitoring virtualized environments and identifying changes.
What makes a virtualized system special?
First of all, the question arises as to what makes a virtual system so special. After all, with resources permanently allocated to it, a virtualized system seems to require the same basic monitoring as any other. But what should really be monitored at the operating system level? If the goal is not only to detect failures but to avoid them as far as possible, the hard disk usually offers the most potential, so monitoring free space and I/O operations per second deserves special attention.
With the »check_disk« plugin, Nagios can record disk utilization and later use the data for capacity management. A very useful extension of the disk check is monitoring disk I/O, which can easily be measured with the »check_diskio« plugin and given alarm thresholds. Increased disk I/O often becomes evident days or weeks before other problems, as today's systems are more memory-bound than CPU-bound.
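The threshold logic behind such a disk check can be sketched in a few lines of Python. This is a minimal illustration and not the code of »check_disk«: the function name `evaluate_free_space` and the default thresholds of 20 and 10 percent are assumptions chosen for the example. It follows the usual plugin convention of the states 0 (OK), 1 (WARNING), and 2 (CRITICAL) and appends performance data after the pipe character so that a grapher can use the values later for capacity management.

```python
import shutil

# Conventional Nagios plugin states
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def evaluate_free_space(total_bytes, free_bytes, warn_pct=20.0, crit_pct=10.0):
    """Map disk figures to a (state, status line) pair.

    warn_pct and crit_pct are hypothetical defaults: the check warns
    when free space drops to or below these percentages.
    """
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        state = 2
    elif free_pct <= warn_pct:
        state = 1
    else:
        state = 0
    # Status text, then performance data in "label=value;warn;crit" form.
    # The trailing colon marks a lower bound in plugin range notation.
    line = (
        f"DISK {STATES[state]} - {free_pct:.1f}% free"
        f" | free_pct={free_pct:.1f}%;{warn_pct}:;{crit_pct}:"
    )
    return state, line


# Demo against the root file system of the machine running the sketch
usage = shutil.disk_usage("/")
state, line = evaluate_free_space(usage.total, usage.free)
print(line)
```

A real plugin would additionally hand the state to the monitoring system as its exit code, for example with `sys.exit(state)`.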
Although NRPE (Nagios Remote Plugin Executor) is standard in many environments, SSH should be preferred. Reasons include firewall friendliness, updates supplied by the respective operating system vendor and, above all, the proven handling of keys and configuration in the Linux and Unix environment.
While the plugins mentioned are designed for Unix and Linux systems, the admin can also monitor load and I/O load on Windows systems. The most common way is to use the NSClient++ agent and the corresponding server-side plugins »check_nt« or »check_nrpe«. The query is then made using the available Windows performance counters:
check_nrpe -H $HOSTADDRESS$ -c CheckCounter -a "\\System\\File Data Operations/Sec" ShowAll MaxWarn=20000 MaxCrit=30000
In addition to »check_disk«, monitoring certain processes also helps to avoid errors. Standard plugins such as »check_procs« and the CheckProcState check of NSClient++ help here. On Windows, for example, it is possible to monitor the maximum number of instances of a specific process in order to identify possible errors in good time:
check_nrpe -H IP -p 5666 -c CheckProcState -a MaxCritCount=50 app.exe=started
In addition, monitoring systems can of course monitor a large number of other services. Thanks to the great popularity of Nagios and Icinga, countless examples can be found in the relevant portals of the German community and in various wikis.
Container or system and paravirtualization
The type of monitoring depends heavily on the virtualization solution used. While container-based technologies such as OpenVZ, Linux-VServer, or Solaris Zones provide closed runtime environments without starting an additional operating system kernel, system virtualization and paravirtualization, as with VMware, Xen, or KVM, only make a certain pool of resources available. The guest system then has to manage these resources itself, independently of the virtualization environment. This also explains the big difference in possible monitoring scenarios: while container-based solutions mostly represent homogeneous system environments, system virtualization and paravirtualization can host different guest systems.
Monitoring OpenVZ and Solaris Zones
OpenVZ provides additional containers on the basis of a patched Linux kernel. For each container, OpenVZ creates a corresponding structure in the proc area of the Linux system. While previous versions offered only a few simple files such as »/proc/user_beancounters_sub« and »/proc/user_beancounters_debug«, current versions present the counters in a hierarchical structure.
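How such counters can be evaluated is sketched below in Python. The sketch is not taken from any existing plugin and rests on an assumption not stated in the article: that the counters are available in the flat table layout commonly found in »/proc/user_beancounters«, with the columns uid, resource, held, maxheld, barrier, limit, and failcnt. The sample text, the function names, and the warning ratio of 0.9 are purely illustrative.

```python
# Hypothetical sample in the assumed table layout
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2178412    2473423   11055923   11377049          0
            numproc              95        120        100        100          3
"""


def parse_beancounters(text):
    """Return {container_id: {resource: {held, maxheld, barrier, limit, failcnt}}}."""
    keys = ("held", "maxheld", "barrier", "limit", "failcnt")
    containers = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip blank lines, the version line and the column header
        if fields[0].endswith(":"):
            current = int(fields[0].rstrip(":"))  # a new container block starts
            fields = fields[1:]
        if current is None or len(fields) != 6:
            continue  # ignore anything that is not "resource + five numbers"
        values = dict(zip(keys, map(int, fields[1:])))
        containers.setdefault(current, {})[fields[0]] = values
    return containers


def check_resource(counter, warn_ratio=0.9):
    """0 = OK, 1 = held is close to the barrier, 2 = failures were counted."""
    if counter["failcnt"] > 0:
        return 2
    if counter["held"] >= warn_ratio * counter["barrier"]:
        return 1
    return 0


for ctid, resources in parse_beancounters(SAMPLE).items():
    for name, counter in resources.items():
        print(ctid, name, check_resource(counter))
```

Treating any non-zero failure counter as critical is a deliberately strict choice for the example. Because the counter only ever grows, a production check would more likely compare it with the value seen in the previous run.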
Sorted by the UID of the container, all counters are listed and can easily be viewed with file system commands. Based on the example plug-in in the OpenVZ wiki, Robert Nelson has developed »check_openvz«. With threshold values, the administrator can monitor whether limits such as »kmemsize« and »numproc« are exceeded and adapt the configuration to the resources actually required.
Solaris Zones can only be monitored via the hosting operating system, using the »zoneadm« command. With »check_zones«, a wrapper plugin is available that simplifies the call and outputs the status of the zones. An interesting alternative is the use of »prstat -Z«:
# prstat -Z
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   173 daemon     17M   11M sleep   59    0   3:18:42 0.2% rcapd/1
 17676 apl      6916K 3468K cpu4    59    0   0:00:00 0.1% prstat/1
...
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     0       48  470M  482M   1.5%   4:05:57 0.0% global
     3       85 2295M 2369M   7.2%   0:36:36 0.0% refapp1
     6       74   13G 3273M    10%  16:51:18 0.0% refdb1
Total: 207 processes, 709 lwps, load averages: 0.05, 0.06, 0.11
This allows the administrator to determine information about CPU and memory consumption in the individual virtual machines globally; access to the zone itself is not strictly necessary. The two plug-ins for CPU and memory are simple shell scripts, but they also support warning thresholds.
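The per-zone summary of »prstat -Z« is easy to evaluate by machine. The following Python sketch is an illustration and not the shell plug-ins mentioned above: it parses the zone table from the listing shown in this article and reports the zones whose memory share exceeds a threshold. The function names and the threshold of 5 percent are assumptions made for the example.

```python
# Zone summary as shown in the prstat -Z listing of this article
SAMPLE = """\
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     0       48  470M  482M   1.5%   4:05:57 0.0% global
     3       85 2295M 2369M   7.2%   0:36:36 0.0% refapp1
     6       74   13G 3273M    10%  16:51:18 0.0% refdb1
Total: 207 processes, 709 lwps, load averages: 0.05, 0.06, 0.11
"""


def parse_zone_summary(output):
    """Extract one record per zone from the ZONEID table."""
    zones = []
    in_summary = False
    for raw in output.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "ZONEID":
            in_summary = True  # the zone table starts after this header
            continue
        if fields[0] == "Total:":
            in_summary = False  # the summary line ends the table
            continue
        if in_summary and len(fields) == 8 and fields[0].isdigit():
            zones.append({
                "zone": fields[7],
                "nproc": int(fields[1]),
                "memory_pct": float(fields[4].rstrip("%")),
                "cpu_pct": float(fields[6].rstrip("%")),
            })
    return zones


def zones_over_memory(zones, threshold_pct):
    """Names of all zones whose memory share exceeds the threshold."""
    return [z["zone"] for z in zones if z["memory_pct"] > threshold_pct]


zones = parse_zone_summary(SAMPLE)
print(zones_over_memory(zones, 5.0))
```

Because the parser only looks at the table between the ZONEID header and the Total line, the per-process lines that precede it in the full output are ignored.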
Xen and KVM
While the underlying concepts of Xen and KVM are completely different, both are supported by Libvirt. Libvirt is a Red Hat-sponsored toolkit for communicating with various virtualization technologies. In addition to Xen, KVM, and VirtualBox, it also supports OpenVZ as well as VMware ESX and GSX. Based on this API, »nagios-virt« was created to assist the user in configuring the appropriate host and service definitions for Libvirt systems.
Although the project was last updated some time ago, it still works. Also based on Libvirt, the »check_virsh« plugin performs its monitoring with the help of the »virsh« command. Unfortunately, the output of the plugin is in neither English nor German, but it is easy to adapt. For Xen, there is additionally »check_xenvm«. It evaluates the Xen status using the »xm list« command and outputs it in formatted form. If a system is undergoing live migration at the time of the check, the plugin displays this as well.
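A check built on »virsh« essentially has to read the domain table and map the states to alerts. The Python sketch below shows this idea under stated assumptions: it is not the code of »check_virsh«, the domain names in the sample are invented, and the table layout (Id, Name, State with a dashed separator line) corresponds to what `virsh list --all` typically prints.

```python
# Hypothetical output of "virsh list --all"; the domain names are invented
SAMPLE = """\
 Id   Name     State
-----------------------
 1    web01    running
 2    db01     paused
 -    test01   shut off
"""


def parse_virsh_list(output):
    """Return {domain_name: state} from a virsh domain table."""
    domains = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("Id") or line.startswith("-----"):
            continue  # skip blank lines, the header and the separator
        parts = line.split(None, 2)  # the state may contain spaces ("shut off")
        if len(parts) != 3:
            continue
        _dom_id, name, state = parts
        domains[name] = state.strip()
    return domains


def not_running(domains):
    """Names of all domains that are in any state other than 'running'."""
    return [name for name, state in domains.items() if state != "running"]


domains = parse_virsh_list(SAMPLE)
print(not_running(domains))
```

In practice, the text would come from a call such as `subprocess.run(["virsh", "list", "--all"], capture_output=True, text=True)` instead of the sample string. Whether a paused or shut-off guest counts as a problem depends on the environment, so the mapping to WARNING or CRITICAL is left open here.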
VMware has been on the market for over 10 years and is a veteran of virtualization. To show the different monitoring options, it is important to distinguish between the product lines. While VMware Server (formerly VMware GSX) requires a host system of its own based on Linux or Windows, VMware ESX Server brings its own Linux kernel, extended with its own drivers, and needs no standalone host system. On top of that, if required, there is vCenter, which enables the central administration of several ESX servers as well as live migration (vMotion).
The two most comprehensive plugins are »check_esx3« and »check_vmware3«. Both access the VMware information using the VMware Perl API and can be parameterized in a variety of ways. The main differences are that »check_vmware3« supports the heartbeat to determine the exact system status and accepts regular expressions when specifying hostnames (Figure 1). This makes it possible, for example, to monitor the virtual machines of specific customers or system groups selectively. If the appropriate VMware Tools are installed in the guest system, »check_esx3« can retrieve detailed performance information and measure performance via the host system, as described for Solaris Zones. Installing a special agent in the guest is therefore unnecessary.
Meaningful alerting and reporting in virtualized environments depend on properly configuring host and service objects and distributing them into appropriate host and service groups. Checking the availability of a virtual machine via the host system is no substitute for extensive monitoring of the virtualized system itself. On the one hand, the view of the guest system is not independent of the network and can therefore yield distorted availability results. On the other hand, configuring dependencies on other services becomes more difficult.
Correlating the performance data of a host system is particularly interesting. For example, the effects of overlapping resource allocation, such as memory ballooning or the use of vCPUs, can be identified, and guests can be moved to other host systems if necessary. Graphical processing of the plug-in results using PNP4Nagios or the NETWAYS Grapher also facilitates long-term analysis and allows early identification of capacity bottlenecks.
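One simple form of such a correlation is to compare what has been promised to the guests with what the host physically offers. The Python sketch below computes overcommit ratios for vCPUs and memory. All figures, field names, and the function name are hypothetical; in a real setup the values would come from the performance data collected by the plugins described above.

```python
def overcommit_ratios(host, guests):
    """Ratio of resources allocated to guests versus physically present.

    A value above 1.0 means the host is overcommitted for that resource.
    """
    allocated_vcpus = sum(g["vcpus"] for g in guests)
    allocated_mem = sum(g["memory_mb"] for g in guests)
    return {
        "cpu": allocated_vcpus / host["cores"],
        "memory": allocated_mem / host["memory_mb"],
    }


# Hypothetical host with three guests
host = {"cores": 8, "memory_mb": 32768}
guests = [
    {"name": "guest-a", "vcpus": 4, "memory_mb": 8192},
    {"name": "guest-b", "vcpus": 4, "memory_mb": 16384},
    {"name": "guest-c", "vcpus": 8, "memory_mb": 16384},
]

ratios = overcommit_ratios(host, guests)
print(ratios)
```

A memory ratio above 1.0 indicates that the host can only serve all guests with techniques such as ballooning, which makes it a natural candidate for moving a guest elsewhere.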
By using Nagios and Icinga, the user benefits from what is probably the largest and most active open source community in the monitoring world and can therefore fall back on a variety of plugins and extensions. Xen, KVM, and VMware in particular offer detailed query options via Libvirt or in-house APIs, which can easily be used by extending existing plugins.
The monitoring of the virtualization platform and the virtualized system should complement each other optimally, as the performance provided almost always depends on the function of the real system and not the virtualization environment. Virtualization platform monitoring, in conjunction with the other basic checks, provides a clear picture of dependencies and identifies potential bottlenecks on the host system before they can be identified in the guest system.