Good Nagios Parenting Avoids a Noisy Pager

Monitoring configuration is complicated, and the depth to which you can configure alerts and tests seems endless. Some options may seem like a waste of time to invest in, but others can really help you eliminate failure states that send hundreds of alerts. The end goal of your configuration is that every alert sent to the pager is immediately actionable and all other issues are ignored. Certain failure states, such as failed switches or routers, can cause a flood of alerts because they take down the network infrastructure, and that flood obscures the true cause of an outage.

Defining the Right Config

The first step you can take to prevent a flood of pages is to define all your routers, switches, and other network equipment in your Nagios config. Once those are defined, you simply set the parents directive on each host object that sits behind them.
For example:

# Primary Switch in VRRP Group
define host {
    use         switch
    address     10.0.0.2
    host_name   switch-1
    hostgroups  switches
}

# Secondary Switch in VRRP Group
define host {
    use         switch
    address     10.0.0.3
    host_name   switch-2
    hostgroups  switches
}

define host {
    use         server
    address     10.0.0.100
    host_name   apache-server-1
    hostgroups  servers, www
    parents     switch-1, switch-2
}

With this configuration, if both switch-1 and switch-2 fail, Nagios treats apache-server-1 as unreachable rather than down and silences its alerts. The alerts stay off until either switch-1 or switch-2 becomes available again.

A Few Things to Keep in Mind

Nagios handles multiple parents sensibly: alerts for a host are silenced only when all of its parents are unavailable. If even one parent is still up, the host is checked and alerted on as usual.
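
Strictly speaking, when every parent is down Nagios puts the child host in an UNREACHABLE state instead of DOWN, and whether that state pages anyone is governed by the host's notification_options. A minimal sketch, assuming the server template does not already set this directive, is to notify only on down and recovery:

```
define host {
    use                     server
    address                 10.0.0.100
    host_name               apache-server-1
    hostgroups              servers, www
    parents                 switch-1, switch-2
    # d = DOWN, r = recovery; leaving out u means UNREACHABLE does not notify
    notification_options    d,r
}
```

The same directive could instead live in the server template so that every host using it inherits the behavior.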

The availability of a parent host is determined by its host health check, most commonly ping. If you need some other test of availability, make sure to define it in the host object with the check_command directive.
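
As an illustration, a switch that drops ICMP could be checked over TCP instead. This is only a sketch: the command name check-host-alive-tcp is made up for this example, port 22 is an assumption about what the switch listens on, and $USER1$ is assumed to point at your plugin directory as in the stock resource file.

```
# Hypothetical command wrapping the standard check_tcp plugin
define command {
    command_name    check-host-alive-tcp
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}

define host {
    use             switch
    address         10.0.0.2
    host_name       switch-1
    hostgroups      switches
    # Host counts as up if TCP port 22 answers (assumed port)
    check_command   check-host-alive-tcp!22
}
```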

Parent every object you can, or at least every one that makes sense to parent. For example, a router or transport failure at a remote data center should only send a single alert. This means you should define your routers, switches, and possibly your provider's gateways. Do whatever you think makes sense, and take it as far as you can. Remember that your goal is to keep the number of alerts manageable, so the better you define the topology, the less likely you are to get a useless page, or several hundred useless pages.
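
A remote site might be chained together roughly like this. Every host name and address below is hypothetical, and a router template is assumed to exist alongside the switch and server templates used earlier.

```
# First hop toward the remote site (hypothetical provider gateway)
define host {
    use         router
    address     10.1.0.1
    host_name   dc2-gateway
}

define host {
    use         router
    address     10.1.0.2
    host_name   dc2-router
    parents     dc2-gateway
}

define host {
    use         switch
    address     10.1.0.3
    host_name   dc2-switch-1
    hostgroups  switches
    parents     dc2-router
}

define host {
    use         server
    address     10.1.0.100
    host_name   dc2-apache-server-1
    hostgroups  servers, www
    parents     dc2-switch-1
}
```

With a chain like this, a failure of dc2-gateway should leave dc2-gateway as the only host reported down, while everything behind it is treated as unreachable.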

