How to configure e-mail alerts with Munin

I had a hell of a time configuring Munin to send out e-mail alerts when values cross specific thresholds. Many of the articles I found focused only on setting up the e-mail command (which was the easy part), while few explained *how* to configure the per-service thresholds.

Once the thresholds are configured, you’ll see a green line for the warning threshold and a blue line for the critical one, like in this graph:

[Graph: munin-it]

Some of Munin’s plugins already have thresholds configured (for example, the df disk-space plugin sends a warning at 92% usage and a critical alert at 98%). Others don’t, and I wanted to keep an eye on, e.g., system load, network throughput and outgoing e-mail.

The mail command can be configured in /etc/munin/munin-conf.d/alerts.conf:

contact.myname.command mail -s "Munin ${var:group} :: ${var:host}" thisisme@somewhere.com

Next, in /etc/munin/munin.conf, under the host I want to receive alerts for, I added something like:

[www.myserver.com]
    address 127.0.0.1
    use_node_name yes
    postfix_mailvolume.volume.warning 100000
    load.load.warning 1.0
    load.load.critical 5.0
    df._dev_sda1.warning 60

This sends an alert if the postfix_mailvolume plugin’s volume value exceeds 100000, if the load plugin’s load value exceeds 1.0 (warning) or 5.0 (critical), and if the df plugin’s _dev_sda1 value (disk usage in percent) goes over 60.
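
To make the threshold semantics concrete, here is a minimal Python sketch of how I understand a value gets classified. It assumes Munin’s documented threshold syntax, where a bare number is an upper limit and a range can be written as min:max, min: or :max, and it assumes a value has to be strictly outside the range to trigger. The function names are hypothetical, and Munin’s own check (written in Perl) may differ in edge cases.

```python
from typing import Optional, Tuple


def parse_threshold(spec: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse "max", "min:max", "min:" or ":max" into (low, high).

    None means that side of the range is unbounded.
    """
    if ":" not in spec:
        return None, float(spec)  # a bare number is treated as an upper limit
    low, high = spec.split(":", 1)
    return (float(low) if low else None,
            float(high) if high else None)


def outside(value: float, spec: str) -> bool:
    """True if value lies strictly outside the range described by spec."""
    low, high = parse_threshold(spec)
    return (low is not None and value < low) or (high is not None and value > high)


def status(value: float, warning: Optional[str] = None,
           critical: Optional[str] = None) -> str:
    """Return CRITICAL, WARNING or OK; critical is checked first."""
    if critical is not None and outside(value, critical):
        return "CRITICAL"
    if warning is not None and outside(value, warning):
        return "WARNING"
    return "OK"


# The load thresholds from the host section above.
for load in (0.4, 2.3, 7.9):
    print(load, status(load, warning="1.0", critical="5.0"))
# prints: 0.4 OK / 2.3 WARNING / 7.9 CRITICAL (one per line)
```

Checking critical before warning means a value past both limits is reported once, at the more severe level.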

Now here’s the tricky part: how do you figure out the plugin name, and the name of the value that plugin reports? (If you get these wrong, you’ll get the dreaded “UNKNOWN is UNKNOWN” alert.)

Just look in /etc/munin/plugins for the plugin that monitors the service you want alerts for, then run it with munin-run. For example, for the memory plugin:

$ sudo munin-run memory 
slab.value 352796672
swap_cache.value 6959104
page_tables.value 8138752
vmalloc_used.value 102330368
apps.value 413986816
free.value 120274944
buffers.value 215904256
cached.value 4964200448
swap.value 28430336
committed.value 962179072
mapped.value 30339072
active.value 2746691584
inactive.value 2787188736

The part before .value is the field name you have to use. The memory plugin reports bytes, so memory.active.warning 5000000000 will alert if active memory goes above about 5 GB.
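
Because raw byte counts are easy to get wrong by a factor of ten, a tiny helper that turns a human-readable size into the plain number munin.conf expects can save some head-scratching. This is a hypothetical Python sketch, not part of Munin; it assumes decimal suffixes (K, M, G, T) mean powers of 1000 and binary suffixes (Ki, Mi, Gi, Ti) mean powers of 1024.

```python
import re

# Multipliers for size suffixes: decimal (K, M, G, T) and binary (Ki, Mi, Gi, Ti).
_UNITS = {
    "": 1,
    "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def to_bytes(size: str) -> int:
    """Convert e.g. "5GB", "500M" or "5GiB" into a plain byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]i?)?B?\s*", size)
    if m is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = m.group(1), m.group(2) or ""
    return int(float(number) * _UNITS[unit])


def threshold_line(plugin: str, field: str, level: str, size: str) -> str:
    """Build a munin.conf line such as 'memory.active.warning 5000000000'."""
    return f"{plugin}.{field}.{level} {to_bytes(size)}"


print(threshold_line("memory", "active", "warning", "5GB"))   # memory.active.warning 5000000000
print(threshold_line("memory", "active", "warning", "5GiB"))  # memory.active.warning 5368709120
print(to_bytes("500M"))                                       # 500000000
```

If you read “5 GB” off a graph drawn with base 1024 (as memory graphs usually are), the 5GiB figure is probably the one you want.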

A tricky one is diskstats:

# munin-run diskstats
multigraph diskstats_latency
sda_avgwait.value 0.0317059353689672
sdb_avgwait.value 0.00127923627684964
sdc_avgwait.value 0.00235443037974684

multigraph diskstats_utilization
sda_util.value 6.8293650462148
sdb_util.value 0.000219587438166445
sdc_util.value 0.000150369658744413

In this case, use diskstats_utilization.sda_util.warning: the name after “multigraph” takes the place of the plugin name, so the line in munin.conf looks like this:

diskstats_utilization.sda_util.warning 60
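
Since a threshold key is just “graph name, field name”, the munin-run output can be turned into a list of valid keys mechanically. The following is a minimal Python sketch under two assumptions: fields printed before any multigraph line belong to the plugin itself, and a “multigraph NAME” line switches the graph name for the fields that follow. threshold_keys is a hypothetical helper, and lines with the unknown value U are skipped.

```python
from typing import Dict


def threshold_keys(plugin: str, output: str) -> Dict[str, float]:
    """Map '<graph>.<field>' to its current value for munin-run output."""
    graph = plugin  # fields before any "multigraph" line belong to the plugin
    keys: Dict[str, float] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between multigraph sections
        if line.startswith("multigraph "):
            graph = line.split(None, 1)[1]  # this name replaces the plugin name
            continue
        name, _, value = line.partition(" ")
        if not name.endswith(".value"):
            continue  # not a "<field>.value <number>" line
        field = name[: -len(".value")]
        try:
            keys[f"{graph}.{field}"] = float(value)
        except ValueError:
            continue  # e.g. "U" for an unknown value
    return keys


SAMPLE = """\
multigraph diskstats_latency
sda_avgwait.value 0.0317059353689672
sdb_avgwait.value 0.00127923627684964

multigraph diskstats_utilization
sda_util.value 6.8293650462148
sdb_util.value 0.000219587438166445
"""

# Print the candidate threshold settings, one per field.
for key in sorted(threshold_keys("diskstats", SAMPLE)):
    print(f"{key}.warning")
```

Run on the memory output above as threshold_keys("memory", ...), the same helper would yield keys such as memory.active, which is where the memory.active.warning setting comes from.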