I had a hell of a time configuring Munin to send out e-mail alerts if values surpass specific thresholds. Many of the articles I found focused just on setting up the email command (which was the easy part), while few told me *how* to configure the per-service thresholds.
Once the thresholds are configured, you’ll see a green line for the warning threshold and a blue line for the critical one, like in this graph:
Some of Munin’s plugins already have configured thresholds (such as disk space monitoring, which will send a warning at 92% usage and a critical alert at 96% or so). But others don’t, and I wanted to keep an eye on e.g. system load, network throughput and outgoing e-mail.
The mail command can be configured in /etc/munin-conf.d/alerts.conf:
contact.myname.command mail -s "Munin ${var:group} :: ${var:host}" thisisme@somewhere.com
Next in /etc/munin.conf, under the specific host I want to receive alerts for, I did something like:
[www.myserver.com]
    address 127.0.0.1
    use_node_name yes
    postfix_mailvolume.volume.warning 100000
    load.load.warning 1.0
    load.load.critical 5.0
    df._dev_sda1.warning 60
This will send an alert if the postfix plugin’s volume surpasses 100k, if the load plugin’s load value surpasses 1.0 (warning) or 5.0 (critical), and if the df plugin’s _dev_sda1 value is over 60% (this is disk usage).
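To make the warning/critical semantics concrete, here is a small Python sketch of how a value could be classified against thresholds like the ones above. This is only an illustration, not Munin's own code: the function names parse_range, outside and classify are made up for this example. It assumes a bare number is an upper bound and that a min:max form describes the allowed range.

```python
def parse_range(spec):
    """Split a threshold such as "60", "10:", ":60" or "10:60" into (low, high)."""
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "", spec  # assumption: a bare number is the upper bound
    return (float(low) if low else None, float(high) if high else None)


def outside(value, spec):
    """True if value falls below the lower bound or above the upper bound."""
    low, high = parse_range(spec)
    return (low is not None and value < low) or (high is not None and value > high)


def classify(value, warning=None, critical=None):
    """Check critical first, so it wins when both thresholds are exceeded."""
    if critical is not None and outside(value, critical):
        return "CRITICAL"
    if warning is not None and outside(value, warning):
        return "WARNING"
    return "OK"


# Thresholds taken from the load.load.* lines in the host section.
for load in (0.4, 2.3, 7.1):
    print(load, classify(load, warning="1.0", critical="5.0"))
```

With the load thresholds from the config, 0.4 is fine, 2.3 is a warning and 7.1 is critical.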
Now here’s the tricky part: how do you figure out what the plugin name is, and which field names that plugin reports? (If you get these wrong, you’ll get the dreaded UNKNOWN is UNKNOWN alert.)
Just look in /etc/munin/plugins for the one that monitors the service you want alerts for. Then run it with munin-run, for example, for the memory plugin:
$ sudo munin-run memory
slab.value 352796672
swap_cache.value 6959104
page_tables.value 8138752
vmalloc_used.value 102330368
apps.value 413986816
free.value 120274944
buffers.value 215904256
cached.value 4964200448
swap.value 28430336
committed.value 962179072
mapped.value 30339072
active.value 2746691584
inactive.value 2787188736
These are the field names you have to use: the part before .value, prefixed with the plugin name (so memory.active.warning 5000000000 will alert if active memory goes above roughly 5 GB).
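If you want to list the candidate names instead of reading them off by eye, a short Python sketch along these lines could do it. The helper name threshold_keys is hypothetical, and the sample string is just a few lines copied from the memory output; in practice you would paste in the full munin-run output.

```python
def threshold_keys(plugin, munin_run_output):
    """Return "<plugin>.<field>" for every "<field>.value <number>" line."""
    keys = []
    for line in munin_run_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].endswith(".value"):
            field = parts[0][: -len(".value")]
            keys.append(f"{plugin}.{field}")
    return keys


# A few lines from the `munin-run memory` output.
sample = """\
slab.value 352796672
active.value 2746691584
inactive.value 2787188736
"""

for key in threshold_keys("memory", sample):
    # Append .warning or .critical plus your own limit in the host section.
    print(f"{key}.warning <your threshold>")
```

Each printed line is a template for the host section; replace the placeholder with a real number.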
A tricky one is diskstats:
# munin-run diskstats
multigraph diskstats_latency
sda_avgwait.value 0.0317059353689672
sdb_avgwait.value 0.00127923627684964
sdc_avgwait.value 0.00235443037974684
multigraph diskstats_utilization
sda_util.value 6.8293650462148
sdb_util.value 0.000219587438166445
sdc_util.value 0.000150369658744413
In this case, use diskstats_utilization.sda_util.warning (so the value in “multigraph” is used as if it were the plugin name).
diskstats_utilization.sda_util.warning 60
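The multigraph rule can be sketched in Python as well. The helper below is hypothetical (multigraph_keys is not part of Munin): it tracks the most recent multigraph line and uses its name as the prefix in place of the plugin name, falling back to the plugin name when the output has no multigraph lines at all.

```python
def multigraph_keys(plugin, munin_run_output):
    """Return "<graph>.<field>" names, where graph follows "multigraph" lines."""
    graph = plugin  # used as the prefix until a multigraph line appears
    keys = []
    for line in munin_run_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        if parts[0] == "multigraph":
            graph = parts[1]
        elif parts[0].endswith(".value"):
            field = parts[0][: -len(".value")]
            keys.append(f"{graph}.{field}")
    return keys


# Shortened `munin-run diskstats` output.
sample = """\
multigraph diskstats_latency
sda_avgwait.value 0.0317059353689672
multigraph diskstats_utilization
sda_util.value 6.8293650462148
sdb_util.value 0.000219587438166445
"""

for key in multigraph_keys("diskstats", sample):
    print(key)
```

For the sample this prints names such as diskstats_utilization.sda_util, which is the prefix used in the warning line above.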