Watch it!

Following on from my previous post: if you've ever asked "what should I use to monitor my servers?", you've probably had answers like "Nagios", "Zabbix" or even "ZenOSS". What they really meant to say was "M/Monit".

Monit # 1

A nice, simple display. You can drill down into each host to see which services are available (or not), and monitor, unmonitor, stop or start each service.

Monit # 2

There's also a section of nice, pretty graphs that gives you a pictorial view of what's going on without too much reading. I guess mail1 could use a little more RAM...

Monit # 3

If you're big on SLAs, this display, although very simple, is a nice representation of whether you owe your customers any compensation or, probably more to the point, what your advertising should say about average uptime if you don't want to get sued.
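As a rough worked example of what an uptime figure commits you to (the 99.9% target and the 30-day month are purely illustrative, not read off the screenshot), the downtime D allowed by an availability A over a period T is:

$$D = (1 - A)\,T$$

$$D = (1 - 0.999) \times (30 \times 24 \times 60)\ \text{min} = 0.001 \times 43\,200\ \text{min} = 43.2\ \text{min}$$

So a "three nines" claim leaves you roughly 43 minutes of outage per 30-day month before the advertising stops being true.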

Monit # 4

And then, of course, there's Analytics (in each case here, this is only a cut-down summary of what's available), which lets you select from a history going back up to three years.

Monit # 5

From a technical perspective, one of the major strengths here is that the data collection component of the solution is monit, which you will have installed on your systems anyway (won't you!), as it's a free, standard component that spots problems and alerts you when something goes wrong. The server-side configuration takes all of 30 seconds once your client machines are set up correctly.

A client-side monit configuration looks something like this:

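# Poll every 60 seconds; wait 10 seconds after start-up before the first check
set daemon 60 with start delay 10
set logfile syslog facility log_daemon
# Queue events on disk if M/Monit (or the mail server) is unreachable
set eventqueue basedir /var/spool/monit slots 1000
# Report status and events to the M/Monit collector
set mmonit https://<user>:<pass>@<server>:8443/collector
# Embedded web server; M/Monit connects to it to start/stop services
set httpd port 2812 and use address <localip>
    SSL enable
    pemfile /etc/ssl/certs/monit.pem
    allow localhost
    allow <my IP range>
    allow <local user>:<local pass>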

set mailserver using hostname "<hostname>"
set mail-format { from: <my root email address> }
set alert <email for whoever gets bad news>

# Restart postfix if it stops answering on the SMTP port
check process postfix with pidfile /var/spool/postfix/pid/
    group mail
    start program = "/etc/init.d/postfix start"
    stop program  = "/etc/init.d/postfix stop"
    if failed port 25 type tcp then restart

# Restart sshd if it stops answering on port 22
check process ssh with pidfile /var/run/
    group system
    start program = "/etc/init.d/ssh start"
    stop program  = "/etc/init.d/ssh stop"
    if failed port 22 type tcp then restart

# Alert on sustained load or CPU pressure on the host itself
check system <my host name>
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage (user) > 80% then alert
    if cpu usage (system) > 80% then alert
    if cpu usage (wait) > 80% then alert
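If a host looks short of RAM in the graphs (like mail1 above), a few more rules make monit tell you before it hurts. This is a sketch assuming a reasonably recent Monit 5.x; the thresholds and the `rootfs` check name are hypothetical, illustrative choices, not part of the setup above. The first two lines belong inside the existing `check system` block rather than in a second one with the same name.

```
# Hypothetical extra rules for the existing "check system <my host name>" block
    if memory usage > 75% then alert
    if swap usage > 25% then alert

# Hypothetical root filesystem check ("rootfs" is just a label)
check filesystem rootfs with path /
    if space usage > 80% then alert
    if inode usage > 80% then alert
```

Because these feed the same M/Monit collector, the extra memory, swap and disk figures should show up alongside the existing checks.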

If you've just spent a week trying to make Nagios work, or invested in a dedicated server just to get ZenOSS working, give this a go. Within 10 minutes you'll be kicking yourself!

M/Monit can be found here.