Host And Website Monitoring

Requirements

The following were my requirements when I was looking for host and website monitoring solutions.

  • Monitoring individual websites
  • Observing a host system (i.e. web server)
  • Warning by email
  • Nice to have: SMS alerts (or a bridge to an SMS gateway)

With regard to #2, observing a host system:

  • I want to be warned if system resources are nearing critical limits. E.g. the hard drive is at >90% capacity.
  • It would also be nice to have a tool which notices trends, e.g. the CPU load has been increasing steadily over the last month.
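The two checks above can be sketched in a few lines of Python. This is only an illustration, not how any of the tools below work: the function names, the 90% default and the least-squares slope used as the "trend" are my own assumptions. Note that the percentage is computed as used / total, which can differ slightly from what df prints.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_near_capacity(percent_used, limit=90.0):
    """True if usage is above the limit (90% is the example from the text)."""
    return percent_used > limit


def trend_slope(samples):
    """Least-squares slope of equally spaced samples.

    A positive value means the metric is rising; the unit is
    "metric units per sampling interval". Hypothetical helper.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    state = "WARNING" if is_near_capacity(pct) else "ok"
    print(f"/ is {pct:.1f}% full: {state}")
```

Feeding trend_slope a list of daily load averages for the last month would give a rough idea of whether the load is creeping up.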

Solutions

Please note that the following tools do not fulfill all of the requirements mentioned above (far from it, in most cases), but they are important monitoring tools nonetheless.

  • Webmin: go to Others > System and Server Status to add various monitors (remote systems, local services). Use the email option and an SMS gateway (email2sms) to send out warnings.
  • collectd: an excellent, lightweight data gathering tool with CPU and load monitoring, as well as plugins for servers such as Apache and MySQL.
  • drraw: a Perl script which uses RRD files to generate graphs (collectd can maintain RRD files for you, e.g. to create a history of HTTP requests).
  • WebminStats: installed under System as Historic System Statistics.
  • Munin: easy to install and configure. Does not force you to use SNMP (uses a monitor and a node, but both can be on the same machine). Has a number of Apache plugins (including two for Passenger).

Unexplored:

  • Zabbix (apparently more professional than Munin, and easier to set up and use than Nagios)

What NOT to Use

  • Cacti: bloatware. Requires you not only to install and configure SNMP, but a lot of fluff too (graph trees and whatnot).
  • Visage: Sinatra / Ruby based graph generator (using RRD files). Not advanced enough (yet): no obvious way to include min, max and average from an RRD file.
  • Nagios: overkill for my needs. Seems to be pretty complicated to configure (maybe even more so than Cacti).

Comparisons

Simple Monitoring Tools

  • top
  • htop

top and htop report the load average as three values: the averages over the last 1, 5 and 15 minutes. For how to interpret them, see this article: Understanding Linux CPU Load - when should you be worried?. The gist of it is that for 1 logical CPU (which can be a core):

  • 0.70: need to look into it
  • 1.00: fix this now
  • 5.00: serious trouble (e.g. box is hanging)

Multiply these values by the number of logical CPUs as reported by cat /proc/cpuinfo or lscpu. To get the relevant count, run the output through grep and word count: grep 'model name' /proc/cpuinfo | wc -l.
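As a rough sketch of that rule, the following Python snippet divides the load average by the number of logical CPUs and compares the result with the three thresholds from the list above. The function name, the "ok" label for anything below 0.70 and the choice of "greater than or equal" are my own assumptions. os.getloadavg() is only available on Unix-like systems.

```python
import os

# Per-CPU thresholds from the list above, highest first.
THRESHOLDS = [
    (5.00, "serious trouble"),
    (1.00, "fix this now"),
    (0.70, "need to look into it"),
]


def classify_load(load, cpus):
    """Classify a load average for a machine with `cpus` logical CPUs."""
    per_cpu = load / cpus
    for limit, label in THRESHOLDS:
        if per_cpu >= limit:
            return label
    return "ok"  # label chosen for this sketch, not from the article


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # getloadavg() returns the 1, 5 and 15 minute averages.
    for minutes, load in zip((1, 5, 15), os.getloadavg()):
        print(f"{minutes:>2} min: {load:.2f} -> {classify_load(load, cpus)}")
```

Dividing the load by the CPU count is equivalent to multiplying the thresholds by it, but keeps the threshold table identical to the list above.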

Find The Cause of High Load Average

The article Linux Troubleshooting, Part I: High Load provides an excellent intro. The gist of it is that there are three main categories of high load:

  • CPU-bound load
  • Out of memory issues
  • I/O bound load

You are advised to look at the causes in that order, because out-of-memory issues may sometimes look like I/O trouble: if the system starts swapping heavily because it is short on memory, the I/O load increases as well.
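The order of the checks matters more than the exact numbers. A minimal sketch, assuming you already have the three measurements (for example read off from htop or vmstat): the function name, the parameters and all threshold values are hypothetical and only serve to illustrate why memory is checked before I/O.

```python
def triage_high_load(cpu_busy_pct, swap_pages_per_s, iowait_pct,
                     cpu_limit=90.0, swap_limit=0.0, iowait_limit=10.0):
    """Return the first matching category, checked in the advised order.

    All limits are illustrative assumptions, not recommended values.
    """
    # 1. CPU-bound: the CPUs are busy doing actual work.
    if cpu_busy_pct >= cpu_limit:
        return "cpu-bound"
    # 2. Memory: swapping also drives up I/O wait, so check it before I/O.
    if swap_pages_per_s > swap_limit:
        return "out of memory"
    # 3. I/O-bound: high I/O wait that is not explained by swapping.
    if iowait_pct >= iowait_limit:
        return "i/o-bound"
    return "unclear"


if __name__ == "__main__":
    # Heavy swapping with high I/O wait is reported as a memory problem.
    print(triage_high_load(cpu_busy_pct=20, swap_pages_per_s=500, iowait_pct=60))
```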

TIP: use htop to easily see if the high load falls into either the CPU-bound or the memory-issues category. Unfortunately, it's not so easy to spot I/O bound load issues, especially not on a VPS.

You cannot use the iostat command on a VPS to discover I/O bound load issues.

Another useful article: Diagnosing Disk I/O issues: swapping, high IO wait, congestion.

Performance Testing with JMeter

JMeter is a Java program to load test your website. Basically, it measures two things:

  • throughput per minute (requests per minute handled)
  • average response time (in milliseconds)
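To make the two numbers concrete, here is a small Python sketch that computes them from a list of samples. It is not JMeter code: the Sample class and its field names are made up for the example, and I am assuming that throughput is measured from the start of the first sample to the end of the last one.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    start_ms: int    # when the request was sent (milliseconds)
    elapsed_ms: int  # how long the response took (milliseconds)


def summarize(samples):
    """Return (throughput per minute, average response time in ms)."""
    if not samples:
        return 0.0, 0.0
    average_ms = sum(s.elapsed_ms for s in samples) / len(samples)
    first_start = min(s.start_ms for s in samples)
    last_end = max(s.start_ms + s.elapsed_ms for s in samples)
    duration_min = (last_end - first_start) / 60000
    throughput = len(samples) / duration_min if duration_min > 0 else 0.0
    return throughput, average_ms


if __name__ == "__main__":
    # Three requests spread over 30 seconds.
    samples = [Sample(0, 100), Sample(10000, 200), Sample(29700, 300)]
    throughput, average_ms = summarize(samples)
    print(f"{throughput:.1f} requests/min, {average_ms:.0f} ms on average")
```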

Using a single instance of JMeter, you should be able to simulate a load of at least 100 concurrent users (where JMeter is running on a different machine than the webserver).

As a rule of thumb: “it is best to run it on your hardware so that the CPU of the PC does not peak at 100% - a stable 80%-90% is best otherwise the results are affected”.

Terminology

  • Threads: users
  • Samples: HTTP requests
  • Listeners: graphs & tables (reports)

Test Plans

For simple sites, it's easy to create your own test plan inside JMeter. For more complex sites, such as the LMS Moodle, you can use test script generators. In Moodle there's one available under Administration > Site Administration > Development > Make JMeter test plan (included from version 2.5 onwards).

Problem with CSV users data

One thing to look out for when using Moodle's JMeter test plan generator: it outputs a CSV file for the test users. In JMeter, this CSV file is referenced twice as CSV users data, once under 'Warm-up site' and again under 'Moodle Test'. I had to insert the complete path to this CSV file in both places before JMeter could properly access it.

Ignore Errors / Warnings about recorder.bsf

The Moodle test plan will also contain a reference to recorder.bsf. My version of JMeter was not able to locate that file. Just ignore the errors & warnings. JMeter will run just fine without recorder.bsf.

