Re-activated my performance monitoring with Rackspace

In my last blog post about my Rackspace monitoring solution I described why I deactivated the performance tracking: I measured the wire and not my blogs performance.

I was totally surprised that my post got an answer to that problem in three comments one and two days later. And those comments came from Rackspace employees. I never thought about contacting them about this. First of all, this was not on my high priority list and second, since I'm only using the cloud monitoring stuff for a dollar and a half per month, I never thought about bothering them with a request about that. I wanted to dig into it myself later, when time was right for it.

I knew that Rackspace is advertising with 'fanatical' support - but I never would have thought that they would scan random blogs for potential issues and support there. I am totally suprised and I think I already became a fan.

Now, the idea from Justin was to set the consistency level of the check from the default value QUORUM to ALL. In my case this changed the alerting behaviour from "Oh, two of the three zones are slow, let's warn him." to, "Well, both US zones are slow, but the EU zone London still performs good, let's wait with an alert until that gets slow too".

The other idea was to consecutiveCount to a larger value, so that an alert is only sent if the check fails for X consecutive times. I have this in mind, but for now the consistency level is totally enough for me to re-activate the check. So this is my new performance check I activated a day later:

:set consistencyLevel=ALL

if (metric['duration'] > 2500) {
  return CRITICAL, "HTTP request took more than 2.5 seconds, it took #{duration} milliseconds."
} 

if (metric['duration'] > 1800) { 
  return WARNING, "HTTP request took more than 1.8 seconds, it took #{duration} milliseconds."
}

return OK, "Overall performance is okay: #{duration} milliseconds."

With this in place I now get alerted whenever London thinks my Blog is slow, and it gives me response times from all three zones in that alerting mail. This check is just running for two weeks now, and alerted me two times. Both times the alert went green the check after, so the consecutive count would have prevented those mailings, but I think this 'spam' rate is still okay and I don't see the need to fix this.

Actually, I'm pretty happy about those mails once in a while, because I feel that my monitoring is in place and working.

So the bottom line is: Rackspace support really, really rocks. And the monitoring can be tweaked to meet my needs.

A small update to my Rackspace Cloud Monitoring configuration

In the last post I mentioned how I set up my Rackspace Cloud Monitoring system to notify me when my Blog fails or performs badly.
I tweaked the configuration a little bit now: I deactivated the performance check.

Why is that? Because it did not monitor my Blog's overall performance but just the quality of the transatlantic wires. And I can tell you: It's very volatile.

While having constant good response times from the check zone in London, both U.S. zones are okay, bad, then critical and then okay again in a matter of some minutes. So I was spamming myself with that check.

I'd love to have a redundant performance check in place, but there are currently no two check zones in Europe, and I did not find a way up to now to restrict the performance check to the London values only. I think I'll do some more research on that later. For now, I'm fine with the Code 200 'Up and running' check.

Technical problems, a solution and Rackspace cloud monitoring

Some of you may have noticed that my blog experienced some technical difficulties yesterday morning.

For some reason I couldn't find out the IIS still served static files, but anything that had to do with code like this Blog, my TeamCity, YouTrack, Stash and Fisheye applications did not respond anymore. The sad thing was that I even couldn't RDP into my VM, and so I had to trigger a reboot through the hosters web interface.

What I really disliked was that I noticed the problem only when I wanted to log into my blog to check for comments and spam.
To improve that I thought about monitoring my server or better the services it runs. So I asked Google to suggest some monitoring solutions that could help me out.

First hit

The first hit was Rackspace Cloud Monitoring. The price of 1.50 USD / month is great because I don't want to spend a lot for checking my private stuff, but everything at about 5€ / month would be okay for me. The feature set described on their homepage was okay for me. What I really need is some service that makes a request against my blog and checks if it returns a 200 status code, and alert me if this is not the case.

So I signed up for a Rackspace cloud account. After a few minutes I got called to verify my account and the guy on the other end of the line offered help for getting started with them. I really like this approach, because it really takes down the barriers.

My first and single difficulty

After I signed up and was activated I logged into the management portal and looked for the monitoring options. Guess what? Nothing there. Their homepage stated it should be easy to configure the monitoring through the portal, but I could not find an option.

I tweeted about that and almost immediately I got a response with a link to a getting started video. Honestly, this was the point where I was really impressed. The Rackspace community obviously is very strong and willing to help. That's great.

So, watching the video I learned that I could set up monitoring for a VM that I host on Rackspace, but if I delete that VM the monitoring setup would vanish too. Nothing for me, because I don't need a VM but just the monitoring.

After tweeting about that I got this very helpful response:

I didn't want to use the API, because I actually wanted to easily click together my simple 200-check. So I tried out this labs-GUI.

The setup

I didn't dig into the documentation before I started. Actually I thought it should be possible to figure out how to set up a simple HTTP monitoring by just clicking through it. The labs GUI is a very basic Twitter Bootstrap interface that just enables you to access the functionality. Right now there is no real UX, but that's okay. It works 😉

First I entered an 'Entity'. I thought this would be the thing to monitor, so I entered 'Gallifrey', the name of my server. Turned out I got it right. What I could do additionally is to install a monitoring agent on Gallifrey to have it send data about CPU, memory and disk usage to Rackspace that I could use for my monitoring too.

Entities

For this entity I now could add a 'Check'. I named it 'Blog' as I wanted to check the blog on Gallifrey.

Here I could configure that this is a HTTP check, the URL to test and from which locations Rackspace should test this. I checked London and two U.S. locations as 3 zones cost the same as just a single one.

Now, this check alone won't help me. I need to tell the system what to do after a check and what are the error and ok conditions: Enter 'alarms'.

Alarms are the actual thing I want: A mail, whenever something goes wrong. The alarm is fed with the information from the check, evaluates it by rules I enter (the something) and where to mail the information to.

I started with my status code check (see screenshot on the right).
Status code alert

For the check language I had to check the documentation, but the samples are very self-explanatory so that I had this check running in minutes.

I then added another step that should notify about the performance of my blog. For this I used this check:

if (metric['duration'] > 2500) {
  return CRITICAL, "HTTP request took more than 2.5 seconds, it took #{duration} milliseconds."
} 
if (metric['duration'] > 1800) { 
  return WARNING, "HTTP request took more than 1.8 seconds, it took #{duration} milliseconds."
}
return OK, "Overall performance is okay: #{duration} milliseconds."

The values may seem a bit high, but since two of the three check locations aren't in Europe I have to take some transatlantic latency into account. These values seem to work, because with lower values I already got quite some mails warning me that the performance seemed low 😉

Noticing the alerts

To be really notified I created an filter in my e-mail account that marks the mails with 'critical' or 'warning' status as important. This way I get notified directly because I don't let my phone notify me of every mail I receive.

Conclusion

Rackspace is very fast, easy to use and has a great community that helps you getting started in minutes.

With just about 15 to 20 minutes effort and a current investment of 1.50 USD / month I have a very easy to set up and hopefully reliable monitoring for my personal blog. This way I can react faster when something strange happens.

Disclaimer: I'm just a new customer of Rackspace and not related to them in any other way than that I'm paying them to monitor my blog.