Sunday 9 March 2008

More monitoring with twitter

Or, the little twitterbot that could.

As I've mentioned before, we've got two identical nagios boxes running: one notifies us of problems via email, the other via a special private twitter account that the systems team follow. So if the email service or one of the nagios boxes goes down, we'll still get notified.

This is an improvement, and we're already getting to problems quicker. Great smashing super. But sometimes we trip over each other. I'll log in to fix something, only to find that P is already working on it. This hasn't bitten us yet, but rest assured, if we don't deal with it, it will bite us one day. So.

The little protocol we're working with now is as follows: when you take on a problem, you IM the others to say you're working on it. But with n of us on the team, that's n-1 messages before you even start on the fix. A pain and a waste of time.

So I'm working on a little bot. It watches the direct messages feed for the monitoring twitter account (let's call it skaffen), and when a new direct message arrives, it posts the text back as an update from the skaffen account, with the original sender prepended. Like this:

skaffen: WARNING -- stuff is borken
mawhin: d skaffen fixing stuff
... up to a minute, because of twitter rate limiting
skaffen: mawhin is fixing stuff

So to pick up a problem you direct message the monitor. I think that's sweet.
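
For the curious, here's a rough sketch of the relay logic in Python. It isn't the actual bot, just the idea under some assumptions: `fetch` and `post` are hypothetical stand-ins for whatever twitter client calls you end up using, direct message ids are assumed to increase over time, and the "is" joiner is simply read off the sample exchange above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DirectMessage:
    id: int      # assumed to increase with each new message
    sender: str  # screen name of whoever sent the DM
    text: str    # message body, e.g. "fixing stuff"


def format_update(dm: DirectMessage) -> str:
    # Prepend the original sender, matching "mawhin is fixing stuff".
    return f"{dm.sender} is {dm.text}"


def relay_new_messages(
    fetch: Callable[[], Iterable[DirectMessage]],
    post: Callable[[str], None],
    last_seen_id: int,
) -> int:
    """Repost every DM newer than last_seen_id; return the newest id seen."""
    newest = last_seen_id
    # Oldest first, so updates appear in the order they were sent.
    for dm in sorted(fetch(), key=lambda m: m.id):
        if dm.id <= last_seen_id:
            continue  # already relayed on an earlier pass
        post(format_update(dm))
        newest = dm.id
    return newest


def run_forever(
    fetch: Callable[[], Iterable[DirectMessage]],
    post: Callable[[str], None],
    poll_seconds: float = 60.0,  # one poll a minute, to stay under the rate limit
) -> None:
    last_seen_id = 0
    while True:
        last_seen_id = relay_new_messages(fetch, post, last_seen_id)
        time.sleep(poll_seconds)
```

Starting `last_seen_id` at 0 means a restart would replay every old direct message still in the feed, so a real version would probably want to save that id somewhere between runs.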
