Saturday 26 April 2008

Facebook - bringing back the village shop

Leaving aside the 'serious' side of social networking for the moment - journalism, marketing, education, research and all that stuff - I've been thinking around what social networking is doing to/for people in general.

And it's replacing the village shop.

Now there's well-publicised research and academic stuff which suggests that humans evolved to live in relatively small groups, a couple of hundred individuals or so, and it's a pretty convincing story.

It's not a big jump to suggest that our mobile society, in which friends move away and become less easily accessible, creates significant psychological stress as we try to deal with people we are emotionally attached to but never see. I know it stresses me. Because every contact, rather than being an informal two minutes as our paths cross in a common environment, becomes an event, which has to be got right, and I don't know what's going on with them right now, and maybe I'll have a think and do it tomorrow.

And facebook (and bebo and myspace, I'm sure, but I don't go there) is bringing it back. I have some idea what they're up to, so long as they do the status update thing. I know that they have similar info about me. And they seem less distant. Facebook is the new village shopkeeper, who tells you that Norma was in last week, and one of her kids has had chicken pox, and you tell him that you've been busy with your new hobby...

Now I'm not suggesting social networking will fix everything, nor that it doesn't bring new problems. But it's here, it's staying, and it's a way to cope with the hectic, disconnected modern lives that we're not really evolved for.

Or maybe I should lay off the barley wine for a bit.

Wednesday 23 April 2008

MS Access - the best of apps, the worst of apps.

Microsoft Access. It's possibly the most painful application our users run. It does some fundamentally stupid things:

If your default printer is set to automatically select a paper tray, Access will fail with a deeply cryptic error when you try to preview a report. In fact, once you know what's going on, it sorta makes sense.

The problem is that if Access is going to print preview a report, it needs to know what size of paper it's going to print on, what the printable area is, and probably other stuff. And so it checks to see what the default printer is set up to use. And if the printer is set to select the tray dependent on the paper size requested, then Access can't find out. As I say, it sorta makes sense. But surely

'Your default printer is set to select an output tray automatically, and so Access doesn't know what size to print this report. Please select a paper size....'
wouldn't be that hard?

There are others, but that's the one that bit us AGAIN today. I'd happily advise my colleagues: 'This MS Access? Yeah, it looks nice, but to be honest it's a piece of crap. Don't bother with it.'

But I can't. Why not? Because there's nothing else that does the same job. It's easy to build a moderately complex and sophisticated database application, with integrated reporting and stuff. And when I say easy, I mean easy for actual humans, who do actual jobs.

The truth is, for all that it's buggy as hell, and doesn't scale well, and uses perversely tweaked SQL syntax, and on and on, Access is the killer application, competing in a field of one.

There's nothing else out there which does 'desktop database application' well enough most of the time to compete. Show me I'm wrong.

Sunday 13 April 2008

Selected photos from a trip to Glasgow

Mist rising from a hillside
A 3d map on a pedestal 
View by day 
View by night 
Nice lines 
Power and wealth

Blogged with the Flock Browser

If you can fake sincerity.. more IT Support philosophy stuff

I'd like you, my reader, to imagine yourself a user of IT Services. You're having trouble sending email to a particular institution, and there's a deadline approaching. So you pop into IT Services to see if anyone can help.


Scenario 1
There are three technicians sitting at their desks. You can see one technician's screen, and he seems to be shopping. The technicians ignore you. You wait. Eventually one of them turns away from the screen. You explain your problem. The technician replies "We can't do anything about that, you'll have to put a job on the system, and the email team will have a look at it". You leave.

Scenario 2
There are three technicians sitting at their desks. They all look up as soon as you walk in, and one asks if they can help you. You explain the problem, and the technician replies "I'm sorry, that's something the email team will need to look at. I'll help you put a job on our job system now. Have you used the job system before? Don't worry, I'll show you what to do. Here's a job reference number. You've got a deadline? Don't worry, I'll call one of the email team now and ask them to treat it as urgent". You leave.

Now I'll not insult your intelligence. Scenario 2 is the good one. Scenario 1 happens all too often. That's obvious, and it's not really the point.

The point is that in neither scenario did your problem get solved. The people you approached couldn't help you themselves. But in scenario 2, I'll wager you'd feel confident that someone would help, and probably in good time. And you'd be more likely to come back for help again. And less likely to moan about IT Support.

Moral:
Customer Service is a lot about serving the customer, and a lot about making the customer feel served. And it's easy and costs nothing to do the latter. And they'll love you for it.


Disclaimer: This is a hypothetical scenario, and in no way represents any situation that has actually happened, nor anyone who actually exists. Honest.

Friday 11 April 2008

Post Networkshop

In my previous post, I expressed my intention to blog and twitter all the way through networkshop 36. I so failed. For several reasons, I think.

  • Connectivity problems: at least initially, I couldn't get network access. There were technical difficulties with the installation, but there was wireless available. It's just that my eeepc hasn't been set up to do 802.1x, and that's what was available. This was resolved by the beginning of the first full day, but by then I'd not started at the start. Just to be clear, there were initial technical difficulties, but I could have gotten around them. I didn't. Ah well.
  • What to say? When I was in a position to twitter live from the sessions, I sorta stalled, couldn't think of anything to say that would make any sense or give any value out of context.
  • More connectivity problems: I tried to make up for not twittering on Tuesday by writing up a blog post Tuesday night from the hotel. But the hotel's proxy server was broken.
Enough with the excuses. I took some pretty cryptic notes, and will be posting a distilled version of what I found interesting/insightful from those.

For now, three highlights for me were:

The network monitoring BOF.
(Birds Of a Feather session. I don't know why). There seemed to be some agreement that we, the delegates, could benefit from trying to be a bit more of a community outside of networkshop, and share techniques, scripts etc related to monitoring and the configuration of monitoring systems.

There's a mailing list (I don't know if I'm supposed to advertise its address, so I won't), and there was talk of a wiki. There seemed general agreement, so we'll have to see if we can make it happen. I'll give it a run; hopefully I won't be billy-no-mates.

One thing that strikes me often, and struck me during networkshop several times, is the apparent expectation amongst the academic networking community that everyone runs cisco kit. And everyone's happy to build systems that rely on cisco specific services. Which doesn't help me very much. And doesn't it sorta help cisco perpetuate lock-in? Perhaps I'm just feeling left out.

Alan de Kok.

The lead developer on the freeradius project, and by far the slickest speaker I saw. Sorry guys. More on this one later. Suffice to say that by the time he'd finished, I was grinning like a fool.

Celtic Music Radio.
Every year, there's a couple of sessions with a less narrow focus. This was one. There was a bit of networking involved, not much. But the speakers described building systems, with little money, and plenty of problems to work around. And a boat ferrying sewage and pensioners was involved.

And if you noticed me drifting off while you were speaking, I'm so very sorry. I'm past 40 now, and the afternoon nap is becoming an inevitability. At least, that's my excuse.

Monday 7 April 2008

Getting ready for #nws08


Tomorrow morning at stupid o'clock, I'll be leaving sunny Birmingham, and will be off to Glasgow, to attend Networkshop 36.

I've attended several times now, and each time it's been way too late before I've started to prepare, each time there's not seemed enough time whilst I've been there (perhaps earlier bedtimes might help there), and each time I've come home tired, inspired, and with surprisingly little in the way of details remembered.

This year it's gonna be different. I have a plan. I'm going to try to document what I gain. I'll try blogging here, and I'll try twittering.

#nws08


It's my intention to tag all my posts #nws08 (I originally considered #networkshop2008, but it's a little long for twitter's 140 char limit, and it's a pain to type on a phone). I'd encourage anyone attending who's blogging or twittering (or anything else-ing) to do likewise.

Hopefully, my blogging/micro-blogging at networkshop will

  • help clarify any thoughts I have (unlikely, I know) during the conference, and
  • give me something semi-coherent to show for my attendance after the fact. Y'know, to justify the expense of sending me to Glasgow. Although I get the impression that my employers consider the few hundred pounds a small price to pay to get me three hundred miles away.
Now it's time to go get the suit out from the dog's basket.

Sunday 6 April 2008

Emergent game on twitter

I've been approached by a local artist working on emergent game «Σ». I've built another bot, which uses the twitter api to search for twitterers located in the emergent game, then uses the XMPP interface to make a private account follow them.

Then we chuck the 'with friends' feed through a yahoo pipe, for two reasons:

  • I'm pretty sure twitter needs you to be authenticated to see the 'with friends' feed. Yahoo pipes has a 'private string' object, so we can (safely?) embed a username and password in the feed url.
  • If (when) there's a parsing error in the bot, the yahoo pipe can filter out the private account's own updates, so we effectively get a 'just friends' feed.

Which gets displayed on the game blog thus.

Take your hands AWAY from the keyboard..

Just a tiny post to mention a personal rule.

take your hands AWAY from the keyboard..

Whenever I'm doing something potentially risky ( you know what I mean, like running chmod whilst logged in as root ), I type the command, then lift both my hands theatrically up to my shoulders while I read what I've typed and decide if it's gonna cause me pain.

It's saved me innumerable times.

My working environment

It's not a good title. But I can't think of a better one right now. Here's a first hack at a diagram of my day-to-day comms setup. It's only a start, and doesn't cover a bunch of stuff. I'll add to it soon.

The green arrows represent places I input. The blue ones, places I (or others, more on that later) view stuff. Most data paths are labelled with the protocol or protocol families used.

Thursday 3 April 2008

I'm very proud

to announce that IPv6 has affected me, in actual real life.

We run a moodle server. It's pretty important. We run it in a DMZ, and it's protected by a commercial firewall product. It's running on Ubuntu Gutsy server, and we spent Monday and Tuesday upgrading all 30-odd instances from moodle 1.7.2 to moodle 1.9. This went very well and proved very easy.

We also did a whole bunch of ubuntu package upgrades, which cowardice had caused me to shy away from till now. I mean, how comfortable would you be if aptitude upgrade told you the kernel would be removed?

Aaanyway, we did it, and it all worked. Except now, it was dog slow. Like 40 seconds to return a page. So, off I went on the now familiar hunt for the wotdidIdowrongthistime bird.

I'd noticed a long pause at the start of every aptitude download, and there was the clue. I ran a sniffer on the moodle box and its name server. And there was the problem.

Each time the moodle box needed name service, it would send four DNS requests for AAAA records, which the nameserver just never saw. Then the moodle box gave up waiting, and tried for an A record, which the nameserver saw and responded to, and on we go.

Turns out that our firewall didn't want to pass AAAA requests, or the ubuntu box was sending them up its own bum or somewhere else sub-optimal. After a few minutes googling and wincing at the ubuntu forums, the answer turned out to be this:

in /etc/modprobe.d/, create bad_file with the line alias net-pf-10 off.
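
In shell terms, that's just the following (the filename is arbitrary, and net-pf-10 is the kernel's internal name for the IPv6 protocol family, so the alias line stops the ipv6 module being loaded; the hostname below is illustrative):

echo "alias net-pf-10 off" > /etc/modprobe.d/bad_file

# after the next reboot, check the module is gone...
lsmod | grep -q ipv6 || echo "ipv6 module not loaded"
# ...and that lookups come straight back with A records
time host moodle.example.com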

And all is happy again. S'pose I ought to be talking to firewall vendors soon...

Wednesday 2 April 2008

Going google - progress report

After some months in internal debate (internal to me, that is), I've decided to go google. And last weekend I shifted 6 years' worth of stored email up to gmail.

And I have to say, it's looking good. It's quicker than IMAP (certainly when dealing with big chunks of messages), and the spam handling is at least an order of magnitude better than mine. And the interface is pretty nice.

We don't need no goddam folders.

Previously, I'd been using a middling complex procmail setup (sketched below) to auto-file everything I wanted to keep into auto-named folders, with the folder names generated from the from: address, and all actual mail folders grouped in folders according to initial letter. So all mails from bill.gates@microsoft.com were filed in the IMAP folder /b/bill-gates. This made it considerably easier to find old mail when I wanted it, without relying on me to manually file it correctly (which would have been a non-starter).
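
The heart of it was a recipe along these lines. A minimal sketch from memory rather than my actual .procmailrc: the address regex is simplified, and the letter directories are assumed to exist already.

:0
* ^From:.*\/[-a-zA-Z0-9._%+]+@[-a-zA-Z0-9.]+
{
  # procmail leaves the matched address in $MATCH; keep the local part,
  # lowercase it, and turn dots into hyphens (bill.gates -> bill-gates)
  SENDER=`echo "$MATCH" | cut -d@ -f1 | tr 'A-Z.' 'a-z-'`
  # the first letter gives the grouping folder, e.g. b/bill-gates
  INITIAL=`echo "$SENDER" | cut -c1`
  :0
  $INITIAL/$SENDER
}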

So when I transferred to gmail, I kept the folders as labels. Big mistake. Because searching 12,000 messages via gmail is so quick, there's no need for filing. So I spent two hours removing labels from everything. Big win. Preceded by a coupla hours of big loss.

Instant Messaging.

I realised on Tuesday, once I was settling down to actually using the thing, that if you're using the chat widget which lives in gmail, the conversations are archived in with your email. Which is expletively neat. What's rubbish is that the chat I had with a person who clicked on the 'chat with' button on the blog wasn't automatically archived. 'Cos actually I needed to keep that. Happily, I'd not closed the window yet.

Now part of the reason I'm doing this is about changing the way I work. I've had enough of relying on installed applications for run-of-the-mill stuff, and the work that goes into managing that.

The other part is that I need to be better informed as to how well this can work. I'm seriously swayed by suggestions that institutional IT support should let the likes of google focus on providing commodity applications like email, word processing and document sharing, and we focus on providing better and better access to those services. 'Cos google are just gonna do it better.

It's important to me that I point out here that I'm not outlining my employer's policy, nor speaking for my employer. These are my own versions of others' thoughts. Not my employer's. Is that clear enough?

Tuesday 1 April 2008

motorway junction geocoder

It's a yahoo pipe, which for some major subset of UK motorways, takes the motorway and junction numbers, and returns a geo thing.

This came out of an idea I've mentioned here, and which I've since discovered is the basis of this product.

But while I was playing around, I realised that none of the big mapping providers do this - or at least I couldn't get them to. Perhaps you'll do better than me; "M6 Junction 6" always landed me in Manchester. Not where I wanted to be.

So I searched, and searched, and found a couple of GPS waypoint files listing UK motorway junctions. None of them in any kind of useful format. Still, where there's a will...

So out comes gpsbabel, and a few 'tests' later, I've got a GPX file, which works. The rest of the pipe is about getting the XML structure to work.
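
For the record, the conversion is a one-liner of this shape. A sketch: I'm assuming a TomTom-style .ov2 POI file as the source (gpsbabel's 'tomtom' input format), and the filenames are made up.

gpsbabel -i tomtom -f motorway-junctions.ov2 -o gpx -F motorway-junctions.gpx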

It doesn't do much, but as far as I know it's the only one of its kind, and it does do what it's supposed to.

I'm pretty pleased with meself, me.

Saturday 29 March 2008

going google part 2 - imapsync

I've already mentioned that I'm going google.

The largest part of this is transferring my mail from my IMAP server up to gmail. That kept me up until 5 this morning, and is about half done. And let me tell you, it's been a pain.

I started in Thunderbird. I knew it would take some time, but I'd got all weekend: shift a thousand at a time, done before you know it. Nuh-uh. Thunderbird would only shift a couple of hundred at a time. Any more, and it ground to a halt.

So I did a little googling, and found imapsync, which as I write this is doing the job. It's a perl program, with lots of command-line switches. Here's mine (sanitised)..


imapsync --host1 10.1.2.3 --user1 myusername --password1 xxxxxx \
--prefix1 /home/sites/site1/users/myusername/imapmail/ --authmech1 LOGIN \
--host2 imap.gmail.com --user2 my.googleid --password2 xxxxxx \
--ssl2 --skipsize --noauthmd5 --subscribed --delete --expunge \
--exclude 'SPAM|HAM|Drafts' --split1 200 --split2 200 \
--nofoldersizes --regextrans2 's/[a-z]\///' --fast --syncinternaldates

  • --prefix1 /home/sites/site1/users/myusername/imapmail/ is to strip off the full path to my IMAP folder tree.
  • --regextrans2 's/[a-z]\///' is because I've got my IMAP organised like a/alan, a/andy, d/dave, etc. It whips the initial letter and forward slash out of the gmail label.

The command line's a bit gnarly, but it's getting the job done. So big up imapsync.

Friday 28 March 2008

networkshop 2008 blog aggregator goes live!

JaNET hold an event every year, called networkshop, and it's a three day gathering of networking (fibre and routers, not facebook and twitter) professionals from UK academe.

We get to talk dirty tech, and swap stories, and socialise with our peers. And betters. Mostly betters.

This year, JaNET training are having a crack at supporting the conference with what I'm interpreting as a VLE sort of approach. It's currently private to networkshop attendees, and I'm sure there's plenty of room to debate whether that level of privacy is needed. We're a pretty paranoid bunch. Or 'reasonably careful'.

Not the point, though. The point is, they've built a blog aggregator in yahoo pipes. So for anyone who doubted it, I present this as evidence. JaNET training are getting with the program. They're drinking the kool-aid.

going google

I'm gonna take the red pill. As of now, I'm shifting 6 years' worth of email from my cobalt raq in the back bedroom up to gmail. I've got to the point where I trust google's ability to look after my mail better than I trust my own.

What's prompted me is the eeepc. It's got a tiny little solid state disk (I know it's not actually a disk, so sue me), and thunderbird's indexes would take a good chunk of that. So if I want to access all my email from the eeepc, gmail's the only way to fly.

We've also started looking at google docs, and I've been using google calendar for a while. Not to mention google talk.

So it's time to go the whole hog. Wish me luck.

eeepc!

Delivered yesterday, by a nice man from DHL, a spanky new little eeepc. First impressions so far.

This is, without a doubt, the most integrated Linux box I've ever used. I've opened it up, gone through the little quickstart guide, and was up and running in five minutes. Startup time is good: less than 30 seconds from power up to being able to use it. Nice.

Nothing to write home about so far. I've only browsed and run IM (for twitter, of course) as yet, though I have run aptitude update and aptitude upgrade, naturally.

It's a pretty happy circumstance when linux on a laptop just works without any, y'know, WORK. In my experience, at least.

Plans:

  • See about getting it to use the blackberry as a modem.
  • See if I can get it running as a VPN client. This will make the difference between a nice little toy and a machine I can realistically use for work.
Anyway, it's a sweet little thing.

On another note entirely - me mother was round most of yesterday, and by the time she left I had no kitchen sink. There's a big hole where it used to be. H and Tornado Boy are coming home tomorrow afternoon. The man at the builder's merchant has apparently said it will arrive today. So, assuming all goes to plan, we'll have a new kitchen installed by the time they get home. D'you see the potential problem there?

Wednesday 26 March 2008

Twitter as platform

Here I really want to just jot thoughts.

What excites me about twitter, and yahoo pipes, and jabber, and the way this stuff can be wrangled into working together, is that it starts to bring the network to life. In a science-fiction kinda way.

One of sci-fi's signature technologies is the 'personal terminal'. It's a small thing, with perhaps some intelligence, but its key feature is a permanent, on-demand connection to 'the network'. Iain M Banks, in his Culture novels, has terminals as part of the hyper-powerful post-AI thang. Your terminal knows where you are, and you can contact society through it, both machine and organic.

Wonderful sci-fi stuff. IMHO. But the truth is it's not far off. I've a phone with GPS; it runs Java applications and can communicate over jabber with 'the network'. I can listen to my online friends' public voices. I can, in theory at least, access all sorts of user generated content from my phone, based on my location, which my phone knows.

Next on the list is an exploration of location-based status reporting - the example is traffic conditions.

  • I'm stuck on the motorway - I twitter '@uktrafbot M5J3 Southbound stopped.' Or better, software running on my phone does.
  • You're planning a journey. Your route planning software checks along the route and looks for (a) recent traffic twits along the route if you're leaving now, or (b) if you're leaving tomorrow morning, patterns of twits during the time you'll be travelling. And lets you know if your route might run into problems.
  • You're travelling. You're heading along the M5. Your phone knows where you are, and periodically checks for recent twits along your route, alerting you to problems ahead.

I'm looking for a collaborator, with mobile java dev skills.

brummie twits part 3 - mawhin_bot1 settles down.

OK. That'll do for now. @mawhin_bot1 is pretty well complete in its first incarnation.

I've posted before, here and here, but to save the clicking I'll describe it again.

It's a twitter account, inspired by @peteashton and @BhamPostJoanna, which follows people who claim their location to be in the West Midlands.

It sits and watches some of the public twitter feed, and when it spots a tweet from a midlander, it tweets about it, and starts following the twitterer.

A couple of issues have come up.

  • @kevin_rapley raised the issue I had feared: that not everyone located in Birmingham claims brummiehood. Some are, sad to say, even offended. And we'd best not even mention the Black Country. So announcements are a lot more cagey these days.
  • Do you want to be followed like this? No? Block @mawhin_bot1. It won't try again. Unless it's broken, in which case, let me know and I'll (1) fix it, and (2) hard-code so it doesn't ever follow you again.
So far, mawhin_bot1 has collected 47 twitterers, ( actually 46 plus @bbcfooty, who also ain't real ). There's a bunch of background work still going on, and I'm hoping to extend this to handle several other cities/conurbations.

On the subject of locality, I think, I suppose, at the public transport level. If you're within a city's regular public transport network, that's a level of locality that's interesting. I think.

I'm quite excited about twitter as an application delivery platform. Especially with location thrown into the mix. Onward and upward!

Sunday 23 March 2008

Brummie Twits part two

Well, that was a start. But it's a bit broke, and it doesn't seem to update in a bloglines feed or anywhere.

On to mark 2. This time it's a twitterbot, name of @mawhin_bot1, whom I have chained to the computer, and who is searching for brummies. ( If by brummies you're prepared to accept anyone in the west midlands conurbation. Call us what you will ).

It's doing a couple of things I thought were interesting.

  • It's using XMPP. I can get a pretty decent feed of 'tracked' terms over XMPP, something I just can't do using twitter's REST API (see the sketch after this list).
  • It's using the twittervision API for determining location.
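
Switching tracking on is just an IM to twitter's jabber bot. Here's a sketch using the sendxmpp command-line tool (my bot does the same from code, and the account details are placeholders):

# ask twitter's XMPP bot to start tracking a term;
# matching tweets then arrive as IMs on that jabber account
echo "track birmingham" | sendxmpp -u mybot -p xxxxxx -j jabber.example.com twitter@twitter.com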

A coupla things to do:

  • Add support for @dangerday, a twitter version of fireeagle.
  • Perhaps add support for using 'whois' over IM. But that's gonna be harder. And I don't know that it will add any further data than twittervision.
  • Build reflection, so @mawhin_bot1 reflects everything said by those it follows. Or possibly this could be another twitter user. Thoughts on this would be welcome.

Friday 21 March 2008

Brummie Twits

@peteashton and @BhamPostJoanna wondered "Is it possible to get a feed for twitters from a specific city?".

@aeioux pointed out http://twittermap.com

Twittermap, while it relies on twitterers twittering their location, gives results according to where twitterers are / were recently.

This yahoo pipe instead searches a twitterer's friends (any twitterer whose friends you can view) for the location in their profile. So the result is not 'live', but reflects where the friends are based.

Here's a first hack using yahoo pipes. The page parameter is for when there's more than 100 friends to deal with; the underlying twitter call is sketched below.
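
Under the hood, the pipe is just fetching twitter's public friends list a page at a time. A sketch with curl (the API pages friends at 100 a go, and the username is just an example):

# fetch page 2 of a public twitterer's friends, and eyeball the locations
curl -s "http://twitter.com/statuses/friends/peteashton.xml?page=2" | grep '<location>'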

Sunday 16 March 2008

IT Support in a large environment.

I wanted to jot down a few guiding principles that have worked (I think) well for me. They work in the environment I'm in (a large FE college) with very few 'special' users.

  • Provide simple services. The question's not "what'll it do?", but "how well can I support it?". If it can do everything brilliantly, but it's broken often enough that it's not used, you lost. There are exceptions. Keep them limited.
  • Monitor everything that matters up the wazoo. Don't wait for your users to tell you it's broke. They won't. They'd rather carve it on a Siberian rock than tell you about it. And by the time someone does overcome their disdain for you and let you know there's a problem, it's been going on for ages, and you lost. There are exceptions. They are your friends.
  • Recognise, and hammer into your colleagues when they forget, that your users, by and large, don't care about computers. They're not interested. They use the computer 'cos they've been told to. For work at least. And all they want to do is their job. And this is as it should be. When you wash your face, should you care about the details of water supply? No. All you want is water that's not brown. If you expect users to be interested in anything outside their job, you lost.
  • Automate everything that's feasible. Once a script is right, it's right, and it will be tomorrow. If you're relying on users or tech support people to do task J the same today as they did yesterday, you lost.
  • Document your recovery procedures. The last time you want to be trying to think is when it's all gone to hell and the phone's on fire. That's when you want to be following instructions blindly. If you're having to think what to do under pressure, you lost. I lose pretty often here. But it's worth trying.

Using nagios to monitor print configuration

Now this isn't a copy of a previous post, which was about monitoring print queues. This is about monitoring our quite complex printing configuration system.

Again, a bit of background. We keep as much configuration info as we can in an LDAP directory. And we install printers on our windows desktops ( about 1,200 spread over several sites ) during either the startup scripts (for locally attached printers) or the login script (for network printers). We use the concept of a 'nearest printer', assigned to the workstation dependent on physical location.

So, sticking to network printers, we have:

  • A CUPS queue per printer, served by Samba to windows desktops.
  • A 'nearest printer' attribute in LDAP, attached to the workstation entry. This 'points to'..
  • A printer entry per printer, with an 'installcommand' attribute, which gives an appropriate command line to install that printer on a workstation. This gets run during the login script.
Perhaps it would be clearer to describe how a workstation gets a printer (there's a shell sketch of the lookup after this list). I'm focusing on network printing, so this happens in the login script.
  • Look up my (the workstation's) LDAP entry.
  • From my LDAP entry, get the nearest printers.
  • For each nearestprinter, get the installcommand from the printer's LDAP entry.
  • And run it.
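
Here's roughly what those steps look like against the directory, done with ldapsearch. A sketch only: the real logic runs in the windows login script, and the workstation name is made up.

# 1 and 2: look up the workstation's entry and pull out its nearestprinter DN(s)
for NP in $(ldapsearch -LLL -x -b "ou=hosts,dc=example,dc=com" \
    "(&(objectclass=computer)(cn=ws042))" nearestprinter \
    | grep '^nearestprinter:' | cut -d' ' -f2); do
    # 3: fetch the installcommand from each printer's entry
    CMD=$(ldapsearch -LLL -x -b "$NP" '(installcommand=*)' installcommand \
        | grep '^installcommand:' | cut -d' ' -f2-)
    # 4: and run it (just shown here)
    echo "$CMD"
done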
Now we like this. We can have several printers associated with one workstation, and we could (though we don't) associate nearestprinters with user accounts as well as workstations. It's real easy to change a workstation's printer(s), and users need know nothing about it. When it works, it just works.

But there's no referential integrity going on here. We can have orphans anywhere. A CUPS queue with no install commands. An installcommand referring to a non-existent CUPS queue. A workstation with a nearestprinter that doesn't exist. etc. Basically, config rot, caused by human failure to attend to detail.

And here we get to the point of it all. We have a nagios check called 'check_print_config', which checks all this, and creates a warning state if something's out of whack. It's posted below. As with most code posted here, it's finished to the point where it works. It's not great code. It does, I'd posit, do something interesting.


#!/usr/bin/perl -w
use strict;

my @nagiosCupsQueues;
my @nearestPrinterQueues;
my @installCommandQueues;
my @output = ();

# print "Getting monitored queues from nagios...\n";
@nagiosCupsQueues = (`grep check_cups_queue /etc/nagios2/conf.d/allPrintQueues.cfg | cut -f2 -d'!'`);
chop @nagiosCupsQueues;

# print "Getting installCommand queues from LDAP...\n";
@installCommandQueues = (`ldapsearch -LLL -x -b "ou=hosts,dc=example,dc=com" '(installcommand=*con2prt*)' installcommand | grep '/cd ' | grep -iv idcard | grep -iv tmu220 | grep -iv null | sort | uniq | cut -f4 -d" " | cut -f4 -d"\\\\"`);
chop @installCommandQueues;

# print "Getting nearestprinter queues from LDAP...\n";
@nearestPrinterQueues = (`ldapsearch -LLL -x -b "ou=hosts,dc=example,dc=com" '(&(objectclass=computer)(nearestprinter=*))' nearestprinter | grep nearestprinter | grep -iv lpt | grep -iv archicad | grep -iv tmu220 | grep -iv idcard | grep -iv null | sort | uniq | cut -f1 -d"," | cut -f2 -d"="`);
chop @nearestPrinterQueues;

foreach my $icq ( @installCommandQueues ) {
    next if $icq =~ /^$/;
    # an installcommand pointing at a queue nagios doesn't know about
    push(@output, "ICQ: $icq ") if ! grep(/^$icq$/i, @nagiosCupsQueues);
}
foreach my $npq ( @nearestPrinterQueues ) {
    next if $npq =~ /^$/;
    my $npqNotInCups = 0;
    my $npqNoInstallCommand = 0;
    $npqNotInCups = 1 if ! grep(/^$npq$/i, @nagiosCupsQueues);
    $npqNoInstallCommand = 1 if ! grep(/^$npq$/i, @installCommandQueues);
    # push(@output, "NPQ:$npq:") if ( $npqNoInstallCommand && $npqNotInCups );
    # list the workstations pointing at the duff queue, for the report
    my @duffClients = `ldapsearch -LLL -x -b "ou=hosts,dc=example,dc=com" "nearestprinter=cn=$npq,ou=hosts,dc=example,dc=com" dn | grep dn: | cut -f1 -d"," | cut -f2 -d"="`;
    chop @duffClients;
    push(@output, "NPQ:$npq: " . join(",", @duffClients) . " ") if ( $npqNoInstallCommand && $npqNotInCups );
}

#print Dumper([ \@output, ]);
if ( @output > 0 ) {
    print "WARNING: " . join(" ",@output) . "\n";
    exit 1;    # exit code 1 is WARNING to nagios
} else {
    print "OK\n";
    exit 0;
}

Friday 14 March 2008

Automating nagios configurations.

At the last count, we run something like 140 print queues, and as offices move, and printers get replaced, and 'stuff changes', queues are created and deleted and renamed. This post is about how I've addressed ensuring that nagios is monitoring all our queues, and minimising the opportunity for operator error.

A little background. We use CUPS to queue print jobs, and our technicians are free to create and delete queues as need be. They do not have access to the nagios configs.

So, the basic idea is that we periodically run a script on the nagios server that:

  • Queries each of our print servers for a list of existing queues
  • Creates a nagios config file for all print queues in the list
  • Signals nagios to restart and re-read its configuration

So we get a monitoring configuration that doesn't miss print queues out, nor alarms about print queues that no longer exist. And no-one has to remember.

Which is nice.

So, ( and I apologise in advance for the code. I'm a sysadmin. Whaddya expect. ). The following is a perl script called from cron, once for each CUPS server. We pass the server address, and a human-readable site name, and we get nagios code out on stdout, which is piped into the appropriate nagios config directory. It depends on lpstat, which queries the CUPS server.



#!/usr/bin/perl

$cupsServer = $ARGV[0];
$site = $ARGV[1];

@queues = `lpstat -h $cupsServer -p | grep printer | grep -iv "sent" | grep -iv "off-line" | grep -iv "unable" | grep -iv "attempt" | cut -f2 -d" "`;
chop @queues;

foreach $queue ( @queues ) {
    print "define service{\n";
    print "\tuse generic-service\n";
    print "\thost_name $cupsServer\n";
    print "\tservice_description CUPS_" . $queue . "\n";
    print "\tservicegroups " . $site . "PrintQueues\n";
    print "\tcontact_groups " . $site . "-printer-admins\n";
    print "\tcheck_command check_cups_queue!" . $queue . "\n";
    print "\tregister 1\n}\n\n";

    print "define serviceextinfo{\n";
    print " host_name " . $cupsServer . "\n";
    print " service_description CUPS_" . $queue . "\n";
    print " notes_url http://wiki.example.com/wiki/index.php?title=Nagios/" . $queue . "&action=edit&preload=Nagios/NewServiceTemplate\n";
    print " action_url http://" . $cupsServer . ".example.com:631/printers/" . $queue . "\n";
    print " icon_image HPlj4550p.gif\n}\n\n";
}



Coupla notes - the nagios action_url shows a clickable icon taking the user to the CUPS queue in question. The notes_url points to a wiki page. We use this to keep notes about the service.

This is all very well, but nagios won't pick up the changes without a restart. So once cron has built the config file, it does this:


export now=$( /bin/date "+\%s" ); #get the current time into a format nagios understands ( the \% is because this lives in a crontab entry, where % is special )
export commandfile='/var/lib/nagios2/rw/nagios.cmd'; #identify the file nagios reads for external commands
/usr/bin/printf "[\%lu] RESTART_PROGRAM\n" $(( now + 30 )) > $commandfile #tell nagios to restart in 30 seconds


And Bob's yer uncle. Monitoring our CUPS queues with nagios means we become aware of problems quicker, and respond quicker. And automating the config makes this practical.

Sunday 9 March 2008

What's your guiding question?

I know mine's changing again, and just for once, I'm aware of it happening.

Maybe (almost certainly) it's been put better elsewhere, but I made this up all by my own self. Your guiding question is the one you always ask. The one you measure everything you do against. The first question. The last question.

I'll try to explain.

My official job title is 'network and systems manager'. In practice, I'm a significant chunk of a team who make all the technical calls, and do all the fixing. We're generalists, who specialise in whatever's the problem right now. Not an unusual situation.

The pertinent part is where we 'make all the technical calls'. And technical decisions aren't always simple. Interesting technical decisions are never simple. Security versus ease of use. Customisability versus maintainability. Everything versus budgets. And the technical decisions I make and influence are affected, I hope strongly, by my guiding question.

The guiding question, for me, has evolved.

  1. What do I need to do to make this machine work?
  2. What do I need to do to make this service work?
  3. What do I need to do to make this service work well for my users?
  4. What do I need to do to make this set of services work together well for my users?
  5. How should these services work together to best support what my users are doing?

And today, it's
  • What service infrastructure should I be providing and supporting to equip my users to do what they do, but better?

What's your guiding question?

I moved from bloglines for the comments

and then I forgot to turn comments on. Whoops. Sorted now.

More monitoring with twitter

Or, the little twitterbot that could.

As I've mentioned before, we've got two identical nagios boxes running: one notifies us of problems via email, the other via a special private twitter account that the systems team follow. So if email service or one of the nagios boxes goes down, we'll still get notified.

This is an improvement, and we're already getting to problems quicker. Great smashing super. But sometimes we trip over each other. I'll log in to fix something to find that P is already working on it. This hasn't bitten us yet, but rest assured, if we don't deal with it, it will bite us one day. So.

The little protocol we're working with now is as follows: when you take on a problem, you IM the others that you're working on it. But that's n-1 messages before you start working on the fix. A pain and a waste of time.

So I'm working on a little bot. It watches the direct messages feed for the monitoring twitter account ( let's call it skaffen ), and when it gets a new direct message, sends it back as an update to the skaffen account, with the original sender prepended. Like this:

skaffen: WARNING -- stuff is borken
mawhin: d skaffen fixing stuff
... up to a minute, because of twitter rate limiting
skaffen: mawhin is fixing stuff

So to pick up a problem you direct message the monitor. I think that's sweet.
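
For the mechanics, the bot's side of this is just two calls against twitter's REST API. The sketch below uses curl and basic auth in place of the actual bot code; the password is a placeholder, and $LAST_ID is whatever message id we handled last.

# poll for direct messages we haven't seen yet
curl -s -u skaffen:xxxxxx "http://twitter.com/direct_messages.xml?since_id=$LAST_ID"

# ...parse out the sender and text, then post the relayed status
curl -s -u skaffen:xxxxxx -d status="mawhin is fixing stuff" \
    http://twitter.com/statuses/update.xml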

Thursday 6 March 2008

Planning for networkshop 36

I and a colleague will be attending #networkshop2008 (a UKERNA-run conference for UK academic network folks).

I'm pretty excited about this one, as most every session is of interest. And already there's a clash. For the first set of sessions, we two have three choices - Voice Services, Network Security or Network Engineering - all of which promise to be interesting and, more to the point, relevant to stuff we are doing or intend to be doing or need to be doing.

What to do?

Well, the presentations are usually available online after the event. Sometimes before. I'd like to think that UKERNA will be videoing the sessions and putting them up somewhere. I'm hoping there'll be a number of delegates blogging. I intend to be twittering/blogging. I wonder what else would help?

fireeagle

Got a #fireeagle invite off of twitter. This is cool.

It's a Yahoo location service/framework/thingy. Yahoo holds a 'location hierarchy', and you allow applications to see/change some level of that hierarchy.

So you might want one application to know where you are down to the city level ( restaurant recommendations, for instance ), and another to know your exact location. Like perhaps http://rescuemefromthemadkidnapper.com.

Sticking with my current twitter fetish, http://twitter.com/dangerday is a bot which lets you update fireeagle with your location and query users' locations.

So on twitter you can find out where I was last prepared to admit to being with 'd dangerday q mawhin'.

Deeply cool.

Twitter use case

Colleges / tutors should have a twitter feed that students subscribe to.

So, students could get: last-minute timetable changes, reminders of assignments due, event announcements, freebies (only via twitter, to encourage usage).

And of course, it works the other way too. So if a student is working, and needs to ask a question, twitter it. Anyone following (the rest of the class, the tutor, possibly other tutors with expertise) can answer, and everybody gets the benefit.

I'm sure this has been done to death in HE, but it ain't about fashion, is it? It's about what helps.

Monday 3 March 2008

web 2.0 agogo!!

I've not 'got' this web 2.0 stuff up till now, and have sorta gone along with the slashdot 'yeah yeah, get a life, nothing new, whatever' approach.

Until this weekend.

Now I'm beginning to get with the program.

Google Talk on my blackberry ( with unlimited business data plan, and that's the kicker, I suppose ) is lovely. 'Cos it's one way of linking up to twitter.

And through twitter I get - system status notifications from my network and service monitoring systems - and it's a separate notification channel to email.

And through twitter I get - iwantsandy.com.

And my colleagues and partner can see my personal calendar 'cos iwantsandy publishes an ical feed that google calendar can understand.

And I've recently started using blackberrytracker.com, and with yahoo pipes' help I'm thinking I'll be able to geocode my twitters, retrospectively, using the REST API that blackberrytracker provides.

It all relies on not caring about the cost of mobile data.

But given that, it all hangs together. I can finally get close to running my shizzle entirely from my phone, and that not be crap.

GPS Timesheet idea

Now I imagine I'm your average tech geek, in that I'd much rather spend my time developing 'cool stuff' than drudge work, like filling in my timesheets.

In fact, so much so that I'm regularly in schtuck with my boss over it. He's werry understanding, but still...

So, I've got use of a blackberry 8820 with GPS, and a couple of applications that will produce tracklogs. And I can relatively easily send them to my PC via bluetooth. And I know where I work.

So, how about this:

  • keep the tracklogger on permanently.
  • upload to my PC whenever I remember.
  • persuade my PC to upload this to a webapp which..
    • knows where I work.
    • Will output a table (CSV export?) of when I arrived at work and when I left. Or at least a radius around where I work.
Now I think I can do this with what's available now. And I'm gonna have a go.
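
A first cut of the 'was I at work' step could be as simple as this. A sketch: it assumes the tracklog is already flattened to CSV (timestamp,lat,lon), uses a crude lat/lon box rather than a proper radius, and the work coordinates and filename are made up.

# print arrival/departure transitions for a roughly 200m box around work
awk -F, -v wlat=52.4862 -v wlon=-1.8904 '
{
    at = ($2 > wlat-0.002 && $2 < wlat+0.002 && $3 > wlon-0.003 && $3 < wlon+0.003)
    if (at && !prev) print "arrived", $1
    if (!at && prev) print "left", $1
    prev = at
}' tracklog.csv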

What would be better is the blackberry app that can twitter my current location every n minutes. So I build the whereaminow facebook app, etc, etc.

UPDATE Someone's already done it, sorta. It's called blackberrytracker, but I'm stuck waiting for the registration email.

UPDATE Works great. Ish. Stay indoors too long, and the GPS fix starts to drift. So my three days at home sick looked a bit more like a busy day for a drug dealer. Hmmm. There's something called a Kalman filter that should help.

Friday 29 February 2008

Nagios and Twitter

Following on from this post, I've got a second nagios server set up now, monitoring all the same stuff. It's running in a VM on hardware connected to a different UPS, so that's one weakness mitigated.

The other improvement is using twitter as a notification channel, as opposed to mail on the primary monitor. So if our mail service goes down, we'll know about it. We weren't finding out before, cos the monitor was mailing us, but the mail wasn't getting through.
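
The twitter side is just a nagios command definition that shells out to curl. A sketch rather than our exact config: the macros are standard nagios ones, and the account and password are placeholders.

define command{
        command_name    notify-service-by-twitter
        command_line    /usr/bin/curl -s -u skaffen:xxxxxx -d status="$NOTIFICATIONTYPE$ $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" http://twitter.com/statuses/update.xml
        }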
Reminds me of what my partner pointed out ( and that I'd not considered ) when I outlined our VoIP plans. "But doesn't that mean that when the network is down, no-one will be able to call you to complain?". Every cloud's got a silver lining.

Monday 25 February 2008

The Five Whys

A little while ago I read this, and was very interested in the application of the five whys.

So here goes nothing...

This morning, we were faced with a number of servers down at our main site. It took until 10:25 to get everything important back up and running. We knew yesterday that there was some problem, but our monitoring platform was one of the downed machines, so our view of the problem was somewhat clouded.

Two issues. First:

why were the servers down?


Why? The UPS supplying the servers in question had run its batteries flat.

Why? There had been a planned power outage on the site, and the cutout on the supply to the offending UPS had tripped, so no power in to the UPS, battery go flat.

Why? and at this point I am stuck, and need to talk to our power people.

But I feel I ought to continue...

Why? The servers in question either don't have redundant power supplies, or they're not connected.

Why? Cos we've not surveyed what we've got and planned our power connections. Yet.

And here we have a plan, both to lessen our exposure to this risk in the short term, and to better understand how to avoid it in the longer term.


Why did monitoring fail?


Why? Cos the monitoring server was on the UPS that died.

Why? Cos we've only got one monitoring server, and we have to put it somewhere.

Why? Cos whenever I've considered duplicating the monitoring server I've rejected it, because I don't want to double the number of messages we get, and hadn't really thought through how to avoid this. And because I've not been able to spare the hardware.


I suppose the question is what do we do to make this more robust? I'm gonna duplicate our nagios installation, but mess with notification in two key ways - it will notify using jabber instead of mail, and it will only notify if the main monitor server is down. I'll put it on a VM at the same site for now, and will migrate it off site when we've a vmware platform to move it to.

Brainwaves, man

Popped into town to meet up with H & C at the Symphony Hall. It was mostly over by the time I arrived; C was sat with electrodes on his head, trying to do a Rubik's cube, and watching a visualisation of his EEG. Running on a Mac, it looked very cool.

Turns out, it's an EEG-over-bluetooth package, about 1500 dollars. But, as things do, it got me looking.

OpenEEG looks very cool.

And gnaural. Basically lets you build your own brain control music. Woohooh.

Saturday 23 February 2008

Not Friday gone but the Friday before, we had a job to do. For reasons I won't go into, we needed to replace the chassis on one of our core switches. It took several hours, 'cos we took the opportunity to tidy the cabling. Now it looks like this.

switchblade

We replaced pretty much every cable, and now, apart from a few that are known to be temporary and are visually very obvious, they're all 'patchsee' cables. (It's really rather neat. They run a couple of strands of optical fibre through the cable, see, and shining a special little torch on one end makes the other end light up. V. Handy).

This took 6 hours, all told (including lunch and fag breaks), and we're pretty happy with it.

I'm gonna try to remember to take monthly pics, and watch the entropy.

Sunday 3 February 2008

Roll your own VoIP Analysis, it's not that hard.

In the previous post, we were trying to debug a problem with our phones. Now, we're in education, and money's tight. IT systems purchasing goes like this:

Technical, management and sales agree on a service and a price. In this instance, it was a fully managed VoIP service, with training for our people, full redundancy, call reporting, the works.

Technical staff go back to work, leaving management and sales to iron out final details.

We end up with some boxes installed, no redundancy, no reporting or training, and call-center monkey support. 'Have you got QOS?'

So we sure as s**t can't afford call quality analysis software.

I'm rolling me own.

The basic Mitel system uses proprietary MiNET protocols to control stuff, but plain G711 over RTP for the audio. Wireshark can split this out and save audio streams, as well as doing jitter/latency analysis, but I needed something less manual.

tshark doesn't do this. But it's possible to script something similar.

######################## analyseCalls.sh ########################
#!/bin/sh
# tpcdump file as argument

## first, identify distinct RTP streams in input
for ff in $(tshark -r $1 "udp.port == 9000" -d "udp.port == 9000,rtp" -T fields -e "rtp.ssrc" 2>/dev/null | sort | uniq | cut -f1); do
    ## count the number of 'quiet' packets - this is, ahem, 'heuristic'
    NSB=$(tshark -r $1 -d "udp.port==9000,rtp" "rtp.ssrc==${ff}" -T pdml 2>/dev/null | grep payload | cut -f10 -d'"' | sed -e 's/[d5][54761032]://g; s/[^:]//g' | wc -c)
    NONSILENCE=$(echo "scale = 3; print $NSB / 160" | bc);
    ## suck out the audio payload
    tshark -r $1 -d "udp.port==9000,rtp" "rtp.ssrc==${ff}" -T pdml 2>/dev/null | grep payload | cut -f10 -d'"' | grabAudio.pl > ${ff}.raw
    ## convert it to WAV
    sox -c 1 -r 8000 -L -A -t raw ${ff}.raw ${ff}.wav
    ## Get a timestamp for the first packet
    TD=$(tshark -r $1 -d "udp.port==9000,rtp" "rtp.ssrc==$ff" -tad | head -n1 | cut -f2,3 -d" ")
    echo -n "$TD ${ff} $NONSILENCE "
    ## and do some call analysis
    tshark -r $1 -d "udp.port==9000,rtp" "rtp.ssrc==$ff" -td 2>/dev/null | qual.pl;
done | sort -n

######################## grabAudio.pl ########################
#!/usr/bin/perl
## translate the ascii-hex payload from tshark to actual binary data
while(<>) {
    $line = $_;
    chop $line;
    # each payload byte arrives as colon-separated ascii hex, e.g. d5:54:d5
    foreach $char ( split(/:/,$line)) {
        print chr(hex($char));
    }
}

######################## qual.pl #######################
#!/usr/bin/perl

while (<>) {
    $line = $_;
    ## collapse runs of spaces so the split below sees clean fields
    $line =~ s/ +/ /g;
    $line =~ s/^ //;
    ## separate the fields
    ( $pkt, $delta, $sip, $dummy, $dip, $dummy, $dummy, $dummy, $dummy, $dummy, $dummy, $ssrc, $dummy, $seq, $dummy, $time ) = split(/[ ,=]+/, $line);
    ## save the inter-packet arrival time in an array
    push(@deltas,$delta);
    ## and keep the final RTP sequence number
    $lastseq = $seq;
}
## compare number of packets seen to sequence number for loss..
$pkts = @deltas - 1;
$loss = $lastseq - $pkts;

## get the mean inter-packet gap
foreach $delta ( @deltas ) {
    $dsum += $delta;
}
$dmean = $dsum / $pkts;

## and calculate the standard deviation for the whole set
## ( not quite RFC 3550's smoothed jitter estimator, but close enough here )
foreach $delta ( @deltas ) {
    $dsquared += ( $delta - $dmean ) * ( $delta - $dmean );
}
$jitter = sqrt($dsquared / $pkts);

print "$sip $dip $pkts $loss $dmean $jitter\n";

Now I'm not claiming that this is great. It'll do me for free, and it will make my VoIP guy a bit happier. Especially when I roll it into a nice mini-itx box that he can pop under a problem handset for a week, with a nice web interface, call playback, etc.

The point is, it ain't that hard.

Troubleshooting lessons.

OK. For a couple of weeks, we've had major phone problems. We run a couple of Mitel VoIP boxes and a few hundred phones off them. Each Mitel box has 30 ISDN lines ( I don't really get ISDN, otherwise I'd describe it better ).

We've been getting one-way calls to our enquiries people, so our operators can't hear the caller.

This is the story of the diagnosis.

First off, we questioned users and first line support folks. It appeared to be happening on calls coming in to one of the Mitel boxes, destined for a group of handsets on a distant campus.

I took some packet captures from the switch port one of the phones is plugged into. I saw normal background stuff, some mitel control traffic (ports 6800 and 6900), and two RTP streams, one in each direction. So I pulled the streams out to audio files (wireshark is lovely..), and listened to them. Sure enough, the outgoing side was fine. "Hello, ???, can I help you?......Hello?.......Hello?....click". The incoming side was silent. Not quiet, silent.

So, it's not the LAN, I reasoned. The audio stream is getting to the phone, there's just nothing in it.

So we talked to our Mitel reseller, who remotely looked over stuff and said nothing was wrong. And we talked to our ISDN provider, ditto. Our Mitel reseller sent someone out.

And while he was looking, I got a notification that one of our internet routers was down. And I got a colleague to look it over, and restart it. And it's fine, but unreachable. So I go look at the layer 3 switch it's connected to. And the interface is up, but no ARP entry and no pings. Oh bum.

So I try pinging the Mitel box from the core switch. Uh-uh. ARP entry? Nope. OK, add a static ARP entry. All of a sudden, all is well.

Turns out it was the network all along. Switch hardware; we're working it out with the vendor right now.

The point being, I was fully satisfied it wasn't the network. I could talk through my analysis with capable colleagues, who agreed. And we were wrong.

The moral of the story? When there's several components to a problem, and all of them check out fine, then someone doesn't understand the problem.

And it's probably you.
