IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)


Don't lose your HTTP HEAD

Friday 11 June, 2004, 10:31 AM

I just discovered a flaw in the web application for this site. My blog didn't support the HTTP HEAD verb.

Why would that matter? Well, Scott 'Early Adopter' Swigart just posted a link to the W3C link checker, and I thought I'd use it to find out if I have any dead links on my site. (Answer: no, but I'm pointing to a few URLs which issue a redirect when you go there, which I should probably tidy up.)

But the first time I ran the link checker, I was rather alarmed to see a vast number of 404 errors. Especially since they occurred exclusively on URLs linking internally to other pages on my site! I clicked on all the 'bad' links, and they were all working just fine. So I was perplexed. I was also pleased, because my web stats indicate that I do get the odd inexplicable 404 error for valid URLs on my site, for reasons I had never managed to pin down (or reproduce under load testing). I now had an easily reproducible way of generating these errors.

I spent some time trying to work out what it was about the access patterns the W3C link checker was using that caused the errors in my app. It does seem to launch a lot of requests concurrently, or at least in very quick succession, so I was wondering if I had a threading bug. I also thought I might be seeing something similar to a problem someone else had written up, but apparently not. The fix he uses makes no difference for me, and in any case, the symptoms weren't really the same.

However, after adding various extra bits of diagnostic instrumentation, I realised about half an hour later that it was much simpler than I had thought:

All the 404 not found errors were for HTTP requests using the HEAD verb.

HEAD is one of those HTTP methods I never really pay much attention to. It's pretty straightforward - the HTTP spec defines it as being identical to GET except that the response must not include a body. It allows a client to find out what headers it would have seen if it had issued a normal GET request, without incurring the expense of actually generating and downloading the page.

So you can see why something like a link checker would use HEAD rather than GET - all it wants to do is make sure that the page would be available if it actually tried to download it - it's not interested in looking at the contents.
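To make the client side concrete, here is a minimal sketch of the kind of check a link checker performs, written in Python with only the standard library (the W3C checker itself is not written this way, and `check_link` is a hypothetical name). It sends HEAD instead of GET and reports the status code. Note that `urlopen` follows redirects by default, so a redirected link reports the status of its final destination rather than being flagged, and connection failures (`URLError`) are left to propagate.

```python
import urllib.error
import urllib.request


def check_link(url: str, timeout: float = 10.0) -> int:
    """Return the status code that a HEAD request for `url` produces.

    HEAD asks for the headers only, so the checker learns whether the page
    exists without downloading a body (and without the server sending one).
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx; the status code is what we want.
        return error.code
```

A server that only answers GET for a URL will report 404 (or similar) here even though the same URL works perfectly in a browser, which is exactly the symptom described above.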

Most of the time in ASP.NET, you don't even need to think about this. The .aspx handler deals with HEAD requests correctly on your behalf. It's only if you write your own HTTP handler that you might have a problem. And that's exactly what I did...

My web.config file contains an entry in the <httpHandlers> section for my blog handler. (It's mapped to /BlogHandler, although you won't be able to access that URL from outside, because I use an internal URL rewriting scheme. But this is the handler that anything beginning with /iangblog/... ends up running through.) That entry was configured to pass only GET requests to my handler, so I simply had to add HEAD to its list of verbs.
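The failure mode is easy to model. The sketch below is a hypothetical Python model of a verb-restricted handler table, loosely inspired by the <httpHandlers> entry; it is not ASP.NET's dispatch code or web.config syntax, and `Registration`, `dispatch` and `blog_handler` are illustrative names. A request whose verb no registration accepts falls through to "not found", even though the same path works for GET.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical model of a verb-restricted handler table; NOT ASP.NET's
# actual dispatch logic, just enough to show the misconfiguration.


@dataclass(frozen=True)
class Registration:
    verbs: frozenset[str]  # e.g. frozenset({"GET"}) before the fix
    path: str              # e.g. "/BlogHandler"
    handler: Callable[[str, str], tuple[int, bytes]]


def dispatch(table: list[Registration], verb: str, path: str) -> tuple[int, bytes]:
    """Run the first registration that matches both the path and the verb."""
    for reg in table:
        if reg.path == path and verb in reg.verbs:
            return reg.handler(verb, path)
    # Nothing claimed the request, so it falls through to "not found",
    # even though the very same URL succeeds for GET.
    return 404, b""


def blog_handler(verb: str, path: str) -> tuple[int, bytes]:
    body = b"<html><body>blog page</body></html>"
    # HEAD: same status as GET, but no body.
    return 200, (b"" if verb == "HEAD" else body)


before = [Registration(frozenset({"GET"}), "/BlogHandler", blog_handler)]
after = [Registration(frozenset({"GET", "HEAD"}), "/BlogHandler", blog_handler)]

print(dispatch(before, "HEAD", "/BlogHandler"))  # (404, b'')
print(dispatch(after, "HEAD", "/BlogHandler"))   # (200, b'')
```

Adding HEAD to the registration's verb set is the whole fix at this level; the handler then has to honour the "no body" rule itself.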

Of course I wasn't quite done - HEAD behaves differently from GET. But since my blog HTTP handler is really just a front controller that usually transfers the request through to one of my .aspx blog page templates, it required very little alteration. The only thing I had to change was the RSS generation. If you go to /iangblog/rss2.0, it doesn't internally redirect to an .aspx page - instead it runs some code to generate an RSS feed. (Or it returns a 304 Not Modified, depending on whether the client's copy is already up to date.) I had to modify this code to check the verb and skip generating the RSS body for a HEAD request, and then I was done.
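As an illustration of that verb check, here is a minimal WSGI sketch in Python rather than ASP.NET. Everything in it is an assumption for illustration: `rss_app`, `build_rss` and `FEED_ETAG` are hypothetical, and the simple ETag comparison stands in for whatever conditional-request logic the real feed code uses. HTTP permits a HEAD response to omit Content-Length, which is what lets the HEAD branch skip generating the feed entirely.

```python
from typing import Callable, Iterable

# Hypothetical feed version tag; a real feed would derive it from the
# newest entry's timestamp or a hash of the content.
FEED_ETAG = '"feed-v42"'


def build_rss() -> bytes:
    """Hypothetical stand-in for the comparatively costly RSS generation."""
    return (b'<?xml version="1.0"?><rss version="2.0"><channel>'
            b"<title>Example feed</title></channel></rss>")


def rss_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    method = environ["REQUEST_METHOD"]
    if method not in ("GET", "HEAD"):
        start_response("405 Method Not Allowed", [("Allow", "GET, HEAD")])
        return []

    etag = [("ETag", FEED_ETAG)]
    # Conditional request (simplified: exact match only, no lists or "*"):
    # the client's copy is current, so no body for either verb.
    if environ.get("HTTP_IF_NONE_MATCH") == FEED_ETAG:
        start_response("304 Not Modified", etag)
        return []

    headers = etag + [("Content-Type", "application/rss+xml")]
    if method == "HEAD":
        # Same status and headers as GET, but never build the body.
        start_response("200 OK", headers)
        return []

    body = build_rss()
    start_response("200 OK", headers + [("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, rss_app).serve_forever()` serves it from the standard library, and the HEAD-based `check_link` style of probe will then get a 200 without the feed ever being generated.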

And now the W3C link checker is happy. All that remains to be seen is whether this cures the mysterious handful of 404 errors that my web stats report for perfectly valid URLs every month...

Copyright © 2002-2024, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk