I just discovered a flaw in the web application for this site. My blog didn't support the HTTP HEAD verb.
Why would that matter? Well, Scott 'Early Adopter' Swigart just posted a link to the W3C link checker, and I thought I'd use it to find out if I have any dead links on my site. (Answer: no, but I'm pointing to a few URLs which issue a redirect when you go there, which I should probably tidy up.)
But the first time I ran the link checker, I was rather alarmed to see a vast number of 404 errors. Especially since these were exclusively for URLs linking internally to other pages on my site! I clicked on all the 'bad' links, and they were all working just fine. So I was perplexed. I was also pleased, because my web stats indicate that I do get the odd inexplicable 404 error for valid URLs on my site, for reasons I had never managed to pin down (or reproduce under load testing). I now had a way of generating these errors that was easily reproducible.
I spent some time trying to work out what it was about the access patterns the W3C link checker was using that caused the errors in my app. It does seem to launch a lot of requests concurrently, or at least in very quick succession, so I was wondering if I had a threading bug. I also thought I might be seeing something similar to this problem, but apparently not. The fix he uses makes no difference for me, and in any case, the symptoms of the problems weren't really the same.
However, after adding various extra bits of diagnostic instrumentation, I realised after about half an hour that it was much simpler than I had thought:
All the 404 not found errors were for HTTP requests using the HEAD verb.
HEAD is one of those HTTP methods I never really pay much attention to. It's pretty straightforward - the HTTP spec defines it as being identical to GET except it must not return a body. It allows a client to find out what headers it would have seen if it had issued a normal GET request, without incurring the expense of actually generating the page.
So you can see why something like a link checker would use HEAD rather than GET - all it wants to do is make sure that the page would be available if it actually tried to download it - it's not interested in looking at the contents.
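As a rough illustration of that relationship between HEAD and GET, here is a minimal sketch in Python (chosen only to keep the example runnable in a few lines; it is not the ASP.NET code behind this site). The DemoHandler class, the PAGE content and the fetch and run_demo helpers are all hypothetical names made up for the demo. It starts a throwaway local server and issues both verbs against it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content for the demo.
PAGE = b"<html><body>Hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        # Status line and headers are shared by GET and HEAD.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        # Same status and headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, method):
    """Issue one request and return (status, content type, content length, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return (
            resp.status,
            resp.getheader("Content-Type"),
            resp.getheader("Content-Length"),
            resp.read(),
        )
    finally:
        conn.close()


def run_demo():
    """Serve on an ephemeral local port and compare a GET with a HEAD."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch(port, "GET"), fetch(port, "HEAD")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

In this sketch the two responses carry the same status and headers, and only the GET response has a body, which is all a link checker needs to know.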
Most of the time in ASP.NET, you don't even need to think about this. The .aspx handler deals with HEAD requests correctly on your behalf. It's only if you write your own HTTP handler that you might have a problem. And that's exactly what I did...
My web.config file contains an entry in the <httpHandlers> section for my blog handler. (It's mapped to /BlogHandler, although you won't be able to access that URL from outside, because I use an internal URL rewriting scheme. But this is the handler that anything beginning with /iangblog/... ends up running through.) The web.config file was configured only to pass GET requests to my handler. So I simply had to add a HEAD verb to the list.
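To see why a verb list that omits HEAD produces a 404 rather than some other error, here is a toy model of verb-filtered handler lookup, again in Python. It is an assumption-laden sketch and not how ASP.NET resolves handlers internally: make_router, the registration tuples and the "blog" handler name are all invented for illustration.

```python
def make_router(registrations):
    """Build a resolver from hypothetical (allowed verbs, path prefix, handler) entries."""

    def resolve(method, path):
        for verbs, prefix, handler in registrations:
            # A registration only matches when both the path and the verb fit.
            if path.startswith(prefix) and method.upper() in verbs:
                return 200, handler
        # Nothing is registered for this verb and path, so report not found.
        return 404, None

    return resolve


# Before the fix: only GET is passed to the blog handler.
get_only = make_router([({"GET"}, "/BlogHandler", "blog")])

# After the fix: HEAD is added to the verb list.
get_and_head = make_router([({"GET", "HEAD"}, "/BlogHandler", "blog")])

if __name__ == "__main__":
    print(get_only("GET", "/BlogHandler"))
    print(get_only("HEAD", "/BlogHandler"))
    print(get_and_head("HEAD", "/BlogHandler"))
```

In this model a HEAD request against the GET-only registration matches nothing, so it falls through to the not-found case even though the same URL works fine in a browser.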
Of course I wasn't quite done - HEAD behaves differently from GET. But since my blog HTTP handler is really just a front controller that usually transfers the request through to one of my .aspx blog page templates, it required very little alteration. The only thing I had to change was the RSS generation. If you go to /iangblog/rss2.0, it doesn't internally redirect to an .aspx page - instead it runs some code to generate an RSS feed. (Or return a 304 not modified, depending.) I had to modify this code to check the verb, and not bother to generate the RSS body for a HEAD request, and then I was done.
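The shape of that change can be sketched roughly as follows, again in Python rather than the actual handler code. The handle_feed function, its parameters and the choice of headers are assumptions for illustration only. The point is simply that the feed builder is never called for a HEAD request or for a 304.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def handle_feed(method, build_feed, last_modified, if_modified_since=None):
    """Return (status, headers, body) for a hypothetical feed endpoint.

    build_feed is a callable that produces the feed bytes; last_modified and
    if_modified_since are timezone-aware UTC datetimes.
    """
    headers = {
        "Content-Type": "application/rss+xml",
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    # Nothing has changed since the client's copy: no body needed.
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, headers, b""
    # HEAD: same status and headers, but skip generating the feed entirely.
    if method.upper() == "HEAD":
        return 200, headers, b""
    body = build_feed()
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


if __name__ == "__main__":
    stamp = datetime(2004, 1, 1, tzinfo=timezone.utc)
    print(handle_feed("HEAD", lambda: b"<rss/>", stamp))
    print(handle_feed("GET", lambda: b"<rss/>", stamp))
```

One consequence of skipping the generation step is that this sketch cannot report a Content-Length for HEAD, since the length is only known once the body has been built.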
And now the W3C link checker is happy. All that remains to be seen is if this cures the mysterious handful of 404 errors that my web stats report for perfectly valid URLs every month...