Wednesday, August 03, 2005

JavaScript citations, round 3

A minor but necessary tweak to the citation code discussed earlier: escape the URL you’re searching for when you form Bloglines and Technorati search URLs.

Why? Because Radio-based blogs, like Scobleizer and Scripting News, use permalink URLs that include fragment identifiers. For example, the most recent post on Scobleizer has a permalink URL of:

http://radio.weblogs.com/0001011/2005/07/31.html#a10804

This permalink identifies a particular post on the blog’s 31/07/2005 archive page; a browser resolves it by requesting the URL that precedes the # and searching the returned page for the fragment identifier that follows the #.

So, if I form a Technorati search URL by naive concatenation, as I had been doing, I get:

http://www.technorati.com/search/http://radio.weblogs.com/0001011/2005/07/31.html#a10804 (no links at time of writing)

Close, but this doesn't search for what I want. What actually happens is that the browser asks Technorati for all citations of the 31/07/2005 archive page, then searches the resulting page, unsuccessfully, for the a10804 fragment identifier.
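To make the split concrete, here is a small sketch that parses the naively concatenated URL with the standard URL class. That class is a modern convenience used only for illustration, and the variable names are mine; the point is just to show which part goes to the server and which part stays in the browser.

```javascript
// The permalink from the example above, fragment included.
const permalink = "http://radio.weblogs.com/0001011/2005/07/31.html#a10804";

// Naive concatenation: the "#" becomes the fragment of the whole search URL.
const naive = "http://www.technorati.com/search/" + permalink;

const parsed = new URL(naive);

// Everything before the "#" is what the browser actually requests.
const sentToServer = parsed.origin + parsed.pathname;

// Everything from the "#" onward never leaves the browser.
const keptByBrowser = parsed.hash;

console.log(sentToServer);
// http://www.technorati.com/search/http://radio.weblogs.com/0001011/2005/07/31.html
console.log(keptByBrowser);
// #a10804
```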

What I need to do is have Technorati search for the entire Scobleizer URL, including the fragment; the way to do this is to escape() the permalink before concatenation, which encodes the # and hides it from the browser’s special treatment:

http://www.technorati.com/search/http%3A//radio.weblogs.com/0001011/2005/07/31.html%23a10804 (6 links at time of writing)
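In code, the fix might look something like the sketch below. The function name technoratiSearchUrl is hypothetical, chosen for this example rather than taken from the citation script; the only real change is the call to escape() before concatenation. Note that escape() is now considered a legacy function; encodeURIComponent() is the usual modern choice, but it also encodes the slashes, so its output would differ from the URL shown above.

```javascript
// Hypothetical helper: builds a Technorati search URL for a permalink.
// escape() encodes ":" as %3A and "#" as %23 but leaves "/" alone,
// so the fragment travels to the server as part of the search term.
function technoratiSearchUrl(permalink) {
  return "http://www.technorati.com/search/" + escape(permalink);
}

const searchUrl = technoratiSearchUrl(
  "http://radio.weblogs.com/0001011/2005/07/31.html#a10804"
);

console.log(searchUrl);
// http://www.technorati.com/search/http%3A//radio.weblogs.com/0001011/2005/07/31.html%23a10804
```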

That’s better. So, oops; my mistake. And the lesson: beware quick and dirty hacks.
