Reply to Gregory Narain’s Why cruft doesn’t matter
By Oli
At 3:15 PM · Thursday, 29 January · 2004
To Coding · Weblogging
Gregory Narain recently outlined why he thinks URL cruft doesn’t matter, and he references my “URL cruft, and how to remove it” article. While I agree with some of what he says (for example, I don’t find my date-based URL scheme easy to remember ;-), I disagree with his conclusion. My reply ended up being quite long (too long for a comment), so here it is:
Firstly I think Greg has been overly generous in using http://cnn.com/news/2004/0004.html as an example of a crufty URL. I’d be thinking more along the lines of:
http://www.mapquest.com/maps/map.adp?latlongtype=internal
&addtohistory=&latitude=nybZqHttihQ%3d&longitude=
QrjiLeu%2fzTvqQ6pUOG9%2bIQ%3d%3d&name=Microsoft%20
Corp&countryid=250&country=US&address=1%20
Microsoft%20Way&city=Redmond&state=WA&zipcode=
98052&phone=425%2d882%2d8080&cat=microsoft&spurl=0
For a weblog-related example, how about GeekLog (not really long, but not very useful):
http://www.geeklog.net/article.php?story=20030803155454822
CNN’s URLs are also a little longer:
http://www.cnn.com/2004/TECH/space/01/28/space.mars.reut/index.html
re: Short if possible
If you send a URL via email and it’s longer than about 78 characters, it’ll get split over more than one line. For some reason email clients are really bad at working out where the URL should end, and often stop it at the end of the first line. This makes a broken link that’ll lead nowhere. Even using Greg’s ‘20 words’ example, it’s easy for a URL based on a title to go over that:
http://www.movabletype.org/docs/mtmanual_troubleshooting.
html#movable%20type%20encodes%20the%20characters%20in
%20my%20language%20incorrectly
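A rough sketch of that check in Python, assuming the "about 78 characters" figure above as the wrap width (real email clients differ, and the function name is just for illustration):

```python
# Assumed wrap width, taken from the "about 78 characters" figure above.
MAX_LINE = 78


def survives_email_wrap(url, max_line=MAX_LINE):
    """Return True if the URL fits on one line of max_line characters."""
    return len(url) <= max_line


# The GeekLog URL from earlier in this post.
short_url = "http://www.geeklog.net/article.php?story=20030803155454822"

# The title-based Movable Type URL above, joined back into one string.
long_url = (
    "http://www.movabletype.org/docs/mtmanual_troubleshooting."
    "html#movable%20type%20encodes%20the%20characters%20in"
    "%20my%20language%20incorrectly"
)

for url in (short_url, long_url):
    verdict = "fits" if survives_email_wrap(url) else "will probably be split"
    print(len(url), verdict)
```

The GeekLog URL fits on one line; the title-based one does not, which is the point about 20-word titles.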
Greg also cites broadband and big hard disks in relation to URL (byte) size, which I think is a non-issue as URLs are so tiny anyway. Still, I wonder if he’d be surprised that in the USA 58% of users are still using ‘narrowband’ (source: Nielsen/Netratings 2003.11 pdf)? His claim that a $1,200 USD hard disk is a ‘consumer’ product is also just funny (maybe in a few years ;-)
Greg says search engines are another reason for not fearing long URLs. However, many search engines are ‘unwilling’ to index URLs generated by scripts (the main long URL culprits), which they usually identify by the “&” and “?” characters. This is because a script could generate a huge number of URLs, as with the MapQuest URLs.
Greg wonders why using a date-stamp in URLs would make them better. The main reason is the time you publish a story is one of the only things that won’t change, so it’s a good thing to make a unique filename from. Tim Berners-Lee says After the creation date, putting any information in the name is asking for trouble one way or another.
His article Cool URIs don’t change explains it well from the depths of 1998. A URL like /archives/2004/01/24/filename is more usable than /archives/000001.html because you can tell the date of publication before you even see the page, and you (hopefully) can access the day, month and year indexes just by deleting parts of the URL.
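To make the "deleting parts of the URL" idea concrete, here is a small Python sketch that lists the index URLs a reader could reach by trimming the path from the right. The host example.com and the function name are assumptions for illustration, and whether each index actually exists depends on how the site is set up.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List the URLs reached by deleting path segments from the right."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, finishing at the site root.
    for end in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:end])
        if end:
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


# example.com is a placeholder host, not a real archive.
for parent in parent_urls("http://example.com/archives/2004/01/24/filename"):
    print(parent)
```

For the date-based URL this walks up through the day, month and year indexes, then the archive index and the home page. Doing the same to /archives/000001.html only gets you the archive index.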
Another admittedly minor reason for not-ridiculously-long URLs is for easy display in weblog comments. Many weblogs don’t allow ‘live’ links, or link a URL automatically using the URL itself as the link text. This can result in the URL trailing off the screen as it breaks out of the CSS layout. If the site uses tables, the containing column will expand pushing anything to the right of it off the screen and into horizontal scrolling hell. You’ll probably see this on a couple of the URLs above.
re: Hackable
Greg suggests using “a search-sensitive error handler” to duplicate the hackable nature of non-crufty URLs, to avoid “detailed preparation and management of hackable pages down the URL string”. Huh? MovableType already generates those index files automatically! ;-) If someone can show me an easy way to create an intelligent 404 page as good as PHP.net’s I’d be very happy. Given my experiences with most 404 pages, I don’t think I’d describe them as “painless for the user” either ;-)
re: Permanent
As in the intelligent 404 pages example above, the methods Greg suggests to maintain URL names are either hard (mod_rewrite), or impractical (404 pages are hard to do well, and sending every page to a PHP processor etc. is server-intensive). Given good URLs are relatively easy to make, why not start out with them? I also question that it requires “a truly deep financial and technical commitment” to keep URLs unchanged. All you need is to renew your domain name and hosting, and not move/rename the static files that MT generates. This raises the bar about as high as having a website to begin with.
re: Summary
Greg concludes with:
No system of organization should compromise or otherwise straddle [sic] innovation when other, forgiving and flexible alternatives exist.
I still can’t see how creating non-crufty URLs prevents ‘innovation’, or anything else except (hopefully) link rot and user confusion. Also, while I agree that alternatives are available, none of them are as easy as simply setting a weblog up with a non-crufty URL scheme from the start.
I think the 5 reasons for non-crufty URLs that I gave are basically all about usability. Instead of saying “non-crufty URLs” maybe we should talk about “usable URLs”. I personally think a URL like http://oli.boblet.net/2004/01/29/cruft is more usable than http://socialtwister.com/archives/000042.html. Before even looking at the page I know
- when it was published
- a little about the page’s topic
- and (generally) a little about the structure of the website
Using the default MT archive scheme I only know ‘it was the 42nd story written’ ;-) What do you think?
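As a rough illustration of what can be read off each URL before visiting it, here is a Python sketch that assumes a /YYYY/MM/DD/slug path layout like the one I use. The pattern and the function name are my own assumptions, not anything MT provides.

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Assumed layout: /YYYY/MM/DD/slug, with an optional trailing slash.
DATE_SLUG = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")


def read_url(url):
    """Return (publication date, slug) for a date-based URL, else None."""
    match = DATE_SLUG.match(urlsplit(url).path)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    try:
        return date(int(year), int(month), int(day)), slug
    except ValueError:
        # Digits in the right places, but not a real calendar date.
        return None


print(read_url("http://oli.boblet.net/2004/01/29/cruft"))
print(read_url("http://socialtwister.com/archives/000042.html"))
```

The first URL gives up a date and a topic word; the second gives up nothing, because a sequence number is all it contains.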
Discussion...
- 1. Trackback from SocialTwister · 30 Jan, 2004 · 3:11 AM
It seems that my recent rant on Cruft has generated some feedback, and surely it is welcome. Oli Studholme has posted a nice response, to the points that I have laid out relating to Cruft. Note that Oli’s entry can… Read more in Cruft Redux »
- 2. Comment by Gregory Narain · 30 Jan, 2004 · 3:16 AM
Oli,
I’ve posted my response to your points over at the Twister.. it was definitely too long to jam into this box, haha.
http://socialtwister.com/archives/000049.html
Best regards,
Greg