David Beroff (d4b) wrote,

Standardization and URLs

I've been cleaning up the AboutTh.is database schema over the last few hours, focusing mainly on webpage URLs. I had previously split out the protocol and host from the rest of the URL, figuring that there are multiple protocols. After rolling it around in my head for a while, though, I realized that the popups are really only going to work on actual web pages anyway, so I could collapse the Protocol table into a single enum (for http and https), now more correctly named Scheme.

I'm keeping the host in a separate table, now more correctly named Server (which includes the rarely used port number and authentication), so as to allow for better reporting and organization and, at a bare minimum, to be able to respond quickly to any future cease-and-desist orders. I'm also flipping around the hostname nodes internally, storing them as "com.yahoo.news", for the same reasons. It's far easier and faster to use an index to "find everything starting with 'com.yahoo'" than to "find everything ending in 'yahoo.com'". (Apparently, even Tim Berners-Lee later regretted the right-to-left ordering of DNS names, noting that a URL like http://www.example.com/path/to/name could have just as easily been written http:com/example/www/path/to/name.)
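A minimal sketch of the split described above, in Python. It is not the actual AboutTh.is code: the function names (`split_url`, `reverse_host`, `is_under`) and the dictionary keys are hypothetical, and it assumes hosts are ordinary DNS names rather than IP addresses, which would not make sense reversed.

```python
from urllib.parse import urlsplit

# Assumption: only these two schemes are accepted, mirroring the Scheme enum.
ALLOWED_SCHEMES = ("http", "https")


def reverse_host(hostname):
    """Turn 'news.yahoo.com' into 'com.yahoo.news'."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(reversed(labels))


def split_url(url):
    """Split a URL into scheme, server details, and the rest (hypothetical layout)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")

    # Everything after the host: path, then optional query and fragment.
    rest = parts.path or "/"
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment

    return {
        "scheme": scheme,
        "host_key": reverse_host(parts.hostname),  # e.g. 'com.yahoo.news'
        "port": parts.port,          # None when the URL gives no port
        "username": parts.username,  # None when there is no authentication
        "password": parts.password,
        "rest": rest,
    }


def is_under(host_key, domain_key):
    """True if a reversed host equals, or is a subdomain of, a reversed domain."""
    return host_key == domain_key or host_key.startswith(domain_key + ".")
```

Note the trailing dot in the prefix test: matching on a bare "com.yahoo" prefix would also catch an unrelated host such as "com.yahoofinance", so the prefix to search for is "com.yahoo." (plus an exact match on "com.yahoo" itself). In a database, the same idea would presumably become an indexed prefix or range query on the reversed host column.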

As for the "most correct" component names, there are standards, and there are standards....

Source: Tantek Çelik, How many ways can you slice a URL and name the pieces?
Tags: legal, mtat, webdev