David Beroff (d4b) wrote,

Standardization and URLs

I've spent the last few hours cleaning up the AboutTh.is database schema, focusing mainly on webpage URLs. I had previously split the protocol and host out from the rest of the URL, figuring that there are multiple protocols. After rolling it around in my head for a while, I realized that the popups are really only going to work on actual web pages anyway, so I could collapse the Protocol table into a single enum (for http and https), now more correctly named Scheme. I'm keeping the host in a separate table, now more correctly named Server (which includes the rarely used port number and authentication), so as to allow for better reporting/organization and, at a bare minimum, to be able to respond quickly to any future cease-and-desist orders. I'm also flipping the hostname nodes around internally, storing them as "com.yahoo.news", for the same reasons. It's far easier and faster to use an index to "find everything starting with 'com.yahoo'" than to "find everything ending in 'yahoo.com'". (Apparently, even Tim Berners-Lee later regretted the right-to-left DNS convention, noting that a URL like http://www.example.com/path/to/name could just as easily have been written http:com/example/www/path/to/name.)
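To make the hostname flip concrete, here is a minimal Python sketch of the idea. It is only an illustration, not the actual AboutTh.is code: the function names `reverse_host` and `find_under` are made up for this example, and a sorted list searched with `bisect` stands in for the database index.

```python
import bisect


def reverse_host(hostname: str) -> str:
    """Flip 'news.yahoo.com' into 'com.yahoo.news' (hypothetical helper)."""
    labels = hostname.strip().rstrip(".").lower().split(".")
    return ".".join(reversed(labels))


def find_under(sorted_keys: list[str], domain: str) -> list[str]:
    """Return the reversed keys for `domain` and all of its subdomains.

    `sorted_keys` must already be sorted; it plays the role of an index.
    """
    prefix = reverse_host(domain)
    # Jump straight to the first key that could match, as an index would.
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: nothing further can share the prefix
        # Require a label boundary so 'com.yahoo-inc' is not mistaken
        # for something under 'com.yahoo'.
        if key == prefix or key.startswith(prefix + "."):
            matches.append(key)
    return matches


hosts = [
    "news.yahoo.com",
    "yahoo.com",
    "finance.yahoo.com",
    "yahoo-inc.com",
    "example.org",
]
keys = sorted(reverse_host(h) for h in hosts)
print(find_under(keys, "yahoo.com"))
```

Because every key under a given domain shares the same leading characters, the matches sit next to each other in sorted order, so the lookup is one seek followed by a short scan. With the names stored in their usual order, "ending in 'yahoo.com'" is a suffix match, which an ordinary index cannot narrow down.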

As for the "most correct" component names, there are standards, and there are standards....

Source: Tantek Çelik, How many ways can you slice a URL and name the pieces?
Tags: legal, mtat, webdev