August 8th, 2013


Standardization and URLs

I've been cleaning up the database schema over the last few hours, focusing mainly on webpage URLs. I had previously split out the protocol and host from the rest of the URL, figuring that there are multiple protocols to account for. After rolling it around in my head for a while, though, I realized that the popups are really only going to work on actual web pages anyway, so I could collapse the Protocol table into a single enum (for http and https), now more correctly named Scheme. I'm keeping the host in a separate table, now more correctly named Server (which includes the rarely used port number and authentication), to allow for better reporting and organization and, at a bare minimum, to be able to respond quickly to any future cease-and-desist orders. I'm also flipping around the hostname nodes internally, storing them in reverse order with the top-level domain first, for the same reasons. It's far easier and faster to use an index to "find everything starting with a given domain" than to "find everything ending in it". (Apparently, even Tim Berners-Lee later regretted the right-to-left DNS convention, noting that a URL like http://www.example.com/path/to/name could just as easily have been written http:com/example/www/path/to/name.)
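
To make the reversed-hostname idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the actual schema: the table name `server`, the column `reversed_name`, and the helpers `reverse_host` and `hosts_under` are all made up for illustration. The point is only that once the labels are flipped, "everything under a domain" becomes a plain range scan on an ordinary index.

```python
import sqlite3


def reverse_host(hostname: str) -> str:
    """Flip the dot-separated labels, e.g. 'www.example.com' -> 'com.example.www'.

    Applying it twice gives back the (lowercased) original.
    """
    return ".".join(reversed(hostname.lower().split(".")))


def hosts_under(conn: sqlite3.Connection, domain: str) -> list:
    """Return the domain itself plus every stored subdomain of it.

    Subdomains all start with '<reversed domain>.', so they sit in one
    contiguous index range: from key + '.' up to (not including) key + '/',
    since '/' is the character immediately after '.' in ASCII.
    """
    key = reverse_host(domain)
    rows = conn.execute(
        "SELECT reversed_name FROM server "
        "WHERE reversed_name = ? "
        "   OR (reversed_name >= ? AND reversed_name < ?) "
        "ORDER BY reversed_name",
        (key, key + ".", key + "/"),
    )
    return [reverse_host(row[0]) for row in rows]


# Hypothetical stand-in for the Server table; the primary key gives us the index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server (reversed_name TEXT PRIMARY KEY)")
for host in ["www.example.com", "mail.example.com", "example.com",
             "example.org", "notexample.com"]:
    conn.execute("INSERT INTO server VALUES (?)", (reverse_host(host),))

print(hosts_under(conn, "example.com"))
# ['example.com', 'mail.example.com', 'www.example.com']
```

Note that notexample.com is not picked up, which a naive "ends in example.com" match on the unreversed name would have gotten wrong.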

As for the "most correct" component names, there are standards, and there are standards....

Source: Tantek Çelik, How many ways can you slice a URL and name the pieces?