URL?
October 29, 2007 at 4:28 pm | Posted in Internet, Technology
What’s with all the stuff in a web address anyways?
First of all, an Internet address is known as a URL, or ‘Uniform Resource Locator’. In other words, a fancy name for an address that any server can understand.
For example, this URL:
The first part, http://, stands for Hypertext Transfer Protocol. This means a web page. You might also see https://, where the added ‘s’ means secure, like on an ecommerce site.
If you see ftp://, it means File Transfer Protocol. This is not for viewing pages but for transferring files, so you’ll see files and folders instead, like in a file browser. You may have run into that by accident. The Internet uses a bunch of languages, or protocols, to communicate different things. Email has another protocol (or 2), and IRC, messaging, and so on each use their own. Webmail is a little different: it is accessing email through a web application, a special type of web page with more features than just hypertext (linked text).
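As a rough illustration, here is a small Python sketch that reads the protocol off the front of a URL. The function name and the little lookup table are made up for this example, and it only knows the three protocols mentioned above.

```python
from urllib.parse import urlsplit

# Protocol prefixes mentioned in this post, with a plain-English description.
# This table is an illustration only, not a complete list.
PROTOCOLS = {
    "http": "Hypertext Transfer Protocol (web pages)",
    "https": "HTTP over a secure connection",
    "ftp": "File Transfer Protocol (files and folders)",
}


def describe_scheme(url: str) -> str:
    """Describe the protocol part of a URL (hypothetical helper)."""
    scheme = urlsplit(url).scheme  # the text before "://", lowercased
    return PROTOCOLS.get(scheme, f"unknown protocol: {scheme!r}")


print(describe_scheme("https://store.apple.com"))
print(describe_scheme("ftp://example.com/pub"))
```

Anything the table doesn't know, such as an email or IRC link, simply comes back as unknown.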
The next part, www, refers to the World Wide Web. It was originally the name of the web server, which was www by default. Most modern servers can do without it, and large web sites have many servers, so the name of any one of them is meaningless. Now you just need to type microsoft.com and your browser will know you mean http://www.microsoft.com.
Another variation you’ll find is called “sub-domains”, where the server name is used instead to point to a subsection of the web site, as in store.apple.com. Blogs at WordPress use the same technique. Note how my blog name precedes the WordPress domain name.
Finally, we come to the name of the web site itself. In our original example, we have windowssecrets.com. The ‘.’ tells us it’s 2 parts. The first part is the name or, more accurately, the domain name of the web site. The second part, .com, tells us it’s (supposedly) a business (commercial). You’ll also see .org (non-profit organization), .net (network), .edu (education), and so on. Endings with 2 letters denote a “country code”, such as .ca for Canada. Some of these are restricted to their meaning, like .it for Italy. Others, like .tv for Tuvalu, are sold for other obvious purposes. Recently some new types like .museum have been introduced that don’t follow the old format. Efforts to create a .xxx type to separate out that sort of content and avoid accidental exposure have so far failed.
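To make the pieces concrete, here is a deliberately naive Python sketch that splits a hostname at the dots. The function name is made up, and it assumes the ending is always the last single piece, which is not true for names like something.co.uk.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a hostname into sub-domain, name, and ending.

    Assumption for this sketch: the ending is the last dot-separated
    piece, so multi-part endings such as .co.uk are not handled.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least a name and an ending: {hostname!r}")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty string when there is none
        "name": labels[-2],
        "ending": "." + labels[-1],
        "country_code": len(labels[-1]) == 2,  # 2 letters means a country code
    }


print(split_hostname("store.apple.com"))
print(split_hostname("fredsplace.ca"))
```

For store.apple.com the sub-domain is store, the name is apple, and the ending is .com. For a .ca address the country_code flag comes back True.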
This combination of domain name and type, as in wordpress.com, is called a hostname but is not the actual address of the web server. Internet servers are addressed using IP (Internet Protocol) addresses, like 123.321.123.321 (a made-up example). As a matter of fact, every device on the Internet, including web servers, email servers, network devices, and the computer you’re connecting to this page with, has an IP address assigned. That’s how your browser finds the web server ‘out there’ and how the server knows where to send the page you requested.
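A side note on the sample address: in the common four-number (IPv4) format, each number can only run from 0 to 255, so the example above is an illustration rather than something you could really connect to. Python’s standard ipaddress module can check this; the helper name below is made up for this sketch.

```python
import ipaddress


def is_valid_ipv4(text: str) -> bool:
    """Return True if text is a well-formed IPv4 address (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        # Raised for things like a number above 255 or the wrong number of parts.
        return False


print(is_valid_ipv4("123.321.123.321"))  # False, because 321 is above 255
print(is_valid_ipv4("127.0.0.1"))        # True
```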
For example, if you enter 188.8.131.52 into your web browser, it will take you to Apple.com. (IP lookup) What you’re doing is bypassing the domain lookup, or DNS (Domain Name System), and going right to the web server. But many smaller web sites are all hosted on the same server, so they share an IP address. Then you’d have something like fredsplace.ca = 123.321.109.83/~freds/ and the IP address alone would not help.
In other words, the fredsplace.ca web site is hosted (found) in a sub-folder of the web server at 123. etc. Combine that with large sites that have a number of IPs for one domain, and you can see why a domain name is easier. Much easier to type and remember fredsplace.ca than 123.321.109.83/~freds/, eh wot? If only phone numbers were so easy.
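Here is a toy Python sketch of that shared-server idea. The table is entirely hypothetical: fredsplace.ca and its address are the made-up example from above, and marysplace.ca is invented here as a second site on the same server.

```python
# Hypothetical records: hostname -> (server IP, folder on that server).
# The addresses are illustrative placeholders, not real servers.
HOSTED_SITES = {
    "fredsplace.ca": ("123.321.109.83", "/~freds/"),
    "marysplace.ca": ("123.321.109.83", "/~mary/"),
}


def location_of(hostname: str) -> str:
    """Where a site really lives: the server address plus its folder."""
    ip, folder = HOSTED_SITES[hostname]
    return ip + folder


def sites_at(ip: str) -> list:
    """Every site sharing one server address."""
    return sorted(name for name, (addr, _) in HOSTED_SITES.items() if addr == ip)


print(location_of("fredsplace.ca"))  # 123.321.109.83/~freds/
print(sites_at("123.321.109.83"))    # ['fredsplace.ca', 'marysplace.ca']
```

Going from name to location gives one answer, but going from the address back to a name gives two, which is why the IP address on its own is not enough on a shared server.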
And that brings us to the next part of the original URL:
/reviews/cameras is something you may recognize from browsing your computer. This is known as a path and represents 2 subfolders on the web server. So inside the folder of the web site at windowssecrets.com, which is really 184.108.40.206, we find a reviews folder, and in that a cameras folder. Written with the IP address instead of the name, that would be 184.108.40.206/reviews/cameras.
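As a quick sketch, Python’s standard urllib.parse module can pull the path out of a URL and break it into its folders. The function name is made up, and the URL is assembled from the parts described in this post.

```python
from urllib.parse import urlsplit


def folders_in(url: str) -> list:
    """List the folder names that make up the path of a URL (hypothetical helper)."""
    path = urlsplit(url).path  # e.g. "/reviews/cameras"
    return [part for part in path.split("/") if part]  # drop the empty pieces


print(folders_in("http://www.windowssecrets.com/reviews/cameras"))
# ['reviews', 'cameras']
```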
The ending of the URL is the file or resource we’re asking for, like 12345.jpg (an image) or videocams.htm (a web page). In the case above, #camcorder_review, the # tells us it’s an anchor. An anchor is a marker part way down a web page that’s designed to take us to a specific place on a longer page. So this is the camcorder section of the cameras resource. (If it was cameras.htm#camcorder_review, we could say it’s the camcorder section of the cameras page, but in the example the page is not specified. The server already knows.)
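In Python, the standard urldefrag function separates the anchor from the rest of the address. The full URL below is an assumption, pieced together from the parts this post walks through.

```python
from urllib.parse import urldefrag

# Assumed URL, assembled from the pieces described in this post.
url = "http://www.windowssecrets.com/reviews/cameras#camcorder_review"

page, anchor = urldefrag(url)  # splits at the "#"
print(page)    # http://www.windowssecrets.com/reviews/cameras
print(anchor)  # camcorder_review
```

If there is no # in the address, the anchor simply comes back as an empty string.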
You may also see a ? part way into the URL. This marks the start of a query, which usually means the server is looking something up, often in a database. For example, http://www.youtube.com/watch?v=o4WISABs1bk. This video on YouTube is addressed by an ID made of letters and digits (o4WISABs1bk), which is passed along in a query built into the URL. The video file may be on one or several of YouTube’s many servers, and it’s found by a database lookup. The specific location of the file is not shown.
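Here is a short Python sketch that pulls the query apart using the standard urllib.parse module. It only reads the address itself; what YouTube does with the ID on its side is not something the URL shows.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.youtube.com/watch?v=o4WISABs1bk"

query = urlsplit(url).query  # everything after the "?"
params = parse_qs(query)     # name -> list of values

print(query)   # v=o4WISABs1bk
print(params)  # {'v': ['o4WISABs1bk']}
```

Each value comes back as a list because the same name is allowed to appear more than once in a query.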
There are many, many other sorts of endings you’ll see, such as scripts that call locations or blend resources from several places into one page. Web applications combine things from multiple sources and create a new web page on the fly when you request it. .jsp and .asp pages are like that, and any webmail tool uses this sort of thing. The possibilities are endless and depend on the server and code technology being used, plus the creativity and design of the particular site. The only limits are what your web browser is capable of displaying based on what the web server serves you.
This was the largest benefit of the end of the browser wars and the adoption of Web Standards. Most sites are now coded so that it doesn’t matter what OS or browser you are using. The page should display much the same either way (aside from a few minor browser bugs).
And those, in a nutshell, are the basics of the Internet.