URIs

In a perfect system, Uniform Resource Identifiers, or URIs (the broader category that includes the more familiar Uniform Resource Locators, or URLs), would be hidden from the site visitor. They aren’t especially human-readable, composed as they are of a protocol token, a host alias, and something that looks like a filesystem reference but isn’t. URIs often end in token/value pairs that are deliberately designed to be computer-readable, as opposed to visitor-friendly.

We’re all familiar with simple URIs like http://www.example.com/ that point to the home page of a site. These appear in advertisements and on business cards, and the http:// has come to mean “type this in to find the website.” However, well-crafted URIs can contain a lot of information—look at commonly encountered URIs at your favorite search site or news site, and you’ll see a lot more going on. Google search result URIs, for example, can contain a parameter named start that specifies the number of results ranked higher than those displayed, as in http://www.google.com/?q=hypertext&hl=en&start=10. In a similar vein, popular Content Management System (CMS) platforms and e-commerce catalog platforms allow the same resource to be associated with multiple URIs, where the longer URIs enhance a resource’s searchability or specify that additional content be served along with the core resource (e.g., a product listing or the summary of a weblog entry).
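To make the anatomy of a URI concrete, here is a minimal sketch that uses Python's standard urllib.parse module to pull apart the search URI quoted above. The function name describe_uri and the dictionary keys are illustrative choices, not anything defined by a standard.

```python
from urllib.parse import urlsplit, parse_qs

def describe_uri(uri):
    """Split a URI into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,          # the protocol token, e.g. "http"
        "host": parts.hostname,          # the host alias
        "path": parts.path,              # looks like a filesystem reference
        "query": parse_qs(parts.query),  # the token/value pairs
    }

info = describe_uri("http://www.google.com/?q=hypertext&hl=en&start=10")
print(info["scheme"], info["host"], info["path"])
print(info["query"]["start"])
```

Note that parse_qs returns a list for each token, because the same token may legally appear more than once in a query string.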

Note

Browsers and other tools use the HTTP protocol to process URIs and retrieve information. If you want to know more about how this processing works, and how its features and limitations might affect your pages, see the appendix.

Managing Links

Hypertext as we understand it today was first implemented at the Stanford Research Institute in the 1960s, but didn’t become an everyday tool until the advent of affordable commercial Internet access three decades later. The “explosion” of the Internet not only provided a way for hyperlinks to connect across a broad network, but it also nurtured an understanding that web hyperlinks should be simple and tolerant of failure.

HTML link conventions assume that the person creating a link knows what will be at the URI at the end of it. That doesn’t necessarily mean that link creators control what is at the end of the link, however. In fact, the ability to link to any content without having to ask its creator beforehand is a critical aspect of the Web’s success. If it has a URI, you can link to it. If a URI doesn’t work, a well-built site will report an error (like the ubiquitous “404 Not Found”) and present a page that can help the lost visitors find their way again.
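As a rough illustration of that last point, the sketch below shows the decision a site makes for each requested path: serve the page if it exists, otherwise report “404 Not Found” along with links that help the visitor recover. The PAGES table and the respond function are hypothetical stand-ins for a real site and its server framework.

```python
from html import escape
from http import HTTPStatus

# Hypothetical site content: path -> page title.
PAGES = {"/": "Home", "/products": "Products", "/contact": "Contact"}

def respond(path, pages=PAGES):
    """Return (status_code, html_body) for a requested path."""
    if path in pages:
        return HTTPStatus.OK.value, f"<h1>{escape(pages[path])}</h1>"
    # Unknown path: report the error and offer ways back into the site.
    links = "".join(
        f'<li><a href="{escape(p)}">{escape(title)}</a></li>'
        for p, title in sorted(pages.items())
    )
    status = HTTPStatus.NOT_FOUND
    body = (
        f"<h1>{status.value} {status.phrase}</h1>"
        f"<p>Nothing lives at {escape(path)}. Try one of these pages:</p>"
        f"<ul>{links}</ul>"
    )
    return status.value, body

status_code, page = respond("/old-page")
print(status_code)
print(page)
```

The important design choice is that the error page is still a useful page: it names the problem and links back to known-good locations rather than leaving the visitor at a dead end.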

The power and immediacy of web links raised all kinds of cultural (and in some cases legal) questions about what it means to be able to link directly to someone else’s material, but over time a simpler and probably more intractable issue arose: link rot. When you create links to information you don’t control, those links eventually break as pages and even entire sites change or disappear. It also means that you may have visitors arriving at your site who are confused and frustrated because they didn’t find what they wanted immediately.

To some degree, link rot is inevitable, and even automated systems (like search engines) have a difficult time keeping up with it. Even when a link still resolves to a useful page, that page may evolve over time into something very different. Within your own sites, you have somewhat more control, though major site redesigns can make that control difficult to exercise. Caution, well-built error pages, and clear navigation can help minimize these problems.

Note

While visitors can usually deal with regular links that send them to the wrong place, it may be more difficult for your pages to recover from missing images, code, stylesheets, or other components that are supposed to be inserted via accurate href and src values. The more important the component to your page, the more you will want to link to it at a stable location under your control.
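One way to act on this advice is to audit a page for components that live on hosts you don’t control. The following sketch, which assumes nothing beyond Python’s standard library, collects href and src values and reports those that belong to non-anchor elements (images, scripts, stylesheets) served from a different host. The class and function names are illustrative, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class ReferenceCollector(HTMLParser):
    """Collect every href and src value found in a page."""

    def __init__(self):
        super().__init__()
        self.references = []  # (tag, attribute, raw value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.references.append((tag, name, value))

def external_components(html, page_url):
    """Return absolute URLs of non-anchor components hosted elsewhere."""
    collector = ReferenceCollector()
    collector.feed(html)
    own_host = urlsplit(page_url).hostname
    found = []
    for tag, _attr, value in collector.references:
        if tag == "a":
            continue  # ordinary links; visitors can usually recover
        absolute = urljoin(page_url, value)  # resolve relative references
        if urlsplit(absolute).hostname != own_host:
            found.append(absolute)
    return found

sample = """
<link rel="stylesheet" href="/css/site.css">
<script src="http://cdn.example.net/lib.js"></script>
<img src="images/logo.png">
<a href="http://other.example.org/article">An article</a>
"""
risky = external_components(sample, "http://www.example.com/index.html")
print(risky)
```

Anything this reports is a candidate for moving to a stable location under your own control, or at least for closer monitoring.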

Improving the User Experience with Linking

Links are part of HTML, the means by which URIs are most commonly exposed within the Web’s application layer, at the point where HTML and HTTP intersect. At the application level, there isn’t much difference between following a link and accessing a given URI through the Location bar of a browser.

Links provide infinite opportunities to site builders—opportunities that are usually passed over. Anything can link to anything else. Hyperlinks in documents aren’t constrained to site navigation, stylesheet references, and syndication references; they can also point to an unlimited number of related documents and all kinds of alternative content. Hyperlinks that respond to user interaction can be placed anywhere, point to anything, and trigger behavior limited only by platform constraints, good sense, and a site builder’s imagination. Well-implemented hypertext enhances information with the following benefits, among many:

Broadened accessibility to and control of information

Hyperlinks can always reference every part of the Web that is not access-controlled. Rather than delivering long chunks of exposition out of necessity (as this book does) or referring to other materials that must then be physically obtained, hyperlinks allow users to decide for themselves which information resources they will access and how.

Creation of multiple narratives from a single body of content

Hyperlinks make it possible for a visitor’s “journey” to take any and all forms that they desire…within reason.

Community-driven attention flow

Incoming hyperlinks lend credibility to destination content without the need for subject matter–expert intervention—a fact that defines a number of systems already in use, especially Google’s PageRank algorithm. It remains possible for the “wisdom of crowds” to be qualitatively poor, but accuracy tends to increase over time since subject matter experts remain closely involved with the process.
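The idea that incoming links confer credibility can be sketched with a simplified, textbook-style link-ranking iteration. This is not Google’s production algorithm; the function name rank_pages, the damping value of 0.85, the iteration count, and the tiny four-page web are all assumptions made for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by incoming links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = rank_pages(web)
print(max(scores, key=scores.get))
```

In this toy web, the page with the most incoming links ends up with the highest score, and a page it links to inherits much of that standing, which is the “attention flow” the text describes.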

Hypertext Implementation Challenges

Web technology allows users to direct their own experience in ways that until 1992 had been the stuff of science fiction. No single person or entity has unqualified control over a given user’s web experience (although not for lack of trying). A single user session can result in requests for content from multiple unaffiliated authors, on tangential or unrelated subjects, and require an arbitrary amount of user interaction.

This seeming anarchy places new demands on implementers:

  1. Context (i.e., steady “You Are Here” and “That’s Over There” signaling) is the most important part of an effective site, apart from the actual site content.

  2. Untested assumptions about a visitor’s goals and knowledge create a short, straight path to folly and disaster.

  3. Duplication of content adds needless burdens to the user experience (and to the site building process).

  4. The Web’s lack of bounds, assumptions, and context can create user impairments out of thin air, and often these impairments must be addressed. The Web’s tremendous openness creates the need for specialist disciplines in web information architecture and usability.

Because the Web breaks the linear structure of traditional media outright, implementers must never forget that their tools define context, first and foremost.
