Chapter 1. Introduction

This book is a compilation of some fairly diverse reference material. What links these topics is that they are crucial knowledge for today’s webmaster in a Unix environment.

In this chapter, we give the world’s quickest introduction to web technology and the role of the webmaster who breathes life into each web document. If you want to learn more about the history of the Web, how to make your web pages “cool,” the social impact of the Internet, or how to make money online, this is the wrong book.

This is a book by impatient writers for impatient readers. We’re less interested in the hype of the Web than we are in what makes it actually tick. We’ll leave it to the pundits to predict the future of the Web or to declare today’s technology already outdated. Too much analysis makes our heads spin; we just want to get our web sites online.

The Web in a Nutshell

We’ve organized this book in a roughly “outside-in” fashion—that is, with the outermost layer (HTML) first and the innermost layer (the server itself) last. But since it’s a good idea for all readers to know how everything fits together, let’s take a minute to breeze through a description of the Web from the inside-out: no history, no analysis, just the technology basics.

Clients and Servers

The tool most people use on the Web is a browser, such as Netscape Navigator, Internet Explorer, Opera, Mosaic, or Lynx. Web browsers work by connecting over the Internet to remote machines, requesting specific documents, and then formatting the documents they receive for viewing on the local machine.

The language, or protocol, used for web transactions is Hypertext Transfer Protocol, or HTTP. The remote machines containing the documents run HTTP servers that wait for requests from browsers and then return the specified document. The browsers themselves are technically HTTP clients.
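
To make the client and server roles concrete, here is a minimal sketch in Python using only the standard library. It starts a throwaway HTTP server on the local machine and then acts as a client requesting one document from it. The names DOCUMENTS, DocumentHandler, and fetch_from_local_server are our own, invented for illustration; a real site would of course run a full server such as Apache.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical document store: maps a URL path to the document's contents.
DOCUMENTS = {"/index.html": b"<html><body>Hello, Web</body></html>"}


class DocumentHandler(BaseHTTPRequestHandler):
    """Server side: wait for a request, then return the specified document."""

    def do_GET(self):
        body = DOCUMENTS.get(self.path)
        if body is None:
            self.send_error(404, "Document not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_from_local_server(path):
    """Client side: request one document and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        result = (response.status, response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_from_local_server("/index.html") returns the status code 200 along with the document; asking for a path the server doesn't know about returns a 404 status instead.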

Uniform Resource Locators (URLs)

One of the most important things to grasp when working on the Web is the format for URLs. A URL is basically an address on the Web, identifying each document uniquely (for example, http://www.oreilly.com/products.html). Since URLs are so fundamental to the Web, we discuss them here in a little detail. The simple syntax for a URL is:

http://host/path

where:

host

The host to connect to—e.g., www.oreilly.com or www.altavista.com. (While many web servers run on hosts beginning with www, the www prefix is just a convention.)

path

The document requested on that server. This is not the same as the filesystem path, as its root is defined by the server.

Most URLs you encounter follow this simple syntax. A more generalized syntax, however, is:

scheme://host/path/extra-path-info?query-info

where:

scheme

The protocol used to connect to the site. For web sites, the scheme is http; for FTP sites, the scheme is ftp.

extra-path-info and query-info

Optional information used by CGI programs. See Chapter 12 for more information.
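
As a quick illustration of the generalized syntax, the following sketch uses Python's standard urllib.parse module to pull a URL apart. The URL itself is hypothetical, made up only so that every part is present.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, used only to show every part of the generalized syntax.
url = "http://www.oreilly.com/cgi-bin/lookup.cgi/books/webnut?format=short&lang=en"

parts = urlsplit(url)
print(parts.scheme)  # http
print(parts.netloc)  # www.oreilly.com
print(parts.path)    # /cgi-bin/lookup.cgi/books/webnut
print(parts.query)   # format=short&lang=en

# The query information is commonly a set of name=value pairs.
print(parse_qs(parts.query))  # {'format': ['short'], 'lang': ['en']}
```

Note that nothing in the URL marks where the path ends and the extra path information begins. Only the server knows which portion names the program and which portion is passed along to it.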

HTML documents also often use a “shorthand” for linking to other documents on the same server, called a relative URL. An example of a relative URL is images/webnut.gif. The browser knows to translate this into complete URL syntax before sending the request. For example, if http://www.oreilly.com/books/webnut.html contains a reference to images/webnut.gif, the browser reconstructs the relative URL as a full (or absolute) URL, http://www.oreilly.com/books/images/webnut.gif, and requests that document independently (if needed).
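
The translation a browser performs can be reproduced with Python's standard urllib.parse.urljoin function. This is only a sketch of the resolution rule, not of how any particular browser implements it.

```python
from urllib.parse import urljoin

# The document that contains the link, and the relative URL found in it.
base = "http://www.oreilly.com/books/webnut.html"

absolute = urljoin(base, "images/webnut.gif")
print(absolute)  # http://www.oreilly.com/books/images/webnut.gif

# A relative URL with a leading slash is resolved from the server's root instead.
from_root = urljoin(base, "/images/webnut.gif")
print(from_root)  # http://www.oreilly.com/images/webnut.gif
```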

Often in this book, you’ll see us refer to a URI, not a URL. A URI (Uniform Resource Identifier) is a superset of URL, defined in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax in practice is URL; so while purists might complain, you can safely assume that “URI” is synonymous with “URL” and not go wrong (yet).

Web Content: HTML, XML, CGI, JavaScript, and PHP

While web documents can conceivably be in any format, the universal standard is Hypertext Markup Language (HTML), a language for creating formatted text interspersed with images, sounds, animation, and hypertext links to other documents anywhere on the Web. Chapter 2 through Chapter 8 cover the most current version of HTML.

In 1996, a significant extension to HTML was developed in the form of Cascading Style Sheets (CSS). Cascading Style Sheets allow web site developers to associate a number of style-related characteristics (such as font, color, and spacing) with a particular HTML tag. This enables HTML authors to create a consistent look and feel throughout a set of documents. Chapter 9 provides an overview of and a reference to CSS.

While HTML remains the widespread choice for web site development, there is also an heir apparent called XML (Extensible Markup Language). XML is a meta-language that allows you to define your own document tags. While XML’s development remains highly volatile, Chapter 10 gives you the basics.
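
To give a feel for what "defining your own tags" means, here is a small sketch that reads an XML document with Python's standard xml.etree.ElementTree module. The tag names (catalog, book, title, edition) are hypothetical; we made them up for this example, which is exactly the freedom XML provides.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose tags are entirely our own invention.
document = """\
<catalog>
  <book>
    <title>Webmaster in a Nutshell</title>
    <edition>2</edition>
  </book>
</catalog>
"""

root = ET.fromstring(document)
for book in root.findall("book"):
    print(book.findtext("title"), "- edition", book.findtext("edition"))
```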

When static documents aren’t sufficient for a web site’s needs, you can use tools such as CGI, JavaScript, and PHP. CGI is a way for the web server to call external programs instead of simply returning a static document. Chapter 12 through Chapter 15 are intended for CGI programmers using the Perl programming language. JavaScript and PHP are both programming languages embedded directly into HTML documents, but that’s where the similarities end: JavaScript is used primarily for client-side scripting in the browser, while PHP runs on the server, where it is used primarily for database access. See Chapter 11 and Chapter 16.
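
The CGI chapters of this book use Perl, but the mechanism is the same in any language. The following is a minimal sketch in Python, assuming the server passes the query information and extra path information in the standard QUERY_STRING and PATH_INFO environment variables. The function name build_response and the name parameter are hypothetical, chosen for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(environ):
    """Return the full CGI output (header, blank line, body) as a string."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    extra_path = environ.get("PATH_INFO", "")
    name = query.get("name", ["world"])[0]  # hypothetical "name" parameter
    body = "<html><body><p>Hello, %s!</p><p>Extra path: %s</p></body></html>" % (
        escape(name),
        escape(extra_path),
    )
    # A CGI program writes at least one header, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server sets the environment variables before running the program.
    sys.stdout.write(build_response(os.environ))
```

Keeping the response logic in a function that takes the environment as an argument makes the program easy to try without a web server: pass it a plain dictionary and inspect the string it returns.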

The HTTP Protocol

In between clients and servers is the network, which uses TCP (Transmission Control Protocol) and IP (Internet Protocol) to transmit data and find servers and clients. On top of TCP/IP, clients and servers use the HTTP protocol to communicate. Chapter 17 gives details on the HTTP protocol, which you must understand to write CGI programs and server scripts, administer a web site, and do just about anything else that involves working with a server.
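
An HTTP exchange is just text sent over a TCP connection. As a rough sketch of what travels in each direction, the following Python code formats a minimal HTTP/1.0 request and picks apart a response. The function names and the sample response are our own, invented for illustration; no network connection is made.

```python
def build_request(host, path):
    """Format a minimal HTTP/1.0 GET request as the bytes a client would send."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "",  # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = status_line.split(" ", 2)[1]  # e.g. "HTTP/1.0 200 OK" -> "200"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hypothetical response, as a server might return it for a tiny document.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

print(build_request("www.oreilly.com", "/products.html"))
print(parse_response(sample))
```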

Web Server

The runaway leader among Unix-based web servers is Apache. Chapter 18 deals with configuring Apache, while Chapter 19 discusses the various Apache modules. Regardless of the type of server you’re running, there are various measures you can take to maximize its efficiency. Chapter 20 describes a number of these server optimization techniques.
