URL Encoding

Before data supplied on a form can be sent to a CGI program, each form element’s name (specified by the name attribute) is paired with the value entered by the user to create a key-value pair. For example, if a field named age asks for the user’s age and the user enters “30”, the key-value pair is “age=30”. In the transferred data, key-value pairs are separated by the ampersand (&) character.
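As a rough sketch of the pairing step, the following Python fragment joins name/value pairs with ampersands. The helper build_pairs and the field names are hypothetical, chosen only for illustration, and the handling of special characters is deliberately left out here.

```python
def build_pairs(fields):
    """Join (name, value) tuples as name=value pairs separated by '&'."""
    return "&".join(f"{name}={value}" for name, value in fields)


# Hypothetical form input: two fields and the values the user typed.
form_fields = [("name", "Pat"), ("age", "30")]
print(build_pairs(form_fields))  # name=Pat&age=30
```

This works only because none of the sample values contains a space, slash, ampersand, or equal sign.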

Because the GET method sends form information as part of the URL, that information can’t include spaces or other special characters that are not allowed in URLs, nor characters that have other meanings in URLs, such as slashes (/). (For the sake of consistency, this constraint also applies when the POST method is used.) Therefore, the web browser performs some special encoding on user-supplied information.
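To see which characters are affected, one option is to run a few values through quote_plus from Python’s standard urllib.parse module, which applies the same style of encoding that browsers use for form data. The sample values are made up for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical user input containing a space and a slash.
raw_value = "New York/NY"
encoded_value = quote_plus(raw_value)
print(encoded_value)  # New+York%2FNY

# Plain letters and digits pass through unchanged.
print(quote_plus("age30"))  # age30
```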

Encoding involves replacing special characters in the query string with a percent sign (%) followed by the character’s two-digit hexadecimal code; spaces are conventionally sent as plus signs (+). (Thus, URL encoding is also sometimes called hexadecimal encoding.) Suppose a user fills out and submits a form containing his or her birthday in the syntax mm/dd/yy (e.g., 11/05/73). The forward slashes in the birthday are among the special characters that can’t appear in the client’s request for the CGI program, so the browser encodes the data when it issues the request. The following sample request shows the resulting encoding:

POST /cgi-bin/birthday.pl HTTP/1.0
.
. [information]
.
Content-length: 21

birthday=11%2F05%2F73
...
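The body of the sample request can be reproduced with a short Python sketch. It uses urlencode and parse_qs from the standard urllib.parse module. This is only one way to imitate what the browser and the CGI program each do, not a description of how any particular browser is implemented.

```python
from urllib.parse import urlencode, parse_qs

# Encode the birthday field the way a browser would before sending it.
body = urlencode({"birthday": "11/05/73"})
print(body)       # birthday=11%2F05%2F73
print(len(body))  # 21, the value of the Content-length header

# The two hex digits are the character's ASCII code: "/" is 0x2F.
print(format(ord("/"), "02X"))  # 2F

# On the server side, the encoding is reversed to recover the input.
print(parse_qs(body))  # {'birthday': ['11/05/73']}
```

Each slash becomes the three characters %2F, which is why the nine-character field name and equal sign plus the eight-character date add up to a 21-byte body.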

