The URLEncoder and URLDecoder Classes

One of the problems that the designers of the Web faced was differences between local operating systems. These differences can cause problems with URLs: for example, some operating systems allow spaces in filenames; some don’t. Most operating systems won’t complain about a # sign in a filename; in a URL, a # sign means that the filename has ended, and a named anchor follows. Similar problems are presented by other special characters, nonalphanumeric characters, etc., all of which may have a special meaning inside a URL or on another operating system. To solve these problems, characters used in URLs must come from a fixed subset of ASCII, in particular:

  • The capital letters A-Z

  • The lowercase letters a-z

  • The digits 0-9

  • The punctuation characters - _ . ! ~ * ' ( and )

The characters : / & ? @ # ; $ + = % and , may also be used, but only for their specified purposes. If these characters occur as part of a filename, they should be encoded, as should every other character that is not in the list above.
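
The character classes above can be sketched as a pair of small predicates. This is only an illustration: the class name UrlCharClasses and its method names are hypothetical, and the safe punctuation set is assumed to be - _ . ! ~ * ' ( and ).

```java
// Hypothetical helper that classifies characters as described in the text.
final class UrlCharClasses {
    // Assumption: punctuation that may appear unencoded anywhere in a URL.
    private static final String SAFE_PUNCTUATION = "-_.!~*'()";
    // Characters allowed only for their specified purposes.
    private static final String RESERVED = ":/&?@#;$+=%,";

    // True for ASCII letters, digits, and the safe punctuation marks.
    static boolean isAlwaysSafe(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || SAFE_PUNCTUATION.indexOf(c) >= 0;
    }

    // True for characters that carry a special meaning inside a URL.
    static boolean isReserved(char c) {
        return RESERVED.indexOf(c) >= 0;
    }

    // Inside a filename, anything that is not always safe should be encoded.
    static boolean needsEncodingInFilename(char c) {
        return !isAlwaysSafe(c);
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', '7', '~', ' ', '#', '/'}) {
            System.out.println("'" + c + "' safe=" + isAlwaysSafe(c)
                + " reserved=" + isReserved(c)
                + " encodeInFilename=" + needsEncodingInFilename(c));
        }
    }
}
```

Under these assumptions a space is neither safe nor reserved, while # is reserved; both would need encoding if they appeared in a filename.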

The encoding used is very simple. Any characters that are not ASCII numerals, letters, or the punctuation marks specified earlier are represented by a percent sign followed by two hexadecimal digits giving the value for that character. Spaces are a special case because they’re so common. Besides being encoded as %20, they can be encoded as a plus sign (+). The plus sign itself is encoded as %2B. The / # = & and ? characters should be encoded when they are used as part of a name, and not ...
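
As a rough illustration of this encoding, the sketch below calls the JDK's java.net.URLEncoder and java.net.URLDecoder. The class name EncodingSketch and its helper methods are hypothetical, and UTF-8 is an assumed character encoding. Note that URLEncoder is stricter than the list given earlier: it leaves only letters, digits, and the characters - _ . * unchanged, so a character such as ~ is escaped as well.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrapper around the JDK's URLEncoder and URLDecoder.
final class EncodingSketch {
    // Encode a string for use in a URL, assuming UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverse the encoding, again assuming UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces become plus signs.
        System.out.println(encode("This string has spaces")); // This+string+has+spaces
        // A literal plus sign becomes %2B.
        System.out.println(encode("a+b"));                    // a%2Bb
        // Characters with special meaning in a URL are percent-escaped.
        System.out.println(encode("/#=&?"));                  // %2F%23%3D%26%3F
        // Both + and %20 decode to a space.
        System.out.println(decode("one+two%20three"));        // one two three
    }
}
```

Because URLEncoder escapes / # = & and ? wherever they appear, it is best applied to individual names and values rather than to a complete URL.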
