Essential PHP Security by Chris Shiflett

Practices

Like the principles described in the previous section, there are many practices that you can employ to develop more secure applications. This list of practices is also small and focused to highlight the ones that I consider to be most important.

Some of these practices are abstract, but each has practical applications, which are described to clarify the intended use and purpose of each.

Balance Risk and Usability

While user friendliness and security safeguards are not mutually exclusive, steps taken to increase security often decrease usability. While it's important to consider illegitimate uses of your applications as you write your code, it's also important to be mindful of your legitimate users. The appropriate balance can be difficult to achieve, and it's something that you have to determine for yourself—no one else can determine the best balance for your applications.

Try to employ safeguards that are transparent to the user. If this isn't possible, try to use safeguards that are already familiar to the user (or likely to be). For example, providing a username and password to gain access to restricted information or services is an expected procedure.

When you suspect foul play, realize that you might be mistaken and act accordingly. For example, it is a common practice to prompt users to enter their password again whenever their identity is in question. This is a minor hassle to legitimate users but a substantial obstacle to an attacker. Technically, this is almost identical to prompting users to authenticate themselves again entirely, but the user experience is much friendlier.

There is very little to gain by logging users out entirely or chiding them about an alleged attack. These approaches degrade usability substantially when you make a mistake, and mistakes happen.

In this book, I focus on providing safeguards that are either transparent or expected, and I encourage careful and sensible reactions to suspected attacks.

Track Data

The most important thing you can do as a security-conscious developer is keep track of data at all times—not only what it is and where it is, but also where it's from and where it's going. Sometimes this can be difficult, especially without a firm understanding of how the Web works, and this is why inexperienced web developers are prone to making mistakes that yield security vulnerabilities, even when they have experience developing applications in other environments.

Most people who use email are not easily fooled by spam with a subject of "Re: Hello"—they recognize that the subject can be forged, and therefore the email isn't necessarily a reply to a previous email with a subject of "Hello." In short, people know not to place much trust in the subject. Far fewer people realize that the From header can also be forged. They mistakenly believe that this reliably indicates the email's origin.

The Web is very similar, and one of the things I want to teach you is how to distinguish between the data that you can trust and the data that you cannot. It's not always easy, but blind paranoia certainly isn't the answer.

PHP helps you identify the origin of most data—superglobal arrays such as $_GET, $_POST, and $_COOKIE clearly identify input from the user. A strict naming convention can help you keep up with the origin of all data throughout your code, and this is a technique that I frequently demonstrate and highly recommend.

While understanding where data enters your application is paramount, it is also very important to understand where data exits your application. When you use echo, for example, you are sending data to the client. When you use mysql_query(), you are sending data to a MySQL database (even when the purpose of the query is to retrieve data).

When I audit a PHP application for security vulnerabilities, I focus on the code that interacts with remote systems. This code is the most likely to contain security vulnerabilities, and it therefore demands the most careful attention to detail during development and during peer reviews.

Filter Input

Filtering is one of the cornerstones of web application security. It is the process by which you prove the validity of data. By ensuring that all data is properly filtered on input, you can eliminate the risk that tainted (unfiltered) data is mistakenly trusted or misused in your application. The vast majority of security vulnerabilities in popular PHP applications can be traced to a failure to filter input.

When I refer to filtering input, I am really describing three different steps:

  • Identifying input

  • Filtering input

  • Distinguishing between filtered and tainted data

The first step is to identify input because if you don't know what it is, you can't be sure to filter it. Input is any data that originates from a remote source. For example, anything sent by the client is input, although the client isn't the only remote source of data—other examples include database servers and RSS feeds.

Data that originates from the client is easy to identify—PHP provides this data in superglobal arrays, such as $_GET and $_POST. Other input can be more difficult to identify—for example, $_SERVER contains many elements that can be manipulated by the client. It's not always easy to determine which elements in $_SERVER constitute input, so a best practice is to consider this entire array to be input.
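
As one illustration of treating $_SERVER as input, the following sketch checks the Host header, which the client supplies, against a list of known hostnames. The function name filter_host() and the hostnames are assumptions made for this example only.

```php
<?php

// Hypothetical helper: returns the Host header only if it exactly
// matches one of the hostnames the application expects to serve;
// otherwise returns NULL.
function filter_host(array $server, array $allowed_hosts)
{
  if (isset($server['HTTP_HOST'])
      && in_array($server['HTTP_HOST'], $allowed_hosts, TRUE))
  {
    return $server['HTTP_HOST'];
  }

  return NULL;
}

// Assumed hostnames, for illustration only.
$host = filter_host($_SERVER, array('example.org', 'www.example.org'));
```

A NULL result means that the header was missing or unexpected, and the value in $_SERVER should not be used.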

What you consider to be input is a matter of opinion in some cases. For example, session data is stored on the server, and you might not consider the session data store to be a remote source. If you take this stance, you can consider the session data store to be an integral part of your application. It is wise to be mindful of the fact that this ties the security of your application to the security of the session data store. This same perspective can be applied to a database because the database can be considered a part of the application as well.

Generally speaking, it is more secure to consider data from session data stores and databases to be input, and this is the approach that I recommend for any critical PHP application.

Once you have identified input, you're ready to filter it. Filtering is a somewhat formal term that has many synonyms in common parlance—sanitizing, validating, cleaning, and scrubbing. Although some people differentiate slightly between these terms, they all refer to the same process—preventing invalid data from entering your application.

Various approaches are used to filter data, and some are more secure than others. The best approach is to treat filtering as an inspection process. Don't correct invalid data in order to be accommodating—force your users to play by your rules. History has shown that attempts to correct invalid data often create vulnerabilities. For example, consider the following method intended to prevent file traversal (ascending the directory tree):

    <?php

    $filename = str_replace('..', '.', $_POST['filename']);

    ?>

Can you think of a value of $_POST['filename'] that causes $filename to be ../../etc/passwd? Consider the following:

    .../.../etc/passwd

This particular error can be corrected by continuing to replace the string until it is no longer found:

    <?php

    $filename = $_POST['filename'];

    while (strpos($filename, '..') !== FALSE)
    {
      $filename = str_replace('..', '.', $filename);
    }

    ?>

Of course, the basename() function can replace this entire technique and is a safer way to achieve the desired goal. The important point is that any attempt to correct invalid data can potentially contain an error and allow invalid data to pass through. Inspection is a much safer alternative.
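
A minimal sketch of the inspection approach might look like the following. The function name inspect_filename() is hypothetical, and the rule it enforces (the value must be a bare filename with no directory portion) is an assumption chosen for illustration. The value is either accepted unchanged or rejected; it is never corrected.

```php
<?php

// Hypothetical helper: returns the filename unchanged if it passes
// inspection, or NULL if it does not.
function inspect_filename($input)
{
  // Reject anything that is not a non-empty string.
  if (!is_string($input) || $input === '')
  {
    return NULL;
  }

  // basename() strips any directory portion. If that changes the
  // value, the input contained a path, so it is rejected.
  if (basename($input) !== $input)
  {
    return NULL;
  }

  // The special directory names are not filenames.
  if ($input === '.' || $input === '..')
  {
    return NULL;
  }

  return $input;
}
```

With this sketch, inspect_filename('report.txt') returns the value as is, whereas inspect_filename('.../.../etc/passwd') returns NULL.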

In addition to treating filtering as an inspection process, you want to use a whitelist approach whenever possible. This means that you want to assume the data that you're inspecting to be invalid unless you can prove that it is valid. In other words, you want to err on the side of caution. Using this approach, a mistake results in your considering valid data to be invalid. Although undesirable (as any mistake is), this is a much safer alternative than considering invalid data to be valid. By mitigating the damage caused by a mistake, you increase the security of your applications. Although this idea is theoretical in nature, history has proven it to be a very worthwhile approach.

If you can accurately and reliably identify and filter input, your job is almost done. The last step is to employ a naming convention or some other practice that can help you to accurately and reliably distinguish between filtered and tainted data. I recommend a simple naming convention because this can be used in both procedural and object-oriented paradigms. The convention that I use is to store all filtered data in an array called $clean. This allows you to take two important steps that help to prevent the injection of tainted data:

  • Always initialize $clean to be an empty array.

  • Add logic to detect and reject any variable named clean that comes from a remote source.

In truth, only the initialization is crucial, but it's good to adopt the habit of considering any variable named clean to be one thing—your array of filtered data. This step provides reasonable assurance that $clean contains only data that you knowingly store therein and leaves you with the responsibility of ensuring that you never store tainted data in $clean.
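
The two steps above can be sketched as follows. The helper name request_names_clean() and the reaction to a match (stopping the script) are assumptions; how to respond is up to the application.

```php
<?php

// Hypothetical helper: reports whether any of the given request
// arrays carries a variable named "clean".
function request_names_clean(array $get, array $post, array $cookie)
{
  return array_key_exists('clean', $get)
      || array_key_exists('clean', $post)
      || array_key_exists('clean', $cookie);
}

// Step 1: always start from an empty array.
$clean = array();

// Step 2: refuse requests that try to supply a variable named clean.
if (request_names_clean($_GET, $_POST, $_COOKIE))
{
  // Assumed reaction for this sketch only.
  exit('Invalid request.');
}
```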

In order to solidify these concepts, consider a simple HTML form that allows a user to select among three colors:

    <form action="process.php" method="POST">
    Please select a color:
    <select name="color">
      <option value="red">red</option>
      <option value="green">green</option>
      <option value="blue">blue</option>
    </select>
    <input type="submit" />
    </form>

In the programming logic that processes this form, it is easy to make the mistake of assuming that only one of the three choices can be provided. As you will learn in Chapter 2, the client can submit any data as the value of $_POST['color']. To properly filter this data, you can use a switch statement:

    <?php

    $clean = array();

    switch($_POST['color'])
    {
      case 'red':
      case 'green':
      case 'blue':
        $clean['color'] = $_POST['color'];
        break;
    }

    ?>

This example first initializes $clean to an empty array in order to be certain that it cannot contain tainted data. Once it is proven that the value of $_POST['color'] is one of red, green, or blue, it is stored in $clean['color']. Therefore, you can use $clean['color'] elsewhere in your code with reasonable assurance that it is valid. Of course, you could add a default case to this switch statement to take a particular action in the case of invalid data. One possibility is to display the form again while noting the error—just be careful not to output the tainted data in an attempt to be friendly.
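
One way to add the default case is sketched here. The function name filter_color() and the wording of the error message are assumptions. The message is a fixed string, so the rejected value is never sent back to the client.

```php
<?php

// Hypothetical helper: returns the filtered data and any error
// messages as a two-element array.
function filter_color(array $post)
{
  $clean = array();
  $errors = array();

  // Treat a missing or non-string value as an empty string.
  $color = (isset($post['color']) && is_string($post['color']))
         ? $post['color']
         : '';

  switch ($color)
  {
    case 'red':
    case 'green':
    case 'blue':
      $clean['color'] = $color;
      break;
    default:
      // Fixed message; the tainted value is not included.
      $errors['color'] = 'Please select one of the listed colors.';
      break;
  }

  return array($clean, $errors);
}

list($clean, $errors) = filter_color($_POST);
```

If $errors is not empty, the form can be displayed again along with the messages it contains.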

While this particular approach is useful for filtering data against a known set of valid values, it does not help you filter data against a known set of valid characters. For example, you might want to assert that a username may contain only alphanumeric characters:

    <?php

    $clean = array();

    if (ctype_alnum($_POST['username']))
    {
      $clean['username'] = $_POST['username'];
    }

    ?>

Although a regular expression can be used for this particular purpose, using a native PHP function is always preferable. These functions are less likely to contain errors than code that you write yourself, and an error in your filtering logic is almost certain to result in a security vulnerability.
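
The same check can be wrapped in a small function that also copes with a missing field or a value that is not a string (a client can submit username[]=x, which PHP presents as an array). This is only a sketch, and the name filter_username() is hypothetical.

```php
<?php

// Hypothetical helper: returns an array that contains the username
// only if it is a non-empty, purely alphanumeric string.
function filter_username(array $post)
{
  $clean = array();

  if (isset($post['username'])
      && is_string($post['username'])
      && ctype_alnum($post['username']))
  {
    $clean['username'] = $post['username'];
  }

  return $clean;
}

$clean = filter_username($_POST);
```

Because ctype_alnum() returns FALSE for an empty string, an empty username is rejected as well.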

Escape Output

Another cornerstone of web application security is the practice of escaping output—escaping or encoding special characters so that their original meaning is preserved. For example, O'Reilly is represented as O\'Reilly when being sent to a MySQL database. The backslash before the apostrophe is there to preserve it—the apostrophe is part of the data and not meant to be interpreted by the database.

As with filtering input, when I refer to escaping output, I am really describing three different steps:

  • Identifying output

  • Escaping output

  • Distinguishing between escaped and unescaped data

Tip

It is important to escape only filtered data. Although escaping alone can prevent many common security vulnerabilities, it should never be regarded as a substitute for filtering input. Tainted data must be first filtered and then escaped.

To escape output, you must first identify output. In general, this is much easier than identifying input because it relies on an action that you take. For example, to identify output being sent to the client, you can search for strings such as the following in your code:

  • echo

  • print

  • printf

  • <?=

As the developer of an application, you should be aware of every case in which you send data to a remote system. These cases all constitute output.

Like filtering, escaping is a process that is unique for each situation. Whereas filtering is unique according to the type of data you're filtering, escaping is unique according to the type of system to which you're sending data.

For most common destinations (including the client, databases, and URLs), there is a native escaping function that you can use. If you must write your own, it is important to be exhaustive. Find a reliable and complete list of every special character in the remote system and the proper way to represent each character so that it is preserved rather than interpreted.
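
For a URL, the native function is urlencode(). The following sketch assumes that the value in $clean has already been filtered; the $url array, the hostname, and the script name are placeholders invented for this example.

```php
<?php

// Assume this value has already been filtered.
$clean = array('query' => 'php & security');

// $url is a hypothetical array for data escaped for use in a URL.
$url = array();
$url['query'] = urlencode($clean['query']);

// example.org and search.php are placeholders.
$link = "http://example.org/search.php?q={$url['query']}";

// $link is now http://example.org/search.php?q=php+%26+security
```

The ampersand in the data is sent as %26, so it cannot be mistaken for a separator between query string parameters.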

The most common destination is the client, and htmlentities() is the best function for escaping data to be sent to the client. Like most string functions, it takes a string and returns a modified version of that string. The best way to use htmlentities() is to specify the two optional arguments: the quote style (the second argument) and the character set (the third argument). The quote style should always be ENT_QUOTES in order for the escaping to be most exhaustive, and the character set should match the character set indicated in the Content-Type header that your application includes in each response.

To distinguish between escaped and unescaped data, I advocate the use of a naming convention. For data to be sent to the client, the convention I use is to store all data escaped with htmlentities() in $html, an array that is initialized to an empty array and contains only data that has been both filtered and escaped:

    <?php

    $html = array();

    $html['username'] = htmlentities($clean['username'],
      ENT_QUOTES, 'UTF-8');

    echo "<p>Welcome back, {$html['username']}.</p>";

    ?>

Tip

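The htmlspecialchars() function is almost identical to htmlentities(). It accepts the same arguments, and the only difference is that it is less exhaustive: it converts only the characters that are special in HTML (&, <, >, and quotes, according to the quote style), whereas htmlentities() converts every character that has a named HTML entity.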

By using $html['username'] when sending the username to the client, you can be sure that special characters are not interpreted by the browser. If the username contains only alphanumeric characters, the escaping is not actually necessary, but it is a practice that adheres to Defense in Depth. Consistently escaping all output is a good habit that dramatically increases the security of your applications.
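
The difference between htmlentities() and htmlspecialchars() can be seen by escaping the same string with each. The sample text is made up for this illustration and is assumed to have been filtered already.

```php
<?php

// "caf\xC3\xA9" is the word café encoded as UTF-8.
$clean = array('text' => "caf\xC3\xA9 <b>O'Reilly</b>");

$special  = htmlspecialchars($clean['text'], ENT_QUOTES, 'UTF-8');
$entities = htmlentities($clean['text'], ENT_QUOTES, 'UTF-8');

// $special:  café &lt;b&gt;O&#039;Reilly&lt;/b&gt;
// $entities: caf&eacute; &lt;b&gt;O&#039;Reilly&lt;/b&gt;
```

Both functions neutralize the markup and the apostrophe; only htmlentities() also converts the accented character.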

Another popular destination is a database. When possible, you should escape data used in an SQL query with an escaping function native to your database. For MySQL users, the best escaping function is mysql_real_escape_string(). If there is no native escaping function for your database, addslashes() can be used as a last resort.

The following example demonstrates the proper escaping technique for a MySQL database:

    <?php

    $mysql = array();

    $mysql['username'] =
      mysql_real_escape_string($clean['username']);

    $sql = "SELECT *
            FROM   profile
            WHERE  username = '{$mysql['username']}'";

    $result = mysql_query($sql);

    ?>
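
The mysql extension used above was removed in PHP 7.0, so this example runs only on older versions of PHP. On current versions, one alternative is a prepared statement with PDO. Note that this is a different technique from escaping: the value is bound as a parameter instead of being interpolated into the query. The following is a sketch under that assumption, and the function name find_profile() is hypothetical.

```php
<?php

// Hypothetical helper: looks up a profile by username using a bound
// parameter, so the value never becomes part of the SQL string.
function find_profile(PDO $db, $username)
{
  $statement = $db->prepare(
    'SELECT * FROM profile WHERE username = :username'
  );

  $statement->execute(array(':username' => $username));

  // Returns the first matching row as an associative array,
  // or FALSE if no row matches.
  return $statement->fetch(PDO::FETCH_ASSOC);
}
```

Given an open PDO connection in $db, a call such as find_profile($db, $clean['username']) takes the place of building and running the query by hand. The data passed in should still be filtered first.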
