Chapter 11. Web Automation

Introduction

Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. PHP can also be useful in the role of a web browser, however, retrieving URLs and then operating on their content. Most recipes in this chapter cover retrieving URLs and processing the results, although there are a few other tasks here as well, such as using templates and processing server logs.

There are four ways to retrieve a remote URL in PHP. Choosing one method over another depends on your needs for simplicity, control, and portability. The four methods are to use fopen( ), fsockopen( ), the cURL extension, or the HTTP_Request class from PEAR.

Using fopen( ) is simple and convenient. We discuss it in Recipe 11.2. The fopen( ) function automatically follows redirects, so if you use this function to retrieve the directory http://www.example.com/people and the server redirects you to http://www.example.com/people/, you’ll get the contents of the directory index page, not a message telling you that the URL has moved. The fopen( ) function also works with both HTTP and FTP. The downsides to fopen( ) are that it can handle only HTTP GET requests (not HEAD or POST), that you can’t send additional headers or any cookies with the request, and that you can retrieve only the response body with it, not the response headers.
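As a rough illustration of how little code the fopen( ) approach needs, here is a minimal sketch. It assumes a PHP version with type declarations (PHP 7 or later) and that the allow_url_fopen setting is enabled so that fopen( ) accepts http:// URLs. The helper name fetch_body( ) is hypothetical, introduced only for this example.

```php
<?php
// Minimal sketch: retrieve the body of a URL with fopen().
// Assumption: allow_url_fopen is enabled for http:// URLs.
// fetch_body() is a hypothetical helper name, not a PHP built-in.
function fetch_body(string $url): string
{
    // Suppress the warning and report failure as an exception instead.
    $fh = @fopen($url, 'r');
    if ($fh === false) {
        throw new RuntimeException("Could not open $url");
    }

    try {
        // Read everything that remains on the stream: the response body.
        $body = stream_get_contents($fh);
    } finally {
        // Close the handle whether or not the read succeeded.
        fclose($fh);
    }

    if ($body === false) {
        throw new RuntimeException("Could not read $url");
    }

    return $body;
}
```

Calling fetch_body('http://www.example.com/people') should then return the contents of the directory index page as a string, with any redirect followed along the way, as described above.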

Using fsockopen( ) requires more work but gives you ...
