Perl for Web Site Management

Book description

Checking links, batch editing HTML files, tracking users, and writing CGI scripts--these are the often tedious daily tasks that can be done much more easily with Perl, the scripting language that runs on almost all computing platforms. If you're more interested in streamlining your web activities than in learning a new programming language, Perl for Web Site Management is for you: it's not so much about learning Perl as it is about using Perl to do common web chores more efficiently. The secret is that, although becoming a Perl expert may be hard, most Perl scripts are relatively simple. Using Perl and other open source tools, you'll learn how to:

  • Incorporate a simple search engine

  • Write a simple CGI gateway

  • Convert multiple text files into HTML

  • Monitor log files

  • Track users as they navigate your site

Even if you don't have any programming background, this book will get you quickly past Perl's seemingly forbidding barrier of chops and chomps, execs and elsifs. You'll be able to put an end to using clunky tools, editing files tediously by hand, or relying on programmers and system administrators to do "the hard stuff" for you. Sure, you might learn a little bit about programming as well, and perhaps something about the role of open source tools on the Web. But the purpose of Perl for Web Site Management isn't to educate you--it's to empower you. Whether you're a developer, a designer, or simply a dabbler on the Web, this book is the plain-English, hands-on introduction to Perl you've been waiting for.

Table of Contents

  1. Perl for Web Site Management
    1. Preface
      1. Intended Audience
      2. Programmers by Accident
      3. What This Book Offers
      4. Organization
      5. Online Examples
      6. Conventions Used in This Book
      7. How to Contact Us
      8. Acknowledgments
    2. 1. Getting Your Tools in Order
      1. Open Source Versus Proprietary Software
      2. Evaluating a Hosting Provider
      3. Web Hosting Alternatives
        1. Free Hosting
        2. Shared Hosting (Low Grade)
        3. Shared Hosting (High Grade)
        4. Dedicated Hosting/Co-Location
      4. Getting Started with SSH/Telnet
      5. Meet the Unix Shell
        1. man, more, and less
        2. Directories and the pwd Command
        3. The ls Command: List Directory Contents
        4. The mkdir Command: Make a New Directory
        5. The cd Command: Change Directories
        6. CTRL-C (^C): Cancel a Command in Progress
        7. The exit Command: End Your Shell Session
      6. Network Troubleshooting
        1. ping and traceroute
        2. mtr
      7. A Suitable Text Editor
    3. 2. Getting Started with Perl
      1. Finding Perl on Your System
      2. Creating the “Hello, world!” Script
      3. The Dot Slash Thing
      4. Unix File Permissions
      5. Running (and Debugging) the Script
        1. The Joy of Debugging
      6. Perl Documentation
        1. man perl, perldoc perl
        2. Function Documentation with perldoc -f
        3. The Perl FAQ
      7. Perl Variables
        1. Scalar Variables
        2. Array Variables
        3. Hash Variables
      8. A Bit More About Quoting
      9. “Hello, world!” as a CGI Script
        1. Content-Type Headers
        2. Here-Document Quoting
        3. File Locations/Extensions for Running CGI Scripts
        4. Testing from the Command Line
        5. Testing from the Web Server
        6. CGI Script File Permissions
    4. 3. Running a Form-to-Email Gateway
      1. Checking for CGI.pm
      2. Creating the HTML Form
      3. The <FORM> Tag’s ACTION Attribute
      4. The mail_form.cgi Script
      5. Warnings via Perl’s -w Switch
      6. The Configuration Section
      7. Invoking CGI.pm
      8. foreach Loops
      9. if Statements
      10. Filehandles and Piped Output
      11. die Statements
      12. Outputting the Message
      13. Testing the Script
    5. 4. Power Editing with Perl
      1. Being Careful
      2. Renaming Files
        1. Globbing
        2. A Simple Renaming Script
        3. Sanity Checking
        4. Regular Expressions
        5. Running the Renaming Script
      3. Modifying HREF Attributes
        1. First Version of the fix_links.plx Script
        2. Reading from a File with a while Loop
        3. Modifying Data with a Substitution Operator
      4. Writing the Modified Files Back to Disk
    6. 5. Parsing Text Files
      1. The “Dirty Data” Problem
      2. Required Features
      3. Obtaining the Data
      4. Parsing the Data
        1. Using strict and Scoping Variables
        2. Using the Default Variable $_
        3. The push Function
        4. Managing Complexity
        5. Subroutines
        6. The &parse_exhibitor Subroutine
      5. Outputting Sample Data
      6. Making the Script Smarter
      7. Parsing the Category File
      8. Testing the Script Again
    7. 6. Generating HTML
      1. The Modified make_exhibit.plx Script
      2. Changes to &parse_exhibitor
      3. Adding Categories to the Company Listings
      4. Creating Directories
      5. Generating the HTML Pages
        1. Generating the Individual Company Listings
        2. Generating the Alphabetical Index
        3. Using an Explicit Sort Block
        4. Generating the Category Pages
      6. Generating the Top-level Page
    8. 7. Regular Expressions Demystified
      1. Delimiters
      2. Trailing Modifiers
      3. The Search Pattern
      4. Taking It for a Spin
      5. Thinking Like a Computer
        1. Bumping Along and Backtracking
        2. Alternation
    9. 8. Parsing Web Access Logs
      1. Log File Structure
      2. Converting IP Addresses
      3. The Log-Analysis Script
        1. The Mammoth Regular Expression
      4. Different Log File Formats
      5. Storing the Data
      6. The “Visit” Data Structure
        1. The &store_line Subroutine
    10. 9. Date Arithmetic
      1. Date/Time Conversions
      2. Using the Time::Local Module
      3. Caching Date Conversions
      4. Scoping via Anonymous Blocks
      5. Using a BEGIN Block
    11. 10. Generating a Web Access Report
      1. The &new_visit and &add_to_visit Subroutines
      2. Generating the Report
        1. Generating the Summary Line
        2. Saving Previous Summary Lines
      3. Showing the Details of Each Visit
      4. Reporting the Most Popular Pages
      5. Fancier Sorting
        1. Reporting the Referral and User Agent Information
        2. Tracking Robots
      6. Mailing the Report
      7. Using cron
    12. 11. Link Checking
      1. Maintaining Links
      2. Finding Files with File::Find
        1. The Magic of References
        2. Finding HTML Files Only
      3. Looking for Links
      4. Extracting
        1. Converting
      5. Putting It All Together
        1. Creating a Hash of Arrays
        2. Updating &process to Store Bad-Link Data
        3. Printing the Bad-Link Report
        4. Adding HTML Output
      6. Using CPAN
        1. Checking for LWP
        2. Installing LWP from CPAN
          1. Getting the archive file onto the web server
          2. Decompressing the file
          3. Extracting the files from the archive
          4. The actual installation
        3. Root Versus Regular User Installation
      7. Checking Remote Links
      8. A Proper Link Checker
        1. Object-Oriented Syntax
        2. Checking Remote URLs
        3. Processing the Queue
    13. 12. Running a CGI Guestbook
      1. The Guestbook Script
      2. Taint Mode
      3. Guestbook Preliminaries
      4. Untainting with Backreferences
      5. File Locking
      6. Guestbook File Permissions
    14. 13. Running a CGI Search Tool
      1. Downloading and Compiling SWISH-E
      2. Indexing with SWISH-E
      3. Running SWISH-E from the Command Line
      4. Running SWISH-E via a CGI Script
    15. 14. Using HTML Templates
      1. Using Templates
      2. Reading Fillings Back In
      3. Rewriting an Entire Site
    16. 15. Generating Links
      1. The Docbase Concept
      2. The CyberFair Site’s Architecture
      3. The Script’s Data Structure
      4. Using Data::Dumper
      5. Creating Anonymous Hashes and Arrays
      6. Automatically Generating Links
      7. Inserting the Links
    17. 16. Writing Perl Modules
      1. A Simple Module Template
      2. Installing the Module
      3. The Cyberfair::Page Module
    18. 17. Adding Pages via CGI Script
      1. Why Add Pages with a CGI Script?
      2. A Script for Creating HTML Documents
      3. Controlling a Multistage CGI Script
      4. Using Parameterized Links
      5. Building a Form
      6. Posting Pages from the CGI Script
      7. Running External Commands with system and Backticks
      8. Race Conditions
      9. File Locking
      10. Adding Link Checking
    19. 18. Monitoring Search Engine Positioning
      1. Installing WWW::Search
      2. A Single-Search Results Tool
        1. Using the Getopt::Std Module
        2. Using || for Short-Circuit Assignment
      3. A Multisearch Results Tool
      4. The map Function
    20. 19. Keeping Track of Users
      1. Stateless Transactions
      2. Identifying Individual Users
      3. Basic Authentication
        1. The .htaccess File
        2. The .htgroup and .htpasswd Files
      4. Automating User Registration
      5. Storing Data on the Server
        1. Flat Text Files for Data Storage
        2. Serializing Data
        3. Updating the .htpasswd and .htgroup Files
      6. The Register Script
        1. Fixing a Race Condition
        2. Generating a Random Verification String
        3. Array and Hash Slices
        4. The rand Function
      7. The Verification Script
    21. 20. Storing Data in DBM Files
      1. Data Storage Options
      2. The tie Function
      3. A DBM Example Script
      4. Blocking Versus Nonblocking Behavior
      5. Storing Multilevel Data in DBM Files
      6. An MLDBM-Using Registration Script
      7. An MLDBM-Using Verification Script
    22. 21. Where to Go Next
      1. Unix System Administration
      2. Programming
        1. A Programmer’s Editor
        2. Revision-Control Systems
        3. More Perl
        4. JavaScript
        5. PHP
        6. Embperl
        7. Python
        8. Other Languages
      3. Apache Server Administration and mod_perl
      4. Relational Databases
      5. Advocacy
    23. Index
    24. Colophon