sed & awk, 2nd Edition

Book description

sed & awk describes two text processing programs that are mainstays of the UNIX programmer's toolbox.

sed is a "stream editor" for editing streams of text that might be too large to edit as a single file, or that might be generated on the fly as part of a larger data processing step. The most common operation done with sed is substitution: replacing one block of text with another.

awk is a complete programming language. Unlike many conventional languages, awk is "data driven": you specify what kind of data you are interested in and the operations to be performed when that data is found. awk does many things for you, including automatically opening and closing data files, reading records, breaking the records up into fields, and counting the records. While awk provides the features of most conventional programming languages, it also includes some unconventional features, such as extended regular expression matching and associative arrays.

sed & awk describes both programs in detail and includes a chapter of example sed and awk scripts.

This edition covers features of sed and awk that are mandated by the POSIX standard. This most notably affects awk, where POSIX standardized a new variable, CONVFMT, and new functions, toupper() and tolower(). The CONVFMT variable specifies the conversion format to use when converting numbers to strings (awk formerly used OFMT for this purpose). The toupper() and tolower() functions each take a (presumably mixed case) string argument and return a new version of the string with all letters translated to the corresponding case.

In addition, this edition covers GNU sed, newly available since the first edition. It also updates the first edition's coverage of Bell Labs nawk and GNU awk (gawk), covers mawk, an additional freely available implementation of awk, and briefly discusses three commercial versions of awk: MKS awk, Thompson Automation awk (tawk), and Videosoft VSAwk.
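The features named in the description can be tried from a shell prompt. The following is a minimal sketch, not an excerpt from the book: the sample input strings and the variable names (sub, match, count, cased, conv) are made up for illustration, and it assumes a POSIX-conforming sed and awk are on the PATH.

```shell
# sed: substitution, replacing one piece of text with another.
sub=$(printf 'the UNIX toolbox\n' | sed 's/UNIX/POSIX/')

# awk: "data driven" pattern and action. awk reads each record, splits it
# into fields ($1, $2, ...) and counts records in NR without being asked.
match=$(printf 'alice 3\nbob 7\n' | awk '$2 > 5 { print NR, $1 }')

# awk: associative arrays are indexed by strings.
count=$(printf 'a\nb\na\n' | awk '{ seen[$1]++ } END { print seen["a"] }')

# awk: toupper() and tolower() return a case-converted copy of a string.
cased=$(awk 'BEGIN { print toupper("Mixed Case"), tolower("Mixed Case") }')

# awk: CONVFMT sets the format used when a number is converted to a string,
# here forced by concatenating the number with an empty string.
conv=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; s = x ""; print s }')

printf '%s\n' "$sub" "$match" "$count" "$cased" "$conv"
```

Each command feeds its own input through printf, so nothing on disk is read or changed.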

Table of Contents

  1. sed & awk, 2nd Edition
  2. A Note Regarding Supplemental Files
  3. Dedication
  4. Preface
    1. Scope of This Handbook
    2. Availability of sed and awk
      1. DOS Versions
      2. Other Sources of Information About sed and awk
      3. Sample Programs
    3. Obtaining Example Source Code
      1. FTP
      2. Ftpmail
      3. BITFTP
      4. UUCP
    4. Conventions Used in This Handbook
    5. About the Second Edition
    6. Acknowledgments from the First Edition
    7. Comments and Questions
  5. 1. Power Tools for Editing
    1. 1.1. May You Solve Interesting Problems
    2. 1.2. A Stream Editor
    3. 1.3. A Pattern-Matching Programming Language
    4. 1.4. Four Hurdles to Mastering sed and awk
  6. 2. Understanding Basic Operations
    1. 2.1. Awk, by Sed and Grep, out of Ed
    2. 2.2. Command-Line Syntax
      1. 2.2.1. Scripting
      2. 2.2.2. Sample Mailing List
    3. 2.3. Using sed
      1. 2.3.1. Specifying Simple Instructions
        1. 2.3.1.1. Command garbled
      2. 2.3.2. Script Files
        1. 2.3.2.1. Saving output
        2. 2.3.2.2. Suppressing automatic display of input lines
        3. 2.3.2.3. Mixing options (POSIX)
        4. 2.3.2.4. Summary of options
    4. 2.4. Using awk
      1. 2.4.1. Running awk
      2. 2.4.2. Error Messages
      3. 2.4.3. Summary of Options
    5. 2.5. Using sed and awk Together
  7. 3. Understanding Regular Expression Syntax
    1. 3.1. That's an Expression
    2. 3.2. A Line-Up of Characters
      1. 3.2.1. The Ubiquitous Backslash
      2. 3.2.2. A Wildcard
      3. 3.2.3. Writing Regular Expressions
      4. 3.2.4. Character Classes
        1. 3.2.4.1. A range of characters
        2. 3.2.4.2. Excluding a class of characters
        3. 3.2.4.3. POSIX character class additions
      5. 3.2.5. Repeated Occurrences of a Character
      6. 3.2.6. What's the Word? Part I
      7. 3.2.7. Positional Metacharacters
        1. 3.2.7.1. Phrases
      8. 3.2.8. A Span of Characters
      9. 3.2.9. Alternative Operations
      10. 3.2.10. Grouping Operations
      11. 3.2.11. What's the Word? Part II
      12. 3.2.12. Your Replacement Is Here
        1. 3.2.12.1. The extent of the match
      13. 3.2.13. Limiting the Extent
    3. 3.3. I Never Metacharacter I Didn't Like
  8. 4. Writing sed Scripts
    1. 4.1. Applying Commands in a Script
      1. 4.1.1. The Pattern Space
    2. 4.2. A Global Perspective on Addressing
      1. 4.2.1. Grouping Commands
    3. 4.3. Testing and Saving Output
      1. 4.3.1. testsed
      2. 4.3.2. runsed
    4. 4.4. Four Types of sed Scripts
      1. 4.4.1. Multiple Edits to the Same File
      2. 4.4.2. Making Changes Across a Set of Files
      3. 4.4.3. Extracting Contents of a File
        1. 4.4.3.1. Extracting a macro definition
        2. 4.4.3.2. Generating an outline
      4. 4.4.4. Edits To Go
    5. 4.5. Getting to the PromiSed Land
  9. 5. Basic sed Commands
    1. 5.1. About the Syntax of sed Commands
    2. 5.2. Comment
    3. 5.3. Substitution
      1. 5.3.1. Replacement Metacharacters
        1. 5.3.1.1. Correcting index entries
    4. 5.4. Delete
    5. 5.5. Append, Insert, and Change
    6. 5.6. List
      1. 5.6.1. Stripping Out Non-Printable Characters from nroff Files
    7. 5.7. Transform
    8. 5.8. Print
    9. 5.9. Print Line Number
    10. 5.10. Next
    11. 5.11. Reading and Writing Files
      1. 5.11.1. Checking Out Reference Pages
    12. 5.12. Quit
  10. 6. Advanced sed Commands
    1. 6.1. Multiline Pattern Space
      1. 6.1.1. Append Next Line
        1. 6.1.1.1. Converting an Interleaf file
      2. 6.1.2. Multiline Delete
      3. 6.1.3. Multiline Print
    2. 6.2. A Case for Study
    3. 6.3. Hold That Line
      1. 6.3.1. A Capital Transformation
      2. 6.3.2. Correcting Index Entries (Part II)
      3. 6.3.3. Building Blocks of Text
    4. 6.4. Advanced Flow Control Commands
      1. 6.4.1. Branching
      2. 6.4.2. The Test Command
      3. 6.4.3. One More Case
    5. 6.5. To Join a Phrase
  11. 7. Writing Scripts for awk
    1. 7.1. Playing the Game
    2. 7.2. Hello, World
    3. 7.3. Awk's Programming Model
    4. 7.4. Pattern Matching
      1. 7.4.1. Describing Your Script
    5. 7.5. Records and Fields
      1. 7.5.1. Referencing and Separating Fields
      2. 7.5.2. Field Splitting: The Full Story
    6. 7.6. Expressions
      1. 7.6.1. Averaging Student Grades
    7. 7.7. System Variables
      1. 7.7.1. Working with Multiline Records
      2. 7.7.2. Balance the Checkbook
    8. 7.8. Relational and Boolean Operators
      1. 7.8.1. Getting Information About Files
    9. 7.9. Formatted Printing
    10. 7.10. Passing Parameters Into a Script
    11. 7.11. Information Retrieval
      1. 7.11.1. Finding a Glitch
  12. 8. Conditionals, Loops, and Arrays
    1. 8.1. Conditional Statements
      1. 8.1.1. Conditional Operator
    2. 8.2. Looping
      1. 8.2.1. While Loop
      2. 8.2.2. Do Loop
      3. 8.2.3. For Loop
      4. 8.2.4. Deriving Factorials
    3. 8.3. Other Statements That Affect Flow Control
    4. 8.4. Arrays
      1. 8.4.1. Associative Arrays
      2. 8.4.2. Testing for Membership in an Array
      3. 8.4.3. A Glossary Lookup Script
      4. 8.4.4. Using split( ) to Create Arrays
      5. 8.4.5. Making Conversions
      6. 8.4.6. Deleting Elements of an Array
    5. 8.5. An Acronym Processor
      1. 8.5.1. Multidimensional Arrays
    6. 8.6. System Variables That Are Arrays
      1. 8.6.1. An Array of Command-Line Parameters
      2. 8.6.2. An Array of Environment Variables
  13. 9. Functions
    1. 9.1. Arithmetic Functions
      1. 9.1.1. Trigonometric Functions
      2. 9.1.2. Integer Function
      3. 9.1.3. Random Number Generation
      4. 9.1.4. Pick 'em
    2. 9.2. String Functions
      1. 9.2.1. Substrings
      2. 9.2.2. String Length
      3. 9.2.3. Substitution Functions
      4. 9.2.4. Converting Case
      5. 9.2.5. The match( ) Function
    3. 9.3. Writing Your Own Functions
      1. 9.3.1. Writing a Sort Function
      2. 9.3.2. Maintaining a Function Library
      3. 9.3.3. Another Sorted Example
  14. 10. The Bottom Drawer
    1. 10.1. The getline Function
      1. 10.1.1. Reading Input from Files
      2. 10.1.2. Assigning the Input to a Variable
      3. 10.1.3. Reading Input from a Pipe
    2. 10.2. The close( ) Function
    3. 10.3. The system( ) Function
    4. 10.4. A Menu-Based Command Generator
    5. 10.5. Directing Output to Files and Pipes
      1. 10.5.1. Directing Output to a Pipe
      2. 10.5.2. Working with Multiple Files
    6. 10.6. Generating Columnar Reports
    7. 10.7. Debugging
      1. 10.7.1. Make a Copy
      2. 10.7.2. Before and After Photos
      3. 10.7.3. Finding Out Where the Problem Is
      4. 10.7.4. Commenting Out Loud
      5. 10.7.5. Slash and Burn
      6. 10.7.6. Getting Defensive About Your Script
    8. 10.8. Limitations
    9. 10.9. Invoking awk Using the #! Syntax
  15. 11. A Flock of awks
    1. 11.1. Original awk
      1. 11.1.1. Escape Sequences
      2. 11.1.2. Exponentiation
      3. 11.1.3. The C Conditional Expression
      4. 11.1.4. Variables as Boolean Patterns
      5. 11.1.5. Faking Dynamic Regular Expressions
      6. 11.1.6. Control Flow
      7. 11.1.7. Field Separating
      8. 11.1.8. Arrays
      9. 11.1.9. The getline Function
      10. 11.1.10. Functions
      11. 11.1.11. Built-In Variables
    2. 11.2. Freely Available awks
      1. 11.2.1. Common Extensions
        1. 11.2.1.1. Deleting all elements of an array
        2. 11.2.1.2. Obtaining individual characters
        3. 11.2.1.3. Flushing buffered output
        4. 11.2.1.4. Special filenames
        5. 11.2.1.5. The nextfile statement
        6. 11.2.1.6. Regular expression record separators (gawk and mawk)
      2. 11.2.2. Bell Labs awk
      3. 11.2.3. GNU awk (gawk)
        1. 11.2.3.1. Command line options
        2. 11.2.3.2. An awk program search path
        3. 11.2.3.3. Line continuation
        4. 11.2.3.4. Extended regular expressions
        5. 11.2.3.5. Regular expression record terminators
        6. 11.2.3.6. Separating fields
        7. 11.2.3.7. Additional special files
        8. 11.2.3.8. Additional variables
        9. 11.2.3.9. Additional functions
        10. 11.2.3.10. A general substitution function
        11. 11.2.3.11. Time management for programmers
      4. 11.2.4. Michael's awk (mawk)
    3. 11.3. Commercial awks
      1. 11.3.1. MKS awk
      2. 11.3.2. Thompson Automation awk (tawk)
        1. 11.3.2.1. Tawk language extensions
        2. 11.3.2.2. Additional built-in tawk functions
      3. 11.3.3. Videosoft VSAwk
    4. 11.4. Epilogue
  16. 12. Full-Featured Applications
    1. 12.1. An Interactive Spelling Checker
      1. 12.1.1. BEGIN Procedure
      2. 12.1.2. Main Procedure
      3. 12.1.3. END Procedure
      4. 12.1.4. Supporting Functions
      5. 12.1.5. The spellcheck Shell Script
    2. 12.2. Generating a Formatted Index
      1. 12.2.1. The masterindex Program
      2. 12.2.2. Standardizing Input
      3. 12.2.3. Sorting the Entries
      4. 12.2.4. Handling Page Numbers
      5. 12.2.5. Merging Entries with the Same Keys
      6. 12.2.6. Formatting the Index
        1. 12.2.6.1. The masterindex shell script
    3. 12.3. Spare Details of the masterindex Program
      1. 12.3.1. How to Hide a Special Character
      2. 12.3.2. Rotating Two Parts
      3. 12.3.3. Finding a Replacement
      4. 12.3.4. A Function for Reporting Errors
      5. 12.3.5. Handling See Also Entries
      6. 12.3.6. Alternative Ways to Sort
  17. 13. A Miscellany of Scripts
    1. 13.1. uutot.awk—Report UUCP Statistics
      1. 13.1.1. Program Notes for uutot.awk
    2. 13.2. phonebill—Track Phone Usage
      1. 13.2.1. Program Notes for phonebill
    3. 13.3. combine—Extract Multipart uuencoded Binaries
      1. 13.3.1. Program Notes for combine
    4. 13.4. mailavg—Check Size of Mailboxes
      1. 13.4.1. Program Notes for mailavg
    5. 13.5. adj—Adjust Lines for Text Files
      1. 13.5.1. Program Notes for adj
    6. 13.6. readsource—Format Program Source Files for troff
      1. 13.6.1. Program Notes for readsource
    7. 13.7. gent—Get a termcap Entry
      1. 13.7.1. Program Notes for gent
    8. 13.8. plpr—lpr Preprocessor
      1. 13.8.1. Program Notes for plpr
    9. 13.9. transpose—Perform a Matrix Transposition
      1. 13.9.1. Program Notes for transpose
    10. 13.10. m1—Simple Macro Processor
      1. 13.10.1. Program Notes for m1
  18. A. Quick Reference for sed
    1. A.1. Command-Line Syntax
    2. A.2. Syntax of sed Commands
      1. A.2.1. Pattern Addressing
      2. A.2.2. Regular Expression Metacharacters for sed
    3. A.3. Command Summary for sed
  19. B. Quick Reference for awk
    1. B.1. Command-Line Syntax
      1. B.1.1. Shell Wrapper for Invoking awk
    2. B.2. Language Summary for awk
      1. B.2.1. Records and Fields
      2. B.2.2. Format of a Script
        1. B.2.2.1. Line termination
        2. B.2.2.2. Comments
      3. B.2.3. Patterns
      4. B.2.4. Regular Expressions
      5. B.2.5. Expressions
        1. B.2.5.1. Constants
        2. B.2.5.2. Escape sequences
        3. B.2.5.3. Variables
        4. B.2.5.4. Arrays
        5. B.2.5.5. System variables
        6. B.2.5.6. Operators
      6. B.2.6. Statements and Functions
    3. B.3. Command Summary for awk
      1. B.3.1. Format Expressions Used in printf and sprintf
  20. C. Supplement for Chapter 12
    1. C.1. Full Listing of spellcheck.awk
    2. C.2. Listing of masterindex Shell Script
    3. C.3. Documentation for masterindex
      1. masterindex
    4. C.3.1. Background Details
    5. C.3.2. Coding Index Entries
    6. C.3.3. Output Format
    7. C.3.4. Compiling a Master Index
  21. Index
  22. About the Authors
  23. Colophon
  24. Copyright