Effective awk Programming, 3rd Edition

Book description

Effective awk Programming, 3rd Edition, focuses entirely on awk, exploring it in the greatest depth of the three awk titles we carry. It's an excellent companion piece to the more broadly focused second edition. This book provides complete coverage of the gawk 3.1 language as well as the most up-to-date coverage of the POSIX standard for awk available anywhere. Author Arnold Robbins clearly distinguishes standard awk features from GNU awk (gawk)-specific features, sheds light on many of the "dark corners" of the language (areas to watch out for when programming), and devotes two full chapters to example programs. A brand-new chapter is devoted to TCP/IP networking with gawk, and the book also includes a summary of how the awk language evolved. The book also covers:

  • Internationalization of gawk

  • Interfacing to i18n at the awk level

  • Two-way pipes

  • TCP/IP networking via the two-way pipe interface

  • The new PROCINFO array, which provides information about running gawk

  • Profiling and pretty-printing awk programs

In addition to covering the awk language, this book serves as the official "User's Guide" for the GNU implementation of awk (gawk), describing in an integrated fashion the extensions in the System V Release 4 version of awk that are also available in gawk.

The third edition of Effective awk Programming is a GNU Manual, published by O'Reilly & Associates under the terms of the Free Software Foundation's Free Documentation License (FDL). As the official gawk User's Guide, it is also available in electronic form and can be freely copied and distributed under the FDL. You have the freedom to modify this GNU Manual, just as with GNU software. A portion of the proceeds from the sale of this book is donated to the Free Software Foundation to support further development of GNU software, and copies published by the Free Software Foundation also raise funds for GNU development.

Table of Contents

  1. Special Upgrade Offer
  2. A Note Regarding Supplemental Files
  3. Dedication
  4. Foreword
  5. Preface
    1. History of awk and gawk
    2. A Rose by Any Other Name
    3. Using This Book
    4. Typographical Conventions
      1. Dark Corners
    5. The GNU Project and This Book
    6. How to Contribute
    7. Acknowledgments
  6. I. The awk Language and gawk
    1. 1. Getting Started with awk
      1. 1.1. How to Run awk Programs
        1. 1.1.1. One-Shot Throwaway awk Programs
        2. 1.1.2. Running awk Without Input Files
        3. 1.1.3. Running Long Programs
        4. 1.1.4. Executable awk Programs
        5. 1.1.5. Comments in awk Programs
        6. 1.1.6. Shell-Quoting Issues
      2. 1.2. Datafiles for the Examples
      3. 1.3. Some Simple Examples
      4. 1.4. An Example with Two Rules
      5. 1.5. A More Complex Example
      6. 1.6. awk Statements Versus Lines
      7. 1.7. Other Features of awk
      8. 1.8. When to Use awk
    2. 2. Regular Expressions
      1. 2.1. How to Use Regular Expressions
      2. 2.2. Escape Sequences
      3. 2.3. Regular Expression Operators
      4. 2.4. Using Character Lists
      5. 2.5. gawk-Specific Regexp Operators
      6. 2.6. Case Sensitivity in Matching
      7. 2.7. How Much Text Matches?
      8. 2.8. Using Dynamic Regexps
    3. 3. Reading Input Files
      1. 3.1. How Input Is Split into Records
      2. 3.2. Examining Fields
      3. 3.3. Non-constant Field Numbers
      4. 3.4. Changing the Contents of a Field
      5. 3.5. Specifying How Fields Are Separated
        1. 3.5.1. Using Regular Expressions to Separate Fields
        2. 3.5.2. Making Each Character a Separate Field
        3. 3.5.3. Setting FS from the Command Line
        4. 3.5.4. Field-Splitting Summary
      6. 3.6. Reading Fixed-Width Data
      7. 3.7. Multiple-Line Records
      8. 3.8. Explicit Input with getline
        1. 3.8.1. Using getline with No Arguments
        2. 3.8.2. Using getline into a Variable
        3. 3.8.3. Using getline from a File
        4. 3.8.4. Using getline into a Variable from a File
        5. 3.8.5. Using getline from a Pipe
        6. 3.8.6. Using getline into a Variable from a Pipe
        7. 3.8.7. Using getline from a Coprocess
        8. 3.8.8. Using getline into a Variable from a Coprocess
        9. 3.8.9. Points to Remember About getline
        10. 3.8.10. Summary of getline Variants
    4. 4. Printing Output
      1. 4.1. The print Statement
      2. 4.2. Examples of print Statements
      3. 4.3. Output Separators
      4. 4.4. Controlling Numeric Output with print
      5. 4.5. Using printf Statements for Fancier Printing
        1. 4.5.1. Introduction to the printf Statement
        2. 4.5.2. Format-Control Letters
        3. 4.5.3. Modifiers for printf Formats
        4. 4.5.4. Examples Using printf
      6. 4.6. Redirecting Output of print and printf
      7. 4.7. Special Filenames in gawk
        1. 4.7.1. Special Files for Standard Descriptors
        2. 4.7.2. Special Files for Process-Related Information
        3. 4.7.3. Special Files for Network Communications
        4. 4.7.4. Special Filename Caveats
      8. 4.8. Closing Input and Output Redirections
    5. 5. Expressions
      1. 5.1. Constant Expressions
        1. 5.1.1. Numeric and String Constants
        2. 5.1.2. Octal and Hexadecimal Numbers
        3. 5.1.3. Regular Expression Constants
      2. 5.2. Using Regular Expression Constants
      3. 5.3. Variables
        1. 5.3.1. Using Variables in a Program
        2. 5.3.2. Assigning Variables on the Command Line
      4. 5.4. Conversion of Strings and Numbers
      5. 5.5. Arithmetic Operators
      6. 5.6. String Concatenation
      7. 5.7. Assignment Expressions
      8. 5.8. Increment and Decrement Operators
      9. 5.9. True and False in awk
      10. 5.10. Variable Typing and Comparison Expressions
      11. 5.11. Boolean Expressions
      12. 5.12. Conditional Expressions
      13. 5.13. Function Calls
      14. 5.14. Operator Precedence (How Operators Nest)
    6. 6. Patterns, Actions, and Variables
      1. 6.1. Pattern Elements
        1. 6.1.1. Regular Expressions as Patterns
        2. 6.1.2. Expressions as Patterns
        3. 6.1.3. Specifying Record Ranges with Patterns
        4. 6.1.4. The BEGIN and END Special Patterns
          1. 6.1.4.1. Startup and cleanup actions
          2. 6.1.4.2. Input/Output from BEGIN and END rules
        5. 6.1.5. The Empty Pattern
      2. 6.2. Using Shell Variables in Programs
      3. 6.3. Actions
      4. 6.4. Control Statements in Actions
        1. 6.4.1. The if-else Statement
        2. 6.4.2. The while Statement
        3. 6.4.3. The do-while Statement
        4. 6.4.4. The for Statement
        5. 6.4.5. The break Statement
        6. 6.4.6. The continue Statement
        7. 6.4.7. The next Statement
        8. 6.4.8. Using gawk’s nextfile Statement
        9. 6.4.9. The exit Statement
      5. 6.5. Built-in Variables
        1. 6.5.1. Built-in Variables That Control awk
        2. 6.5.2. Built-in Variables That Convey Information
        3. 6.5.3. Using ARGC and ARGV
    7. 7. Arrays in awk
      1. 7.1. Introduction to Arrays
      2. 7.2. Referring to an Array Element
      3. 7.3. Assigning Array Elements
      4. 7.4. Basic Array Example
      5. 7.5. Scanning All Elements of an Array
      6. 7.6. The delete Statement
      7. 7.7. Using Numbers to Subscript Arrays
      8. 7.8. Using Uninitialized Variables as Subscripts
      9. 7.9. Multidimensional Arrays
      10. 7.10. Scanning Multidimensional Arrays
      11. 7.11. Sorting Array Values and Indices with gawk
    8. 8. Functions
      1. 8.1. Built-in Functions
        1. 8.1.1. Calling Built-in Functions
        2. 8.1.2. Numeric Functions
        3. 8.1.3. String-Manipulation Functions
          1. 8.1.3.1. More about \ and & with sub, gsub, and gensub
        4. 8.1.4. Input/Output Functions
        5. 8.1.5. Using gawk’s Timestamp Functions
        6. 8.1.6. Bit-Manipulation Functions of gawk
        7. 8.1.7. Using gawk’s String-Translation Functions
      2. 8.2. User-Defined Functions
        1. 8.2.1. Function Definition Syntax
        2. 8.2.2. Function Definition Examples
        3. 8.2.3. Calling User-Defined Functions
        4. 8.2.4. The return Statement
        5. 8.2.5. Functions and Their Effects on Variable Typing
    9. 9. Internationalization with gawk
      1. 9.1. Internationalization and Localization
      2. 9.2. GNU gettext
      3. 9.3. Internationalizing awk Programs
      4. 9.4. Translating awk Programs
        1. 9.4.1. Extracting Marked Strings
        2. 9.4.2. Rearranging printf Arguments
        3. 9.4.3. awk Portability Issues
      5. 9.5. A Simple Internationalization Example
      6. 9.6. gawk Can Speak Your Language
    10. 10. Advanced Features of gawk
      1. 10.1. Allowing Nondecimal Input Data
      2. 10.2. Two-Way Communications with Another Process
      3. 10.3. Using gawk for Network Programming
      4. 10.4. Using gawk with BSD Portals
      5. 10.5. Profiling Your awk Programs
    11. 11. Running awk and gawk
      1. 11.1. Invoking awk
      2. 11.2. Command-Line Options
      3. 11.3. Other Command-Line Arguments
      4. 11.4. The AWKPATH Environment Variable
      5. 11.5. Obsolete Options and/or Features
      6. 11.6. Known Bugs in gawk
  7. II. Using awk and gawk
    1. 12. A Library of awk Functions
      1. 12.1. Naming Library Function Global Variables
      2. 12.2. General Programming
        1. 12.2.1. Implementing nextfile as a Function
        2. 12.2.2. Assertions
        3. 12.2.3. Rounding Numbers
        4. 12.2.4. The Cliff Random Number Generator
        5. 12.2.5. Translating Between Characters and Numbers
        6. 12.2.6. Merging an Array into a String
        7. 12.2.7. Managing the Time of Day
      3. 12.3. Datafile Management
        1. 12.3.1. Noting Datafile Boundaries
        2. 12.3.2. Rereading the Current File
        3. 12.3.3. Checking for Readable Datafiles
        4. 12.3.4. Treating Assignments as Filenames
      4. 12.4. Processing Command-Line Options
      5. 12.5. Reading the User Database
      6. 12.6. Reading the Group Database
    2. 13. Practical awk Programs
      1. 13.1. Running the Example Programs
      2. 13.2. Reinventing Wheels for Fun and Profit
        1. 13.2.1. Cutting out Fields and Columns
        2. 13.2.2. Searching for Regular Expressions in Files
        3. 13.2.3. Printing out User Information
        4. 13.2.4. Splitting a Large File into Pieces
        5. 13.2.5. Duplicating Output into Multiple Files
        6. 13.2.6. Printing Nonduplicated Lines of Text
        7. 13.2.7. Counting Things
      3. 13.3. A Grab Bag of awk Programs
        1. 13.3.1. Finding Duplicated Words in a Document
        2. 13.3.2. An Alarm Clock Program
        3. 13.3.3. Transliterating Characters
        4. 13.3.4. Printing Mailing Labels
        5. 13.3.5. Generating Word-Usage Counts
        6. 13.3.6. Removing Duplicates from Unsorted Text
        7. 13.3.7. Extracting Programs from Texinfo Source Files
        8. 13.3.8. A Simple Stream Editor
        9. 13.3.9. An Easy Way to Use Library Functions
    3. 14. Internetworking with gawk
      1. 14.1. Networking with gawk
        1. 14.1.1. gawk’s Networking Mechanisms
          1. 14.1.1.1. The fields of the special filename
          2. 14.1.1.2. Comparing protocols
          3. 14.1.1.3. /inet/tcp
          4. 14.1.1.4. /inet/udp
          5. 14.1.1.5. /inet/raw
        2. 14.1.2. Establishing a TCP Connection
        3. 14.1.3. Troubleshooting Connection Problems
        4. 14.1.4. Interacting with a Network Service
        5. 14.1.5. Setting up a Service
        6. 14.1.6. Reading Email
        7. 14.1.7. Reading a Web Page
        8. 14.1.8. A Primitive Web Service
        9. 14.1.9. A Web Service with Interaction
          1. 14.1.9.1. A Simple CGI Library
        10. 14.1.10. A Simple Web Server
        11. 14.1.11. Network Programming Caveats
      2. 14.2. Some Applications and Techniques
        1. 14.2.1. PANIC: An Emergency Web Server
        2. 14.2.2. GETURL: Retrieving Web Pages
        3. 14.2.3. REMCONF: Remote Configuration of Embedded Systems
        4. 14.2.4. URLCHK: Look for Changed Web Pages
        5. 14.2.5. WEBGRAB: Extract Links from a Page
        6. 14.2.6. STATIST: Graphing a Statistical Distribution
        7. 14.2.7. MOBAGWHO: A Simple Mobile Agent
      3. 14.3. Related Links
  8. III. Appendixes
    1. A. The Evolution of the awk Language
      1. A.1. Major Changes Between V7 and SVR3.1
      2. A.2. Changes Between SVR3.1 and SVR4
      3. A.3. Changes Between SVR4 and POSIX awk
      4. A.4. Extensions in the Bell Laboratories awk
      5. A.5. Extensions in gawk Not in POSIX awk
      6. A.6. Major Contributors to gawk
    2. B. Installing gawk
      1. B.1. The gawk Distribution
        1. B.1.1. Getting the gawk Distribution
        2. B.1.2. Extracting the Distribution
        3. B.1.3. Contents of the gawk Distribution
      2. B.2. Compiling and Installing gawk on Unix
        1. B.2.1. Compiling gawk for Unix
        2. B.2.2. Additional Configuration Options
        3. B.2.3. The Configuration Process
      3. B.3. Installation on PC Operating Systems
        1. B.3.1. Installing a Prepared Distribution for PC Systems
        2. B.3.2. Compiling gawk for PC Operating Systems
        3. B.3.3. Using gawk on PC Operating Systems
      4. B.4. Reporting Problems and Bugs
      5. B.5. Other Freely Available awk Implementations
    3. C. Implementation Notes
      1. C.1. Downward Compatibility and Debugging
      2. C.2. Making Additions to gawk
        1. C.2.1. Adding New Features
        2. C.2.2. Porting gawk to a New Operating System
      3. C.3. Adding New Built-in Functions to gawk
        1. C.3.1. A Minimal Introduction to gawk Internals
        2. C.3.2. Directory and File Operation Built-ins
          1. C.3.2.1. Using chdir and stat
          2. C.3.2.2. C code for chdir and stat
          3. C.3.2.3. Integrating the extensions
      4. C.4. Probable Future Extensions
    4. D. Basic Programming Concepts
      1. D.1. What a Program Does
      2. D.2. Data Values in a Computer
      3. D.3. Floating-Point Number Caveats
    5. E. GNU General Public License
      1. E.1. Preamble
      2. E.2. Terms and Conditions for Copying, Distribution, and Modification
      3. E.3. NO WARRANTY
      4. E.4. END OF TERMS AND CONDITIONS
        1. E.4.1. How to Apply These Terms to Your New Programs
    6. F. GNU Free Documentation License
      1. F.1. ADDENDUM: How to Use This License for Your Documents
  9. Glossary
  10. Index
  11. About the Author
  12. Colophon
  13. Special Upgrade Offer
  14. Copyright