Text Processing with Ruby

Book description

Text is everywhere. Web pages, databases, the contents of files--for almost any programming task you perform, you need to process text. Cut even the most complex text-based tasks down to size and learn how to master regular expressions, scrape information from Web pages, develop reusable utilities to process text in pipelines, and more.

Table of contents

  1. Text Processing with Ruby
    1. For the Best Reading Experience...
    2. Table of Contents
    3. Early praise for Text Processing with Ruby
    4. Acknowledgments
    5. Introduction
      1. About This Book
      2. Online Resources
    6. Part 1: Extract: Acquiring Text
      1. Chapter 1: Reading from Files
        1. Opening a File
        2. Reading from a File
        3. Treating Files as Streams
        4. Reading Fixed-Width Files
        5. Wrapping Up
      2. Chapter 2: Processing Standard Input
        1. Redirecting Input from Other Processes
        2. Example: Extracting URLs
        3. Concurrency and Buffering
        4. Wrapping Up
      3. Chapter 3: Shell One-Liners
        1. Arguments to the Ruby Interpreter
        2. Prepending and Appending Code
        3. Example: Parsing Log Files
        4. Wrapping Up
      4. Chapter 4: Flexible Filters with ARGF
        1. Reading from ARGF as a Stream
        2. Modifying Files
        3. Manipulating ARGV
        4. Wrapping Up
      5. Chapter 5: Delimited Data
        1. Parsing a TSV
        2. Delimited Data and the Command Line
        3. The CSV Format
        4. Wrapping Up
      6. Chapter 6: Scraping HTML
        1. The Right Tool for the Job: Nokogiri
        2. Searching the Document
        3. Working with Elements
        4. Exploring a Page
        5. Example: Reading a League Table
        6. Wrapping Up
      7. Chapter 7: Encodings
        1. A Brief Introduction to Character Encodings
        2. Ruby’s Support for Character Encodings
        3. Detecting Encodings
        4. Wrapping Up
    7. Part 2: Transform: Modifying and Manipulating Text
      1. Chapter 8: Regular Expressions Basics
        1. A Gentle Introduction
        2. Pattern Syntax
        3. Regular Expressions in Ruby
        4. Wrapping Up
      2. Chapter 9: Extraction and Substitution with Regular Expressions
        1. Matching Against Patterns
        2. Global Match Variables
        3. Extracting Multiple Matches
        4. Transforming Text
        5. Wrapping Up
      3. Chapter 10: Writing Parsers
        1. Simple Parsers with StringScanner
        2. Example: Parsing a Config File
        3. Rule-Based Parsers
        4. Example: Parsing RTF Files
        5. Wrapping Up
      4. Chapter 11: Natural Language Processing
        1. What Is Natural Language Processing?
        2. Example: Extracting Keywords from Articles
        3. Example: Fuzzy Searching
        4. Wrapping Up
    8. Part 3: Load: Writing Text
      1. Chapter 12: Standard Output and Standard Error
        1. Simple Output
        2. Formatting Output with printf
        3. Redirecting Standard Output
        4. Wrapping Up
      2. Chapter 13: Writing to Other Processes and to Files
        1. Writing to Other Processes
        2. Writing to Files
        3. Temporary Files
        4. Wrapping Up
      3. Chapter 14: Serialization and Structure: JSON, XML, CSV
        1. JSON
        2. XML
        3. CSV
        4. Wrapping Up
      4. Chapter 15: Templating Output with ERB
        1. Writing Templates
        2. Example: Generating a Purchase Ledger
        3. Evaluating Templates
        4. Passing Data to Templates
        5. Controlling Presentation with Decorators
        6. Wrapping Up
    9. Part 4: Appendices
      1. Appendix 1: A Shell Primer
        1. Running Commands
        2. Controlling Output
        3. Exit Statuses and Flow Control
      2. Appendix 2: Useful Shell Commands
    10. You May Be Interested In…

Product information

  • Title: Text Processing with Ruby
  • Author(s): Rob Miller
  • Release date: September 2015
  • Publisher(s): Pragmatic Bookshelf
  • ISBN: 9781680500707