You are previewing Beautiful Code.

Beautiful Code

Cover of Beautiful Code by Greg Wilson... Published by O'Reilly Media, Inc.
  1. Beautiful Code
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. A Note Regarding Supplemental Files
    3. Foreword
    4. Preface
      1. How This Book Is Organized
      2. Conventions Used in This Book
      3. Using Code Examples
      4. How to Contact Us
      5. Safari® Enabled
    5. 1. A Regular Expression Matcher
      1. 1.1. The Practice of Programming
      2. 1.2. Implementation
      3. 1.3. Discussion
      4. 1.4. Alternatives
      5. 1.5. Building on It
      6. 1.6. Conclusion
    6. 2. Subversion's Delta Editor: Interface As Ontology
      1. 2.1. Version Control and Tree Transformation
      2. 2.2. Expressing Tree Differences
      3. 2.3. The Delta Editor Interface
      4. 2.4. But Is It Art?
      5. 2.5. Abstraction As a Spectator Sport
      6. 2.6. Conclusions
    7. 3. The Most Beautiful Code I Never Wrote
      1. 3.1. The Most Beautiful Code I Ever Wrote
      2. 3.2. More and More with Less and Less
      3. 3.3. Perspective
      4. 3.4. What Is Writing?
      5. 3.5. Conclusion
      6. 3.6. Acknowledgments
    8. 4. Finding Things
      1. 4.1. On Time
      2. 4.2. Problem: Weblog Data
      3. 4.3. Problem: Who Fetched What, When?
      4. 4.4. Search in the Large
      5. 4.5. Conclusion
    9. 5. Correct, Beautiful, Fast (in That Order): Lessons from Designing XML Verifiers
      1. 5.1. The Role of XML Validation
      2. 5.2. The Problem
      3. 5.3. Version 1: The Naïve Implementation
      4. 5.4. Version 2: Imitating the BNF Grammar O(N)
      5. 5.5. Version 3: First Optimization O(log N)
      6. 5.6. Version 4: Second Optimization: Don't Check Twice
      7. 5.7. Version 5: Third Optimization O(1)
      8. 5.8. Version 6: Fourth Optimization: Caching
      9. 5.9. The Moral of the Story
    10. 6. Framework for Integrated Test: Beauty Through Fragility
      1. 6.1. An Acceptance Testing Framework in Three Classes
      2. 6.2. The Challenge of Framework Design
      3. 6.3. An Open Framework
      4. 6.4. How Simple Can an HTML Parser Be?
      5. 6.5. Conclusion
    11. 7. Beautiful Tests
      1. 7.1. That Pesky Binary Search
      2. 7.2. Introducing JUnit
      3. 7.3. Nailing Binary Search
      4. 7.4. Conclusion
    12. 8. On-the-Fly Code Generation for Image Processing
    13. 9. Top Down Operator Precedence
      1. 9.1. JavaScript
      2. 9.2. Symbol Table
      3. 9.3. Tokens
      4. 9.4. Precedence
      5. 9.5. Expressions
      6. 9.6. Infix Operators
      7. 9.7. Prefix Operators
      8. 9.8. Assignment Operators
      9. 9.9. Constants
      10. 9.10. Scope
      11. 9.11. Statements
      12. 9.12. Functions
      13. 9.13. Array and Object Literals
      14. 9.14. Things to Do and Think About
    14. 10. The Quest for an Accelerated Population Count
      1. 10.1. Basic Methods
      2. 10.2. Divide and Conquer
      3. 10.3. Other Methods
      4. 10.4. Sum and Difference of Population Counts of Two Words
      5. 10.5. Comparing the Population Counts of Two Words
      6. 10.6. Counting the 1-Bits in an Array
      7. 10.7. Applications
    15. 11. Secure Communication: The Technology Of Freedom
      1. 11.1. The Heart of the Start
      2. 11.2. Untangling the Complexity of Secure Messaging
      3. 11.3. Usability Is the Key
      4. 11.4. The Foundation
      5. 11.5. The Test Suite
      6. 11.6. The Functioning Prototype
      7. 11.7. Clean Up, Plug In, Rock On…
      8. 11.8. Hacking in the Himalayas
      9. 11.9. The Invisible Hand Moves
      10. 11.10. Speed Does Matter
      11. 11.11. Communications Privacy for Individual Rights
      12. 11.12. Hacking the Civilization
    16. 12. Growing Beautiful Code in BioPerl
      1. 12.1. BioPerl and the Bio::Graphics Module
      2. 12.2. The Bio::Graphics Design Process
      3. 12.3. Extending Bio::Graphics
      4. 12.4. Conclusions and Lessons Learned
    17. 13. The Design of the Gene Sorte
      1. 13.1. The User Interface of the Gene Sorter
      2. 13.2. Maintaining a Dialog with the User over the Web
      3. 13.3. A Little Polymorphism Can Go a Long Way
      4. 13.4. Filtering Down to Just the Relevant Genes
      5. 13.5. Theory of Beautiful Code in the Large
      6. 13.6. Conclusion
    18. 14. How Elegant Code Evolves with Hardware The Case of Gaussian Elimination
      1. 14.1. The Effects of Computer Architectures on Matrix Algorithms
      2. 14.2. A Decompositional Approach
      3. 14.3. A Simple Version
      4. 14.4. LINPACK's DGEFA Subroutine
      5. 14.5. LAPACK DGETRF
      6. 14.6. Recursive LU
      7. 14.7. ScaLAPACK PDGETRF
      8. 14.8. Multithreading for Multi-Core Systems
      9. 14.9. A Word About the Error Analysis and Operation Count
      10. 14.10. Future Directions for Research
      11. 14.11. Further Reading
    19. 15. The Long-Term Benefits of Beautiful Design
      1. 15.1. My Idea of Beautiful Code
      2. 15.2. Introducing the CERN Library
      3. 15.3. Outer Beauty
      4. 15.4. Inner Beauty
      5. 15.5. Conclusion
    20. 16. The Linux Kernel Driver Model: The Benefits of Working Together
      1. 16.1. Humble Beginnings
      2. 16.2. Reduced to Even Smaller Bits
      3. 16.3. Scaling Up to Thousands of Devices
      4. 16.4. Small Objects Loosely Joined
    21. 17. Another Level of Indirection
      1. 17.1. From Code to Pointers
      2. 17.2. From Function Arguments to Argument Pointers
      3. 17.3. From Filesystems to Filesystem Layers
      4. 17.4. From Code to a Domain-Specific Language
      5. 17.5. Multiplexing and Demultiplexing
      6. 17.6. Layers Forever?
    22. 18. Python's Dictionary Implementation: Being All Things to All People
      1. 18.1. Inside the Dictionary
      2. 18.2. Special Accommodations
      3. 18.3. Collisions
      4. 18.4. Resizing
      5. 18.5. Iterations and Dynamic Changes
      6. 18.6. Conclusion
      7. 18.7. Acknowledgments
    23. 19. Multidimensional Iterators in NumPy
      1. 19.1. Key Challenges in N-Dimensional Array Operations
      2. 19.2. Memory Models for an N-Dimensional Array
      3. 19.3. NumPy Iterator Origins
      4. 19.4. Iterator Design
      5. 19.5. Iterator Interface
      6. 19.6. Iterator Use
      7. 19.7. Conclusion
    24. 20. A Highly Reliable Enterprise System for NASA's Mars Rover Mission
      1. 20.1. The Mission and the Collaborative Information Portal
      2. 20.2. Mission Needs
      3. 20.3. System Architecture
      4. 20.4. Case Study: The Streamer Service
      5. 20.5. Reliability
      6. 20.6. Robustness
      7. 20.7. Conclusion
    25. 21. ERP5: Designing for Maximum Adaptability
      1. 21.1. General Goals of ERP
      2. 21.2. ERP5
      3. 21.3. The Underlying Zope Platform
      4. 21.4. ERP5 Project Concepts
      5. 21.5. Coding the ERP5 Project
      6. 21.6. Conclusion
    26. 22. A Spoonful of Sewage
    27. 23. Distributed Programming with MapReduce
      1. 23.1. A Motivating Example
      2. 23.2. The MapReduce Programming Model
      3. 23.3. Other MapReduce Examples
      4. 23.4. A Distributed MapReduce Implementation
      5. 23.5. Extensions to the Model
      6. 23.6. Conclusion
      7. 23.7. Further Reading
      8. 23.8. Acknowledgments
      9. 23.9. Appendix: Word Count Solution
    28. 24. Beautiful Concurrency
      1. 24.1. A Simple Example: Bank Accounts
      2. 24.2. Software Transactional Memory
      3. 24.3. The Santa Claus Problem
      4. 24.4. Reflections on Haskell
      5. 24.5. Conclusion
      6. 24.6. Acknowledgments
    29. 25. Syntactic Abstraction: The syntax-case Expander
      1. 25.1. Brief Introduction to syntax-case
      2. 25.2. Expansion Algorithm
      3. 25.3. Example
      4. 25.4. Conclusion
    30. 26. Labor-Saving Architecture: An Object-Oriented Framework for Networked Software
      1. 26.1. Sample Application: Logging Service
      2. 26.2. Object-Oriented Design of the Logging Server Framework
      3. 26.3. Implementing Sequential Logging Servers
      4. 26.4. Implementing Concurrent Logging Servers
      5. 26.5. Conclusion
    31. 27. Integrating Business Partners the RESTful Way
      1. 27.1. Project Background
      2. 27.2. Exposing Services to External Clients
      3. 27.3. Routing the Service Using the Factory Pattern
      4. 27.4. Exchanging Data Using E-Business Protocols
      5. 27.5. Conclusion
    32. 28. Beautiful Debugging
      1. 28.1. Debugging a Debugger
      2. 28.2. A Systematic Process
      3. 28.3. A Search Problem
      4. 28.4. Finding the Failure Cause Automatically
      5. 28.5. Delta Debugging
      6. 28.6. Minimizing Input
      7. 28.7. Hunting the Defect
      8. 28.8. A Prototype Problem
      9. 28.9. Conclusion
      10. 28.10. Acknowledgments
      11. 28.11. Further Reading
    33. 29. Treating Code As an Essay
    34. 30. When a Button Is All That Connects You to the World
      1. 30.1. Basic Design Model
      2. 30.2. Input Interface
      3. 30.3. Efficiency of the User Interface
      4. 30.4. Download
      5. 30.5. Future Directions
    35. 31. Emacspeak: The Complete Audio Desktop
      1. 31.1. Producing Spoken Output
      2. 31.2. Speech-Enabling Emacs
      3. 31.3. Painless Access to Online Information
      4. 31.4. Summary
      5. 31.5. Acknowledgments
    36. 32. Code in Motion
      1. 32.1. On Being "Bookish"
      2. 32.2. Alike Looking Alike
      3. 32.3. The Perils of Indentation
      4. 32.4. Navigating Code
      5. 32.5. The Tools We Use
      6. 32.6. DiffMerge's Checkered Past
      7. 32.7. Conclusion
      8. 32.8. Acknowledgments
      9. 32.9. Further Reading
    37. 33. Writing Programs for "The Book"
      1. 33.1. The Nonroyal Road
      2. 33.2. Warning to Parenthophobes
      3. 33.3. Three in a Row
      4. 33.4. The Slippery Slope
      5. 33.5. The Triangle Inequality
      6. 33.6. Meandering On
      7. 33.7. "Duh!"—I Mean "Aha!"
      8. 33.8. Conclusion
      9. 33.9. Further Reading
    38. A. Afterword
    39. B. Contributors
    40. Colophon
    41. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Chapter 1. A Regular Expression Matcher

Brian Kernighan

Regular expressions are notations for describing patterns of text and, in effect, make up a special-purpose language for pattern matching. Although there are myriad variants, all share the idea that most characters in a pattern match literal occurrences of themselves, but some metacharacters have special meaning, such as * to indicate some kind of repetition or […] to mean any one character from the set within the brackets.

In practice, most searches in programs such as text editors are for literal words, so the regular expressions are often literal strings like print, which will match printf or sprint or printer paper anywhere. In so-called wildcards used to specify filenames in Unix and Windows, a * matches any number of characters, so the pattern *.c matches all filenames that end in .c. There are many, many variants of regular expressions, even in contexts where one would expect them to be the same. Jeffrey Friedl's Mastering Regular Expressions (O'Reilly) is an exhaustive study of the topic.

Stephen Kleene invented regular expressions in the mid-1950s as a notation for finite automata; in fact, they are equivalent to finite automata in what they represent. They first appeared in a program setting in Ken Thompson's version of the QED text editor in the mid-1960s. In 1967, Thompson applied for a patent on a mechanism for rapid text matching based on regular expressions. The patent was granted in 1971, one of the very first software patents [U.S. Patent 3,568,156, Text Matching Algorithm, March 2, 1971].

Regular expressions moved from QED to the Unix editor ed, and then to the quintessential Unix tool grep, which Thompson created by performing radical surgery on ed. These widely used programs helped regular expressions become familiar throughout the early Unix community.

Thompson's original matcher was very fast because it combined two independent ideas. One was to generate machine instructions on the fly during matching so that it ran at machine speed rather than by interpretation. The other was to carry forward all possible matches at each stage, so it did not have to backtrack to look for alternative potential matches. In later text editors that Thompson wrote, such as ed, the matching code used a simpler algorithm that backtracked when necessary. In theory, this is slower, but the patterns found in practice rarely involved backtracking, so the ed and grep algorithm and code were good enough for most purposes.

Subsequent regular expression matchers like egrep and fgrep added richer classes of regular expressions, and focused on fast execution no matter what the pattern. Ever-fancier regular expressions became popular and were included not only in C-based libraries, but also as part of the syntax of scripting languages such as Awk and Perl.

The Practice of Programming

In 1998, Rob Pike and I were writing The Practice of Programming (Addison-Wesley). The last chapter of the book, "Notation," collected a number of examples where good notation led to better programs and better programming. This included the use of simple data specifications (printf, for instance), and the generation of code from tables.

Because of our Unix backgrounds and nearly 30 years of experience with tools based on regular expression notation, we naturally wanted to include a discussion of regular expressions, and it seemed mandatory to include an implementation as well. Given our emphasis on tools, it also seemed best to focus on the class of regular expressions found in grep—rather than, say, those from shell wildcards—since we could also then talk about the design of grep itself.

The problem was that any existing regular expression package was far too big. The local grep was over 500 lines long (about 10 book pages) and encrusted with barnacles. Open source regular expression packages tended to be huge—roughly the size of the entire book—because they were engineered for generality, flexibility, and speed; none were remotely suitable for pedagogy.

I suggested to Rob that we find the smallest regular expression package that would illustrate the basic ideas while still recognizing a useful and nontrivial class of patterns. Ideally, the code would fit on a single page.

Rob disappeared into his office. As I remember it now, he emerged in no more than an hour or two with the 30 lines of C code that subsequently appeared in Chapter 9 of The Practice of Programming. That code implements a regular expression matcher that handles the following constructs.

Character

Meaning

c

Matches any literal character c.

. (period)

Matches any single character.

^

Matches the beginning of the input string.

$

Matches the end of the input string.

*

Matches zero or more occurrences of the previous character.

This is quite a useful class; in my own experience of using regular expressions on a day-to-day basis, it easily accounts for 95 percent of all instances. In many situations, solving the right problem is a big step toward creating a beautiful program. Rob deserves great credit for choosing a very small yet important, well-defined, and extensible set of features from among a wide set of options.

Rob's implementation itself is a superb example of beautiful code: compact, elegant, efficient, and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers. Although at the time we were most interested in conveying the important role of good notation in making a program easier to use (and perhaps easier to write as well), the regular expression code has also been an excellent way to illustrate algorithms, data structures, testing, performance enhancement, and other important topics.

The best content for your career. Discover unlimited learning on demand for around $1/day.