You are previewing Beautiful Code.

Beautiful Code

Cover of Beautiful Code by Greg Wilson... Published by O'Reilly Media, Inc.
  1. Beautiful Code
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. A Note Regarding Supplemental Files
    3. Foreword
    4. Preface
      1. How This Book Is Organized
      2. Conventions Used in This Book
      3. Using Code Examples
      4. How to Contact Us
      5. Safari® Enabled
    5. 1. A Regular Expression Matcher
      1. 1.1. The Practice of Programming
      2. 1.2. Implementation
      3. 1.3. Discussion
      4. 1.4. Alternatives
      5. 1.5. Building on It
      6. 1.6. Conclusion
    6. 2. Subversion's Delta Editor: Interface As Ontology
      1. 2.1. Version Control and Tree Transformation
      2. 2.2. Expressing Tree Differences
      3. 2.3. The Delta Editor Interface
      4. 2.4. But Is It Art?
      5. 2.5. Abstraction As a Spectator Sport
      6. 2.6. Conclusions
    7. 3. The Most Beautiful Code I Never Wrote
      1. 3.1. The Most Beautiful Code I Ever Wrote
      2. 3.2. More and More with Less and Less
      3. 3.3. Perspective
      4. 3.4. What Is Writing?
      5. 3.5. Conclusion
      6. 3.6. Acknowledgments
    8. 4. Finding Things
      1. 4.1. On Time
      2. 4.2. Problem: Weblog Data
      3. 4.3. Problem: Who Fetched What, When?
      4. 4.4. Search in the Large
      5. 4.5. Conclusion
    9. 5. Correct, Beautiful, Fast (in That Order): Lessons from Designing XML Verifiers
      1. 5.1. The Role of XML Validation
      2. 5.2. The Problem
      3. 5.3. Version 1: The Naïve Implementation
      4. 5.4. Version 2: Imitating the BNF Grammar O(N)
      5. 5.5. Version 3: First Optimization O(log N)
      6. 5.6. Version 4: Second Optimization: Don't Check Twice
      7. 5.7. Version 5: Third Optimization O(1)
      8. 5.8. Version 6: Fourth Optimization: Caching
      9. 5.9. The Moral of the Story
    10. 6. Framework for Integrated Test: Beauty Through Fragility
      1. 6.1. An Acceptance Testing Framework in Three Classes
      2. 6.2. The Challenge of Framework Design
      3. 6.3. An Open Framework
      4. 6.4. How Simple Can an HTML Parser Be?
      5. 6.5. Conclusion
    11. 7. Beautiful Tests
      1. 7.1. That Pesky Binary Search
      2. 7.2. Introducing JUnit
      3. 7.3. Nailing Binary Search
      4. 7.4. Conclusion
    12. 8. On-the-Fly Code Generation for Image Processing
    13. 9. Top Down Operator Precedence
      1. 9.1. JavaScript
      2. 9.2. Symbol Table
      3. 9.3. Tokens
      4. 9.4. Precedence
      5. 9.5. Expressions
      6. 9.6. Infix Operators
      7. 9.7. Prefix Operators
      8. 9.8. Assignment Operators
      9. 9.9. Constants
      10. 9.10. Scope
      11. 9.11. Statements
      12. 9.12. Functions
      13. 9.13. Array and Object Literals
      14. 9.14. Things to Do and Think About
    14. 10. The Quest for an Accelerated Population Count
      1. 10.1. Basic Methods
      2. 10.2. Divide and Conquer
      3. 10.3. Other Methods
      4. 10.4. Sum and Difference of Population Counts of Two Words
      5. 10.5. Comparing the Population Counts of Two Words
      6. 10.6. Counting the 1-Bits in an Array
      7. 10.7. Applications
    15. 11. Secure Communication: The Technology Of Freedom
      1. 11.1. The Heart of the Start
      2. 11.2. Untangling the Complexity of Secure Messaging
      3. 11.3. Usability Is the Key
      4. 11.4. The Foundation
      5. 11.5. The Test Suite
      6. 11.6. The Functioning Prototype
      7. 11.7. Clean Up, Plug In, Rock On…
      8. 11.8. Hacking in the Himalayas
      9. 11.9. The Invisible Hand Moves
      10. 11.10. Speed Does Matter
      11. 11.11. Communications Privacy for Individual Rights
      12. 11.12. Hacking the Civilization
    16. 12. Growing Beautiful Code in BioPerl
      1. 12.1. BioPerl and the Bio::Graphics Module
      2. 12.2. The Bio::Graphics Design Process
      3. 12.3. Extending Bio::Graphics
      4. 12.4. Conclusions and Lessons Learned
    17. 13. The Design of the Gene Sorte
      1. 13.1. The User Interface of the Gene Sorter
      2. 13.2. Maintaining a Dialog with the User over the Web
      3. 13.3. A Little Polymorphism Can Go a Long Way
      4. 13.4. Filtering Down to Just the Relevant Genes
      5. 13.5. Theory of Beautiful Code in the Large
      6. 13.6. Conclusion
    18. 14. How Elegant Code Evolves with Hardware The Case of Gaussian Elimination
      1. 14.1. The Effects of Computer Architectures on Matrix Algorithms
      2. 14.2. A Decompositional Approach
      3. 14.3. A Simple Version
      4. 14.4. LINPACK's DGEFA Subroutine
      5. 14.5. LAPACK DGETRF
      6. 14.6. Recursive LU
      7. 14.7. ScaLAPACK PDGETRF
      8. 14.8. Multithreading for Multi-Core Systems
      9. 14.9. A Word About the Error Analysis and Operation Count
      10. 14.10. Future Directions for Research
      11. 14.11. Further Reading
    19. 15. The Long-Term Benefits of Beautiful Design
      1. 15.1. My Idea of Beautiful Code
      2. 15.2. Introducing the CERN Library
      3. 15.3. Outer Beauty
      4. 15.4. Inner Beauty
      5. 15.5. Conclusion
    20. 16. The Linux Kernel Driver Model: The Benefits of Working Together
      1. 16.1. Humble Beginnings
      2. 16.2. Reduced to Even Smaller Bits
      3. 16.3. Scaling Up to Thousands of Devices
      4. 16.4. Small Objects Loosely Joined
    21. 17. Another Level of Indirection
      1. 17.1. From Code to Pointers
      2. 17.2. From Function Arguments to Argument Pointers
      3. 17.3. From Filesystems to Filesystem Layers
      4. 17.4. From Code to a Domain-Specific Language
      5. 17.5. Multiplexing and Demultiplexing
      6. 17.6. Layers Forever?
    22. 18. Python's Dictionary Implementation: Being All Things to All People
      1. 18.1. Inside the Dictionary
      2. 18.2. Special Accommodations
      3. 18.3. Collisions
      4. 18.4. Resizing
      5. 18.5. Iterations and Dynamic Changes
      6. 18.6. Conclusion
      7. 18.7. Acknowledgments
    23. 19. Multidimensional Iterators in NumPy
      1. 19.1. Key Challenges in N-Dimensional Array Operations
      2. 19.2. Memory Models for an N-Dimensional Array
      3. 19.3. NumPy Iterator Origins
      4. 19.4. Iterator Design
      5. 19.5. Iterator Interface
      6. 19.6. Iterator Use
      7. 19.7. Conclusion
    24. 20. A Highly Reliable Enterprise System for NASA's Mars Rover Mission
      1. 20.1. The Mission and the Collaborative Information Portal
      2. 20.2. Mission Needs
      3. 20.3. System Architecture
      4. 20.4. Case Study: The Streamer Service
      5. 20.5. Reliability
      6. 20.6. Robustness
      7. 20.7. Conclusion
    25. 21. ERP5: Designing for Maximum Adaptability
      1. 21.1. General Goals of ERP
      2. 21.2. ERP5
      3. 21.3. The Underlying Zope Platform
      4. 21.4. ERP5 Project Concepts
      5. 21.5. Coding the ERP5 Project
      6. 21.6. Conclusion
    26. 22. A Spoonful of Sewage
    27. 23. Distributed Programming with MapReduce
      1. 23.1. A Motivating Example
      2. 23.2. The MapReduce Programming Model
      3. 23.3. Other MapReduce Examples
      4. 23.4. A Distributed MapReduce Implementation
      5. 23.5. Extensions to the Model
      6. 23.6. Conclusion
      7. 23.7. Further Reading
      8. 23.8. Acknowledgments
      9. 23.9. Appendix: Word Count Solution
    28. 24. Beautiful Concurrency
      1. 24.1. A Simple Example: Bank Accounts
      2. 24.2. Software Transactional Memory
      3. 24.3. The Santa Claus Problem
      4. 24.4. Reflections on Haskell
      5. 24.5. Conclusion
      6. 24.6. Acknowledgments
    29. 25. Syntactic Abstraction: The syntax-case Expander
      1. 25.1. Brief Introduction to syntax-case
      2. 25.2. Expansion Algorithm
      3. 25.3. Example
      4. 25.4. Conclusion
    30. 26. Labor-Saving Architecture: An Object-Oriented Framework for Networked Software
      1. 26.1. Sample Application: Logging Service
      2. 26.2. Object-Oriented Design of the Logging Server Framework
      3. 26.3. Implementing Sequential Logging Servers
      4. 26.4. Implementing Concurrent Logging Servers
      5. 26.5. Conclusion
    31. 27. Integrating Business Partners the RESTful Way
      1. 27.1. Project Background
      2. 27.2. Exposing Services to External Clients
      3. 27.3. Routing the Service Using the Factory Pattern
      4. 27.4. Exchanging Data Using E-Business Protocols
      5. 27.5. Conclusion
    32. 28. Beautiful Debugging
      1. 28.1. Debugging a Debugger
      2. 28.2. A Systematic Process
      3. 28.3. A Search Problem
      4. 28.4. Finding the Failure Cause Automatically
      5. 28.5. Delta Debugging
      6. 28.6. Minimizing Input
      7. 28.7. Hunting the Defect
      8. 28.8. A Prototype Problem
      9. 28.9. Conclusion
      10. 28.10. Acknowledgments
      11. 28.11. Further Reading
    33. 29. Treating Code As an Essay
    34. 30. When a Button Is All That Connects You to the World
      1. 30.1. Basic Design Model
      2. 30.2. Input Interface
      3. 30.3. Efficiency of the User Interface
      4. 30.4. Download
      5. 30.5. Future Directions
    35. 31. Emacspeak: The Complete Audio Desktop
      1. 31.1. Producing Spoken Output
      2. 31.2. Speech-Enabling Emacs
      3. 31.3. Painless Access to Online Information
      4. 31.4. Summary
      5. 31.5. Acknowledgments
    36. 32. Code in Motion
      1. 32.1. On Being "Bookish"
      2. 32.2. Alike Looking Alike
      3. 32.3. The Perils of Indentation
      4. 32.4. Navigating Code
      5. 32.5. The Tools We Use
      6. 32.6. DiffMerge's Checkered Past
      7. 32.7. Conclusion
      8. 32.8. Acknowledgments
      9. 32.9. Further Reading
    37. 33. Writing Programs for "The Book"
      1. 33.1. The Nonroyal Road
      2. 33.2. Warning to Parenthophobes
      3. 33.3. Three in a Row
      4. 33.4. The Slippery Slope
      5. 33.5. The Triangle Inequality
      6. 33.6. Meandering On
      7. 33.7. "Duh!"—I Mean "Aha!"
      8. 33.8. Conclusion
      9. 33.9. Further Reading
    38. A. Afterword
    39. B. Contributors
    40. Colophon
    41. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Building on It

The purpose of The Practice of Programming was to teach good programming. At the time the book was written, Rob and I were still at Bell Labs, so we did not have firsthand experience of how the book would be best used in a classroom. It has been gratifying to discover that some of the material does work well in classes. I have used this code since 2000 as a vehicle for teaching important points about programming.

First, it shows how recursion is useful and leads to clean code in a novel setting; it's not yet another version of Quicksort (or factorial!), nor is it some kind of tree walk.

It's also a good example for performance experiments. Its performance is not very different from the system versions of grep, which shows that the recursive technique is not too costly and that it's not worth trying to tune the code.

On the other hand, it is also a fine illustration of the importance of a good algorithm. If a pattern includes several .* sequences, the straightforward implementation requires a lot of backtracking, and, in some cases, will run very slowly indeed.

The standard Unix grep has the same backtracking properties. For example, the command:

	grep 'a.*a.*a.*a.a'

takes about 20 seconds to process a 4 MB text file on a typical machine.

An implementation based on converting a nondeterministic finite automaton to a deterministic automaton, as in egrep, will have much better performance on hard cases; it can process the same pattern and the same input in less than one-tenth of a second, and running time in general is independent of the pattern.

Extensions to the regular expression class can form the basis of a variety of assignments. For example:

  1. Add other metacharacters, such as + for one or more occurrences of the previous character, or ? for zero or one matches. Add some way to quote metacharacters, such as \$ to stand for a literal occurrence of $.

  2. Separate regular expression processing into a compilation phase and an execution phase. Compilation converts the regular expression into an internal form that makes the matching code simpler or allows the subsequent matching to run faster. This separation is not necessary for the simple class of regular expressions in the original design, but it makes sense in grep-like applications where the class is richer and the same regular expression is used for a large number of input lines.

  3. Add character classes such as [abc] and [0-9], which in conventional grep notation match a or b or c and a digit, respectively. This can be done in several ways, the most natural of which seems to be replacing the char* variables of the original code with a structure:

    	typedef struct RE {
    	        int     type;   /* CHAR, STAR, etc. */ 
    	        int     ch;     /* the character itself */ 
    	        char    *ccl;   /* for [...] instead */
    	        int     nccl;   /* true if class is negated [^...] */
    	} RE;

    and modifying the basic code to handle an array of these instead of an array of characters. It's not strictly necessary to separate compilation from execution for this situation, but it turns out to be a lot easier. Students who follow the advice to pre-compile into such a structure invariably do better than those who try to interpret some complicated pattern data structure on the fly.

    Writing clear and unambiguous specifications for character classes is tough, and implementing them perfectly is worse, requiring a lot of tedious and uninstructive coding. I have simplified this assignment over time, and today most often ask for Perl-like shorthands such as \d for digit and \D for nondigit instead of the original bracketed ranges.

  4. Use an opaque type to hide the RE structure and all the implementation details. This is a good way to show object-oriented programming in C, which doesn't support much beyond this. In effect, this creates a regular expression class that uses function names like RE_new() and RE_match() for the methods instead of the syntactic sugar of an object-oriented language.

  5. Modify the class of regular expressions to be like the wildcards in various shells: matches are implicitly anchored at both ends, * matches any number of characters, and ? matches any single character. One can modify the algorithm or map the input into the existing algorithm.

  6. Convert the code to Java. The original code uses C pointers very well, and it's good practice to figure out the alternatives in a different language. Java versions use either String.charAt (indexing instead of pointers) or String.substring (closer to the pointer version). Neither seems as clear as the C code, and neither is as compact. Although performance isn't really part of this exercise, it is interesting to see that the Java implementation runs roughly six or seven times slower than the C versions.

  7. Write a wrapper class that converts from this class's regular expressions to Java's Pattern and Matcher classes, which separate the compilation and matching in a quite different way. This is a good example of the Adapter or Facade pattern, which puts a different face on an existing class or set of functions.

I've also used this code extensively to explore testing techniques. Regular expressions are rich enough that testing is far from trivial, but small enough that one can quickly write down a substantial collection of tests to be performed mechanically. For extensions like those just listed, I ask students to write a large number of tests in a compact language (yet another example of "notation") and use those tests on their own code; naturally, I use their tests on other students' code as well.

The best content for your career. Discover unlimited learning on demand for around $1/day.