Chapter 8. Regular Expressions

The limits of my language mean the limits of my world.

Ludwig Wittgenstein

The world’s first computers were women. They were the employees of research centers and national labs, where they inspected and reorganized data and executed algorithms. Their job title was “computer” because they computed. In the early days, computing meant evaluating raw data by hand for a variety of applications and experiments, including, famously, the Manhattan Project.

You, too, may have raw data. However, today’s data should not be processed by hand: it is usually too big, the risk of carpal tunnel is too high, and computers are too powerful to justify the manual effort. Processing raw textual physics data may require:

  • Searching for and correcting irregularities

  • Finding and replacing text across hundreds of files

  • Evaluating mathematical expressions

  • Manipulating number formatting

  • Rearranging column-formatted data

This chapter will discuss regular expressions, a common syntax for matching patterns of characters in text files, data files, filenames, and other sequences of characters. This syntax is ubiquitous in the programming world because it can turn an enormous, tedious file cleanup task into a tiny one-line command. Additionally, it can help with day-to-day command-line navigation, file parsing, and text editing.
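As a rough illustration of how small such a cleanup can be, the sketch below uses Python’s standard re module to handle two of the tasks listed above: correcting an irregularity in number formatting and rearranging column-formatted data. The sample lines, the pattern names, and the helper functions are hypothetical and were invented for this example; real data will likely need different patterns.

```python
import re

# Hypothetical raw data lines, invented for illustration only.
raw_lines = [
    "run01  1.5E3   0.25",
    "run02  2,0e3   0.30",   # irregularity: comma used as the decimal separator
    "run03  3.25e3  0.35",
]

# A comma sitting directly between two digits, as in "2,0e3".
DECIMAL_COMMA = re.compile(r"(\d),(\d)")

# Exactly three whitespace-separated columns on a line.
THREE_COLUMNS = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)$")


def fix_decimal_commas(line):
    """Replace a comma between two digits with a decimal point."""
    return DECIMAL_COMMA.sub(r"\1.\2", line)


def swap_last_two_columns(line):
    """Rewrite 'a b c' as 'a c b', separated by single spaces."""
    return THREE_COLUMNS.sub(r"\1 \3 \2", line)


cleaned = [swap_last_two_columns(fix_decimal_commas(line)) for line in raw_lines]

for line in cleaned:
    print(line)
```

Each helper is a single substitution: the parentheses in a pattern capture pieces of the match, and \1, \2, and \3 in the replacement string refer back to those pieces. Note that the decimal-comma pattern assumes commas never act as thousands separators or column delimiters in the data; if they do, the pattern would need to be narrowed.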

In the shell, regular expressions can be used to clean up and analyze raw data in conjunction with the search and print programs that will ...
