4.4. Capturing Captions

You're working on a book with large numbers of tables, maps, figures, and other illustrations, and the publisher has decided that all illustrations that are not referenced in the text should be deleted. Finding all these references in a text is easy with a regular expression. Illustration references have the form Figure 3.16. That is, a label such as Figure or Map is followed by two numbers, the chapter number and the illustration number, separated by a dot. This regular expression finds all references:

/(figure|table|map) \d+\.\d+/gi
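To try the pattern outside InDesign, it can be run in plain JavaScript with String.prototype.match. The sketch below is only an illustration: the helper name findReferences and the sample sentence are hypothetical. It is written in the older var style, so it should also run in InDesign's ExtendScript engine.

```javascript
// Return every illustration reference found in a string.
// findReferences is a hypothetical helper name used only for this example.
function findReferences(text) {
  // g: collect all matches, not just the first; i: ignore case
  var re = /(figure|table|map) \d+\.\d+/gi;
  // With the g flag, match() returns an array of the matched strings,
  // or null when there is no match at all
  return text.match(re) || [];
}

var sample = "As Figure 3.16 shows, and as table 2.1 confirms (see also Map 4.2)...";
findReferences(sample); // ["Figure 3.16", "table 2.1", "Map 4.2"]
```

The pattern does not check where a word starts. As a result, bitmap 1.2 would produce a match on map 1.2. Putting \b in front of the group prevents that.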

This pattern matches figure, table, or map, followed by a space, one or more digits, a dot, and another run of digits. The dot must be escaped, because an unescaped dot is the wildcard that matches any character. In addition to g (global, so that all matches are found rather than just the first), the regex uses the i modifier for case-insensitive matching, so Figure, FIGURE, and figure are all found. If you suspect that there might be references to illustrations by illustration number only (e.g., Figure 16), then you should make the dot and the following number optional:

/(figure|map|table) \d+(\.\d+)?/gi
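A quick check in plain JavaScript shows what the optional group changes. The test sentence below is made up for illustration. It contains one reference without a chapter number and one with a chapter number.

```javascript
// The original pattern requires both numbers; the second makes ".number" optional.
var strict = /(figure|table|map) \d+\.\d+/gi;
var loose = /(figure|map|table) \d+(\.\d+)?/gi;

var text = "See Figure 16 and Table 3.2.";

text.match(strict); // ["Table 3.2"]: "Figure 16" has no dot, so it is skipped
text.match(loose);  // ["Figure 16", "Table 3.2"]
```

Note that the sentence-final dot after Table 3.2 is not included in either match. A dot is consumed only when a digit follows it.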

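The dot and the following number are grouped with parentheses and made optional by the question mark. Naturally, this caption checker makes sense only when the figures and their captions have not been placed in the text yet. Once they have, every caption (Figure 3.16, say) matches the pattern too and would count as a reference to itself.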
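The publisher's actual goal is to find the illustrations that are never referenced. A minimal sketch of that step, under stated assumptions, compares the references in the text against a list of illustration labels. It assumes that such a list (strings like "Figure 3.16") is available from somewhere, for example a separate illustration list. That source is not shown here, and the helper name unreferenced is hypothetical. As the paragraph above requires, the text passed in should not contain the captions themselves.

```javascript
// Return the labels that never occur as a reference in the text.
// Assumption: labels is a list like ["Figure 1.1", "Map 1.2", ...] obtained elsewhere.
function unreferenced(text, labels) {
  var re = /(figure|table|map) \d+\.\d+/gi;
  var found = text.match(re) || [];
  // Record each reference in lowercase so "map 1.2" and "Map 1.2" compare equal
  var seen = {};
  for (var i = 0; i < found.length; i++) {
    seen[found[i].toLowerCase()] = true;
  }
  // Keep every label that was not seen among the references
  var missing = [];
  for (var j = 0; j < labels.length; j++) {
    if (!seen[labels[j].toLowerCase()]) {
      missing.push(labels[j]);
    }
  }
  return missing;
}

var body = "Figure 1.1 shows the route; compare map 1.2.";
unreferenced(body, ["Figure 1.1", "Map 1.2", "Table 1.3"]); // ["Table 1.3"]
```

The labels that are returned are the candidates for deletion. A plain object serves as the lookup table so that the code stays within what older JavaScript engines support.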
