Understanding tables and complex layouts

In order to work successfully with PDF documents, we need to process some parts of the page geometry. For some kinds of running text, we don't need to worry about where the text appears on the page. But for tabular layouts, we're forced to understand the gridded nature of the display. We're also forced to grapple with the amazing subtlety of how the human eye can take a jumble of letters on a page and resolves them into meaningful rows and columns.

It doesn't matter now, but as we move forward it will become necessary to understand two pieces of PDF trivia. First, coordinates are in points, which are about 1/72 of an inch. Second, the origin, (0,0), is the lower-left corner of the page. As we read down the ...

Get Python for Secret Agents - Volume II now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.