Who was this Jason, and why did the gods favor him so? Where did he come from, and what was his story?

Homer, Greek poet

Chapter 3. Working with Text Data

Raw data often comes from text documents of all kinds: structured documents (HTML, XML, CSV, and JSON files) or unstructured documents (plain, human-readable text). Unstructured text is perhaps the hardest data source to work with, because the processing software has to infer the meaning of the data items.
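To make the contrast concrete, here is a minimal sketch that uses only the Python standard library. The sample strings and variable names are invented for illustration and do not come from the book. It shows that a parser recovers named fields from CSV and JSON directly, while plain text yields only a list of words whose meaning still has to be inferred.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this illustration.
csv_text = "name,age\nJason,30\nMedea,28\n"
json_text = '{"name": "Jason", "crew": ["Orpheus", "Heracles"]}'
plain_text = "Jason sailed from Iolcus with his crew."

# Structured formats: the parser returns named fields and nested values.
rows = list(csv.DictReader(io.StringIO(csv_text)))
record = json.loads(json_text)

# Unstructured text: splitting on whitespace gives words only.
# Nothing here says which word is a person and which is a place.
words = plain_text.split()

print(rows[0]["name"], rows[0]["age"])
print(record["crew"])
print(words)
```

Note that the CSV reader returns every value as a string (the age is "30", not 30), so even structured text usually needs a type-conversion step before analysis.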

All data representations mentioned in the previous paragraph are human-readable. (That’s what makes them text documents.) If necessary, we can open any text file in a simple text editor (Notepad on Windows, gedit on Linux, TextEdit on Mac OS X) and read it with our bare ...
