Chapter 21. Processing data files

This chapter covers

  • Using ETL (extract-transform-load)
  • Reading text data files (plain text and CSV)
  • Reading spreadsheet files
  • Normalizing, cleaning, and sorting data
  • Writing data files

Much of the data available is contained in text files. This data can range from unstructured text, such as a corpus of tweets or literary texts, to more structured data in which each row is a record and the fields are delimited by a special character, such as a comma, a tab, or a pipe (|). Text files can be huge; a data set can be spread over tens or even hundreds of files, and the data in it can be incomplete or horribly dirty. With all the variations, it’s almost inevitable that you’ll need to read and use data from text ...
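As a minimal sketch of the structured case described above, where each row is a record and the fields are separated by a special character, the following uses Python's standard csv module to read pipe-delimited text. The sample data, the field names, and the read_delimited helper are hypothetical and exist only for illustration; the chapter's own approach may differ.

```python
import csv
import io

# Hypothetical sample data: pipe-delimited records with a header row.
# The second record is incomplete (its last field is empty).
sample = io.StringIO(
    "name|city|temp\n"
    "Ann|Chicago|71\n"
    "Bob|St. Louis|\n"
)


def read_delimited(stream, delimiter=","):
    """Return a list of dicts, one per record, keyed by the header row."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    return list(reader)


records = read_delimited(sample, delimiter="|")
for record in records:
    print(record)
```

Passing a different delimiter (for example "\t" for tab-separated files) should handle the other formats mentioned. Note that the missing value arrives as an empty string, so incomplete data still has to be detected and cleaned after it is read.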
