Chapter 4
Stage 1: Data Extraction
Summary
This chapter describes the Data Extraction stage of the Guerrilla Analytics workflow. It discusses the pitfalls and risks associated with extracting data from systems, and then makes a set of recommendations that apply Guerrilla Analytics principles to reduce these risks, avoid these pitfalls, and maintain data provenance.
Keywords
Data Extraction
File Formats
Checksums
4.1. Guerrilla Analytics workflow
Data Extraction is the first stage in the Guerrilla Analytics workflow (Section 2.1), as illustrated in Figure 9. It involves taking data out of some system or location so it can be brought into the analytics team’s Data Manipulation Environment (DME). The place the data is extracted from is called ...