CHAPTER 10
Data Development: Making It Organized

Data is the lifeblood of any analytical exercise and usually one of its bigger challenges. Sourcing, organizing, and stitching together data typically consumes a large share of the time spent building an analytical solution. As the datasets are pulled together, the quality of the data is key: if the data is missing, incorrect, or inconsistent, the results of the analysis will be partial or, worse, incorrect. Once the data is compiled, choosing the right analytical structure is important for performance, integrity, and scalability. Business rules and field transformations must also be applied to make the various datasets suitable for analysis. In this chapter we cover several basic data concepts important to analytics: quality, type, organization, and transformation.

Data Quality

The quality of the data may be the most important factor in determining whether a dataset can produce usable insights. The source of the data is often a vital indicator of its quality. Is the data from an enterprise resource planning (ERP) system or from a legacy system prone to human error? Does the data entry system allow free-form text, or does the user mostly select from lists? Is the data structured or unstructured? All of these questions play a key role in determining the quality of the dataset and its reliability.
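To make the free-form versus list question concrete, here is a minimal sketch in Python of how one might profile a single field for missing and unrecognized values. The records, the field name `region`, the allowed list, and the helper `profile_field` are all hypothetical and invented for illustration; they are not taken from any particular system.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"customer_id": "C001", "region": "Northeast"},
    {"customer_id": "C002", "region": "north east"},
    {"customer_id": "C003", "region": ""},
    {"customer_id": "C004", "region": "NORTHEAST "},
    {"customer_id": "C005", "region": "West"},
    {"customer_id": "C006", "region": None},
]

# Assumed set of allowed values, as a pick-list entry screen would enforce.
ALLOWED_REGIONS = {"Northeast", "Southeast", "Midwest", "West"}


def profile_field(rows, field, allowed):
    """Count missing, valid, and unrecognized values for one field."""
    counts = Counter()
    for row in rows:
        value = row.get(field)
        if value is None or not value.strip():
            # Absent, empty, or whitespace-only entries count as missing.
            counts["missing"] += 1
        elif value in allowed:
            counts["valid"] += 1
        else:
            # Free-form variants ("north east", "NORTHEAST ") land here.
            counts["unrecognized"] += 1
    return dict(counts)


summary = profile_field(records, "region", ALLOWED_REGIONS)
print(summary)
```

In this sketch, a field populated through free-form entry shows several spellings of the same region, which a simple comparison against the allowed list flags as unrecognized. A field populated from a pick list would be expected to show few or no such variants.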

The core tenets of data quality include: ...
