Chapter 3. Collecting data

This chapter covers

  • Collecting inherently uncertain data
  • Handling data collection at scale
  • Querying aggregates of uncertain data
  • Avoiding updating data after it’s been written to a database

This chapter begins our journey through the components, or phases, of a machine learning system (figure 3.1). Until there’s data in your machine learning system, you can’t do anything, so we’ll begin with collecting data. As you saw in chapter 1, the naive approach for getting data into a machine learning system can lead to all sorts of problems. This chapter will show you a much better way to collect data, one based on recording immutable facts. The approach in this chapter also assumes that the data being collected is intrinsically ...
