Chapter 11: Managing Data Stored in BigQuery

The previous chapters cover how BigQuery simplifies analytics over large datasets. BigQuery also has features to simplify data management and the integration of analytics into an application. This chapter covers those features and how to handle common data warehousing tasks using them.

Query Caching

As discussed in Chapter 7, “Running Queries,” BigQuery automatically caches query results so that identical queries can reuse them. This feature is convenient because it is transparent to the user, but it applies only when the service can guarantee that the results of a prior query job are identical to what rerunning the query would produce, a condition we elaborate on below. The application developer, on the other hand, knows a great deal more about the use case. When the application can trade freshness for execution cost, it can reduce query costs further by managing caching directly.

With many data warehousing systems, a separate caching framework such as Memcached is needed to reduce load on the query engine or the latency of front-end operations. With BigQuery it is usually feasible to avoid a separate caching framework for query results, because query results are themselves new tables that can be assigned an explicit name. Different parts of the application can interact with the same query result by accessing the appropriate ...
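One way to picture application-managed caching is the following minimal Python sketch. It is not taken from the book and does not call the BigQuery API. The function names (`cache_table_name`, `cached_query`), the two callbacks (`table_exists`, `run_query_into`), and the time-bucket naming scheme are all assumptions made for illustration. The sketch derives a deterministic table name from the query text and a freshness window, and runs the query only when no table with that name exists yet.

```python
import hashlib
import time
from typing import Callable, Optional


def cache_table_name(sql: str, max_age_seconds: int,
                     now: Optional[float] = None) -> str:
    """Derive a deterministic table name from the query text and a time bucket.

    Hypothetical naming scheme: all requests for the same query text that
    fall in the same freshness window map to the same table name.
    """
    now = time.time() if now is None else now
    # Integer index of the current freshness window.
    bucket = int(now // max_age_seconds)
    # Hash the exact query text; only identical text shares a cache entry.
    digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()[:16]
    # Only letters, digits, and underscores appear in the generated name.
    return f"cache_{digest}_{bucket}"


def cached_query(sql: str,
                 max_age_seconds: int,
                 table_exists: Callable[[str], bool],
                 run_query_into: Callable[[str, str], None],
                 now: Optional[float] = None) -> str:
    """Return the name of a table holding results for `sql`.

    `table_exists` and `run_query_into` are placeholders (assumptions) for
    whatever the application uses to check for a table and to run a query
    with a named destination table. The query runs only on a cache miss.
    """
    name = cache_table_name(sql, max_age_seconds, now)
    if not table_exists(name):
        run_query_into(sql, name)
    return name
```

With this scheme, a result can be served for at most `max_age_seconds` after its window begins, which is the freshness the application trades for lower query cost. Tables from expired windows are never reused, so the application would also need some way to delete them or let them expire.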
