Chapter 9: Understanding Query Execution

The SQL query language defines what data a query should return, not how the results should be obtained. For the past 40 years or so, the primary engine for running SQL queries has been the relational database. Most people who write SQL are familiar with how a relational database works and have developed an intuition for what will run quickly, what will be inefficient, and what kinds of things to avoid. That intuition is based on knowledge of how a relational database executes their queries.

Although BigQuery runs the same types of SQL queries that you can run on a relational database, it executes them differently. As a result, the intuition you have about query execution is likely to lead you astray. For example, in a relational database there may be a performance advantage to storing a computed value so that it can be indexed. In BigQuery, because of its parallel architecture, you can perform complex manipulation inline in the query without significantly changing query execution time.

This chapter describes the architecture of Dremel, the query engine that underlies BigQuery. The aim is to help you develop an intuition about how BigQuery queries will execute. It should also shed light on some quirks of execution, such as why you may get a Response Too Large error even when you've specified that you want only 10 rows in the response.

There are three main sections in this chapter. The first part describes the ColumnIO ...
