Chapter 6: Loading Data

Before you can slice, dice, and roll up your data in BigQuery, you have to get the data into the service. In Chapter 3, “Getting Started with BigQuery,” you worked through a simplified example of loading data to verify that billing was correctly enabled on your account. Unfortunately, loading data is not usually quite so simple. For that example, a file hosted in Google Cloud Storage was already available in a format that BigQuery understands, and you were supplied with a schema that matched the data. When you load your own data into the service, you have to handle each of these steps yourself. This is not to imply that loading data is especially difficult; rather, it is to emphasize that loading is an important part of using the service that is sometimes overlooked.

There are two distinct pieces to the process of loading data into BigQuery:

  • Formatting your data appropriately
  • Transferring the data to BigQuery

In most scenarios the data you need to analyze lives in a system you control: files on your computer, records in a database, or logs from hosted servers, to name a few. The first task is to extract the data from those systems in a form that BigQuery can accept. In some cases this is trivial because the data already happens to be in a suitable format, such as a CSV file on your machine, but in other cases it might require some massaging or an extraction (the E in Extract-Transform-Load) from a database. With installed software you might be done at this point because ...
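As a rough illustration of the extraction step, the following Python sketch dumps a database table to CSV using only the standard library. Everything specific in it is an assumption made for illustration: the in-memory SQLite database, the `visits` table and its columns, and the helper name `export_table_to_csv` are all hypothetical, and a real extraction would read from whatever system actually holds your records.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV and return the row count.

    Hypothetical helper for illustration; not part of any BigQuery tooling.
    """
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    # The csv module quotes fields that contain commas or quote characters.
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical source data standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user TEXT, page TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [("alice", "/home", 12), ("bob", "/docs, v2", 48)],
)

# Write to an in-memory buffer here; a real export would open a file instead.
buffer = io.StringIO()
rows_written = export_table_to_csv(conn, "visits", buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```

The sketch writes no header row, on the assumption that the column names and types are described separately in a schema, as they were in the Chapter 3 example. Letting the `csv` module handle quoting, rather than joining fields with commas by hand, keeps values such as `/docs, v2` intact as a single field.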
