selnekovic.com

From Google Sheets to BigQuery: A Simple Data Transfer Approach

Spreadsheets will never die. And as a data analyst or analytics engineer, you should know how to move data from Google Sheets to BigQuery. Maybe a business user refuses to give up their spreadsheets, or you’re working with a small dataset that someone updates manually.

Whatever the case, there are several ways to transfer this data into BigQuery. For example, a third-party connector is an easy solution, but it comes at a cost. Apps Script is another option, though if JavaScript and the quirks of scripting in Sheets aren’t your thing, it might not be ideal. Cloud Run or Cloud Functions provide automation but require some setup and maintenance.

The approach we will focus on today is using BigQuery external tables, which is simple, almost free (you only pay for queries), and doesn’t require moving the data manually. We will start by querying the data directly from Google Sheets, then load it into an internal table and schedule daily updates to keep everything fresh.

Google Sheets
Our mock-up data in Google Sheets.

External vs. Internal Tables in BigQuery

With an external table, you can access and query the data, but you’re not storing it in BigQuery. The data remains in Google Sheets, and BigQuery fetches it every time you run a query. This ensures you’re always working with the latest version without needing to move anything. However, since the data isn’t stored inside BigQuery, queries can be slower. Another drawback is that if updates to the Sheet overwrite existing data instead of appending new rows, you won’t have a historical record. On top of that, Google Sheets has API limits, so querying it too frequently could get you temporarily locked out.

With an internal table, instead of fetching data from Google Sheets every time, you load it into BigQuery and store it permanently. This makes queries much faster and more stable because you’re no longer relying on an external source. Internal tables also support additional functionality that external tables don’t, such as the ability to use partitions and clustering for better performance, apply row-level security, and take advantage of materialized views. However, since the data is now inside BigQuery, it won’t update automatically, so you need to set up a process to refresh it. While internal tables do come with storage costs, BigQuery storage is relatively cheap, so this is not a big deal in most cases.

Google Sheets → BigQuery

1. Share the Google Sheet

To share your Google Sheet with BigQuery, you must adjust its access settings by granting the Viewer role. If you don’t mind broader access, you can grant general access by sharing the sheet via a link.

If you want to keep access restricted, grant access using the email address of the GCP project owner. In my experience, this email should be different from the sheet owner’s.

IAM & Admin permissions
Share the Google Sheet

2. Create a New Dataset and External Table

Set up a new dataset in BigQuery where your data will be stored. This involves selecting your project and creating a named dataset. Then create a (external) table that links to your Google Sheet. This allows you to query the sheet as a regular BigQuery table.

  • Create a table from – select Drive
  • Select Drive URI – copy & paste the address of your Google Sheets workbook
  • File format – select Google Sheet
  • Sheet range – specify the sheet’s name and cell range you want to query. If left blank, BigQuery will use the first sheet in the workbook.
  • Project – your current project
  • Dataset – the dataset you created for Google Sheets data
  • Table – some meaningful name for your table
Create external table - part 1
  • Schema – select Auto detect the schema. If you will not be happy with the result, you can delete the table and manually define the schema on the second attempt.
  • Header rows to skip – If your table contains a header row, skip this row so BigQuery can use it as the column names.
Create external table - part 2

That’s it! Your Google Sheets workbook is now connected to BigQuery, and you can start querying it if you’d like. But we’re not finished yet.

External table in BigQuery
3. Move Data to an Internal Table

For better performance, transfer the data to an internal BigQuery table by running the following SQL query:

SQL
CREATE OR REPLACE TABLE
  `your_project.dataset.final_table` AS
SELECT
  *
FROM
  `your_project.dataset.final_table`

In my case, the query looks like this:

Create internal table - query
4. Schedule Daily Updates

To ensure fresh data is available in your internal table set up a scheduled query:

  • Write a query to INSERT new data into the internal table. The exact query will depend on your data. You must include a WHERE clause to select new data based on a specific condition. If you want the update to run daily, use a query like this:
SQL
INSERT INTO 
  `your_project.dataset.final_table`
SELECT
  *
FROM
  `your_project.dataset.final_table`
WHERE 
  date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
  • Click Schedule, then fill in the required fields to set the update frequency. Choose a time when your sheet’s data will already be updated. Do not select “Set a destination table for query results”. This option is not allowed for DML queries.
Schedule the insert
Schedule the query - time
Schedule the query - destination

And you are done. Congratulations 🙂

Conclusion

In the end, moving data from Google Sheets to BigQuery doesn’t have to be a headache. Whether you use a third-party connector, scripting, or automation, the right approach depends on your needs. The simplest method may not be the best for you.

Share the knowledge