Skip to content

dbt

Adding a Google Sheets source to dbt

  1. Create your sheet Data Integration / Google Sheets / In.

  2. Add a named range. This should encompass the entirety of the tab.

  3. Update src/dbt/kipptaf/models/google/sheets/sources-drive.yml. Add a new entry under the tables attribute. Use the example below, replacing the variables with your specific sheet's information.

Variable Description
SOURCE_NAME The name of the collection for your source tables
SOURCE_TABLE_NAME The name of the source table, as it will appear in the database
SHEET_URL The full URL of the Google Sheet, including https://...
NAMED_RANGE The name of the range you defined in step 2.
sources:
  - name: ...
    tables:
      ...
      - name: src_google_sheets__{SOURCE NAME}__{SOURCE TABLE NAME}
        external:
          options:
            format: GOOGLE_SHEETS
            uris:
              - {SHEET URL}
            sheet_range: {NAMED RANGE}
            skip_leading_rows: 1
        meta:
          dagster:
            asset_key:
              - kipptaf
              - {SOURCE NAME}
              - {SOURCE TABLE NAME}
  1. Create a staging model. Create a simple select * statement--this will help catch unexpected changes to the table schema--and add any calculated fields you require.
select *, spam + 1 as eggs, from {{ source("{SOURCE NAME}", "{SOURCE TABLE NAME}") }}
  1. Create a corresponding properties file for the staging model

File name: ../properties/STAGING_MODEL_NAME.yml

Variable Description
models[].name The exact name of your staging model
columns[].name Column names are case-sensitive
data_type BigQuery Data Type Reference
models:
  - name: {STAGING MODEL NAME}
    columns:
      - name: {COLUMN NAME 1}
        data_type: {COLUMN TYPE 1}
      - name: {COLUMN NAME 2}
        data_type: {COLUMN TYPE 2}
      ...
      - name: {COLUMN NAME N}
        data_type: {COLUMN TYPE N}
  1. Update the external source definition
dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args select: [model name(s)]
  1. Build your staging model

Updating a Google Sheets source

  1. Duplicate the tab you are updating

  2. Create a new named range for the new tab. Use the same name as the old tab, suffixed with something to make it unique (e.g. new, _v2)

  3. Update src/dbt/kipptaf/models/google/sheets/sources-drive.yml. Update the sheet_range attribute with the new named range.

  4. Make your changes to the sheet

  5. If necessary, update the source column definitions

  6. Update the external source definition

dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args "select: [model name(s)]"
  1. Update the data contract for your staging file