dbt
Adding a Google Sheets source to dbt
-
Create your sheet Data Integration / Google Sheets / In.
-
Add a named range. This should encompass the entirety of the tab.
-
Update
src/dbt/kipptaf/models/google/sheets/sources-drive.yml
. Add a new entry under thetables
attribute. Use the example below, replacing the variables with your specific sheet's information.
Variable | Description |
---|---|
SOURCE_NAME | The name of the collection for your source tables |
SOURCE_TABLE_NAME | The name of the source table, as it will appear in the database |
SHEET_URL | The full URL of the Google Sheet, including https://... |
NAMED_RANGE | The name of the range you defined in step 2. |
sources:
- name: ...
tables:
...
- name: src_google_sheets__{SOURCE NAME}__{SOURCE TABLE NAME}
external:
options:
format: GOOGLE_SHEETS
uris:
- {SHEET URL}
sheet_range: {NAMED RANGE}
skip_leading_rows: 1
meta:
dagster:
asset_key:
- kipptaf
- {SOURCE NAME}
- {SOURCE TABLE NAME}
- Create a staging model. Create a simple
select *
statement--this will help catch unexpected changes to the table schema--and add any calculated fields you require.
select *, spam + 1 as eggs, from {{ source("{SOURCE NAME}", "{SOURCE TABLE NAME}") }}
- Create a corresponding properties file for the staging model
File name: ../properties/STAGING_MODEL_NAME.yml
Variable | Description |
---|---|
models[].name |
The exact name of your staging model |
columns[].name |
Column names are case-sensitive |
data_type |
BigQuery Data Type Reference |
models:
- name: {STAGING MODEL NAME}
columns:
- name: {COLUMN NAME 1}
data_type: {COLUMN TYPE 1}
- name: {COLUMN NAME 2}
data_type: {COLUMN TYPE 2}
...
- name: {COLUMN NAME N}
data_type: {COLUMN TYPE N}
- Update the external source definition
dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args select: [model name(s)]
- Build your staging model
Updating a Google Sheets source
-
Duplicate the tab you are updating
-
Create a new named range for the new tab. Use the same name as the old tab, suffixed with something to make it unique (e.g.
new
,_v2
) -
Update
src/dbt/kipptaf/models/google/sheets/sources-drive.yml
. Update thesheet_range
attribute with the new named range. -
Make your changes to the sheet
-
If necessary, update the source column definitions
-
Update the external source definition
dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args "select: [model name(s)]"
- Update the data contract for your staging file