dbt
Adding a Google Sheets source to dbt
-
Create your sheet Data Integration / Google Sheets / In.
-
Add a named range. This should encompass the entirety of the tab.
-
Update
src/dbt/kipptaf/models/google/sheets/sources-drive.yml. Add a new entry under thetablesattribute. Use the example below, replacing the variables with your specific sheet's information.
| Variable | Description |
|---|---|
| SOURCE_NAME | The name of the collection for your source tables |
| SOURCE_TABLE_NAME | The name of the source table, as it will appear in the database |
| SHEET_URL | The full URL of the Google Sheet, including https://... |
| NAMED_RANGE | The name of the range you defined in step 2. |
sources:
- name: ...
tables:
...
- name: src_google_sheets__{SOURCE NAME}__{SOURCE TABLE NAME}
external:
options:
format: GOOGLE_SHEETS
uris:
- {SHEET URL}
sheet_range: {NAMED RANGE}
skip_leading_rows: 1
meta:
dagster:
asset_key:
- kipptaf
- {SOURCE NAME}
- {SOURCE TABLE NAME}
- Update the external source definition
dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args "select: [model name(s)]"
- Create a staging model. Create a simple
select *statement--this will help catch unexpected changes to the table schema--and add any calculated fields you require.
select *, spam + 1 as eggs, from {{ source("{SOURCE NAME}", "{SOURCE TABLE NAME}") }}
- Create a corresponding properties file for the staging model
File name: ../properties/STAGING_MODEL_NAME.yml
| Variable | Description |
|---|---|
models[].name |
The exact name of your staging model |
columns[].name |
Column names are case-sensitive |
data_type |
BigQuery Data Type Reference |
models:
- name: {STAGING MODEL NAME}
columns:
- name: {COLUMN NAME 1}
data_type: {COLUMN TYPE 1}
- name: {COLUMN NAME 2}
data_type: {COLUMN TYPE 2}
...
- name: {COLUMN NAME N}
data_type: {COLUMN TYPE N}
- Build your staging model
Updating a Google Sheets source
-
Duplicate the tab that you are modifying. If you will only be adding columns to the end of the sheet, you can skip this step.
-
Create a new named range. Use the same name but suffixed with something to make it unique (e.g.
_new,_v2) -
Update
src/dbt/kipptaf/models/google/sheets/sources-drive.yml. Update thesheet_rangeattribute with the new named range. -
Make your changes to the sheet.
-
If necessary, update column definitions on the source YAML.
-
Update the external source definition
dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args "select: [model name(s)]"
- Update the data contract for your staging file