teamster
Next-gen data orchestration for KIPP TEAM & Family Schools
Teamster is the data engineering platform powering analytics and reporting across KIPP Newark, Camden, Miami, and Paterson. It ingests data from 30+ source systems, transforms it through dbt, and delivers it to Tableau, Google Sheets, PowerSchool, and other consumers, with every step orchestrated by Dagster.
- Dagster: orchestrates every ETL step across five code locations, one per school network; observe and run pipelines in Dagster Cloud
- dbt: transforms raw source data into staging, intermediate, mart, and extract models in Google BigQuery
- dlt: loads data from API sources into BigQuery alongside dbt
- Airbyte: managed connector pipelines for select integrations
- Google Cloud Storage: intermediate storage layer between pipeline steps
- Google Kubernetes Engine: runs each code location in its own container in production
- GitHub Actions: CI/CD for building and deploying code locations
- Tableau: primary BI consumer; Dagster manages workbook extract refreshes
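Dagster models each pipeline output as an asset in a dependency graph. The toy, stdlib-only sketch that follows is not Dagster's API and not Teamster's code; the asset names and the `run_assets` / `flaky_compute` helpers are hypothetical. It illustrates the core idea behind asset-based orchestration: materialize assets in dependency order and skip only what sits downstream of a failure, so one slow or broken source does not take unrelated pipelines down with it.

```python
from collections.abc import Callable
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the set of upstream assets it
# depends on. These names are illustrative, not Teamster's real asset keys.
ASSET_DEPS: dict[str, set[str]] = {
    "source_a_raw": set(),
    "source_b_raw": set(),
    "stg_source_a": {"source_a_raw"},
    "stg_source_b": {"source_b_raw"},
    "mart_combined": {"stg_source_a", "stg_source_b"},
    "extract_source_a": {"stg_source_a"},
}


def run_assets(
    deps: dict[str, set[str]], compute: Callable[[str], None]
) -> dict[str, str]:
    """Materialize assets in dependency order (toy model, not Dagster's API).

    Each asset ends up "success", "failed", or "skipped". An asset is skipped
    when any upstream asset did not succeed, so one failing source only
    blocks its own downstream assets instead of the whole run.
    """
    status: dict[str, str] = {}
    # static_order() yields every asset after all of its upstream assets,
    # so status[up] is always populated before it is read.
    for key in TopologicalSorter(deps).static_order():
        if any(status[up] != "success" for up in deps.get(key, set())):
            status[key] = "skipped"
            continue
        try:
            compute(key)
            status[key] = "success"
        except Exception:
            status[key] = "failed"
    return status


def flaky_compute(key: str) -> None:
    """Stand-in for real work; simulates one slow source timing out."""
    if key == "source_b_raw":
        raise TimeoutError(f"{key}: source API timed out")


if __name__ == "__main__":
    for asset_key, result in run_assets(ASSET_DEPS, flaky_compute).items():
        print(f"{asset_key}: {result}")
```

In this run `source_b_raw` fails, so `stg_source_b` and `mart_combined` are skipped, while `source_a_raw`, `stg_source_a`, and `extract_source_a` still succeed. Dagster layers schedules, sensors, partitions, and retries on top of this basic ordering.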
Background
KIPP's data infrastructure was previously a patchwork of Python scripts, cron jobs, stored procedures, Fivetran, and Selenium automation spread across multiple databases. Synchronous scheduling meant a slow pull from one system would cascade into downstream failures. A single data engineer spent more time firefighting than building.
Teamster replaced all of it with a unified, asset-based platform. The results:
- Pipeline development time dropped from weeks to days
- Data-related support tickets fell 30% year-over-year
- Analysts gained Git, SQL, and DevOps skills through shared PR workflows
- Real-time Slack alerts replaced reactive debugging
"The visibility into the pipelines is a game changer. We know as soon as something fails and why."
Read the full story in the Dagster case study.
Get started
New to the project? Start here:
- Getting Started: account setup, Codespaces, local dev
- Architecture: how the code is organized
- Contributing: workflow and PR guidelines
Reference
| Topic | Description |
|---|---|
| Automations | All schedules and sensors across every code location |
| Automation Conditions | How asset auto-materialization works |
| Adding an Integration | Step-by-step guide for new data sources |
| dbt Conventions | Model naming, contracts, and testing standards |
| IO Managers | How intermediate data is stored in GCS |
| Fiscal Year & Partitioning | Partition strategy for historical loads |
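The Fiscal Year & Partitioning page describes the actual partition strategy. As a hedged illustration only, the sketch that follows assumes a fiscal year that starts July 1 and is labeled by the calendar year in which it ends; Teamster's real convention may differ. The helpers `fiscal_year`, `fiscal_year_bounds`, and `fiscal_year_partition_keys` and the plain-year key format are hypothetical. They show the two mappings a per-fiscal-year partition needs: date to fiscal year, and fiscal year to a date window.

```python
from datetime import date

# Assumption (not from the Teamster docs): the fiscal year starts July 1 and
# is labeled by the calendar year in which it ends, so July 1, 2024 through
# June 30, 2025 is FY2025.
FISCAL_YEAR_START_MONTH = 7


def fiscal_year(day: date) -> int:
    """Return the (assumed) fiscal year label for a calendar date."""
    return day.year + 1 if day.month >= FISCAL_YEAR_START_MONTH else day.year


def fiscal_year_bounds(fy: int) -> tuple[date, date]:
    """Return the half-open [start, end) date window for fiscal year `fy`."""
    return date(fy - 1, FISCAL_YEAR_START_MONTH, 1), date(fy, FISCAL_YEAR_START_MONTH, 1)


def fiscal_year_partition_keys(first_fy: int, last_fy: int) -> list[str]:
    """Build one partition key per fiscal year, oldest first (hypothetical format)."""
    return [str(fy) for fy in range(first_fy, last_fy + 1)]


if __name__ == "__main__":
    print(fiscal_year(date(2024, 8, 15)))  # 2025
    print(fiscal_year_bounds(2025))
    print(fiscal_year_partition_keys(2022, 2025))
```

Half-open windows let adjacent fiscal years tile without overlap, so a backfill over several partitions never loads the June 30/July 1 boundary twice.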
Guides & Troubleshooting
| Topic | Description |
|---|---|
| Dagster Guide | Tableau scheduling, backfills, branch deployments |
| Google Sheets & Forms | Adding and updating Google Sheets sources |
| Troubleshooting: Dagster | Pipeline failures, partitions, unsynced views |
| Troubleshooting: dbt | Contract violations, compilation errors, test failures |
| Troubleshooting: VS Code | Interpreter, secrets, Trunk, container issues |