
teamster 🚛

kipptaf kippnewark kippcamden kippmiami

uv Trunk License: AGPL v3 Contributor Covenant

Photograph taken in 1960. Upload from http://www.fortepan.hu/?lang=en&img=20566, part of Commons:Batch_uploading/Fortepan.HU

Next-gen data orchestration for KIPP TEAM & Family Schools

Teamster is the data engineering platform powering analytics and reporting across KIPP Newark, Camden, Miami, and Paterson. It ingests data from 30+ source systems, transforms it through dbt, and delivers it to Tableau, Google Sheets, PowerSchool, and other consumers, all orchestrated by Dagster.

  • 🎻 Dagster β€” orchestrates every ETL step across five code locations, one per school network; observe and run pipelines in Dagster Cloud
  • πŸ”§ dbt β€” transforms raw source data into staging, intermediate, mart, and extract models in Google BigQuery
  • 🚿 dlt β€” loads data from API sources into BigQuery alongside dbt
  • πŸ”€ Airbyte β€” managed connector pipelines for select integrations
  • πŸͺ£ Google Cloud Storage β€” intermediate storage layer between pipeline steps
  • ☸️ Google Kubernetes Engine β€” runs each code location in its own container in production
  • βš™οΈ GitHub Actions β€” CI/CD for building and deploying code locations
  • πŸ“Š Tableau β€” primary BI consumer; Dagster manages workbook extract refreshes
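
The flow described in the list above (load, stage, transform, deliver) can be pictured as a dependency graph of assets. The following is a minimal standard-library sketch, not Teamster's actual code: every asset name here is hypothetical and chosen only for illustration, and it assumes each asset simply lists the upstream assets it needs.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names for illustration only; none are taken from Teamster.
# Each key maps an asset to the set of upstream assets it depends on.
ASSET_DEPENDENCIES = {
    "raw_students": set(),                                      # loaded from a source system
    "raw_attendance": set(),                                    # loaded from an API source
    "stg_students": {"raw_students"},                           # staging model
    "stg_attendance": {"raw_attendance"},                       # staging model
    "int_student_attendance": {"stg_students", "stg_attendance"},  # intermediate model
    "rpt_attendance": {"int_student_attendance"},               # mart model
    "tableau_attendance_extract": {"rpt_attendance"},           # BI extract refresh
}


def materialization_order(dependencies):
    """Return asset names ordered so every upstream comes before its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(asset, dependencies):
    """Return every asset that depends, directly or transitively, on `asset`."""
    affected = set()
    frontier = [asset]
    while frontier:
        current = frontier.pop()
        for name, upstreams in dependencies.items():
            if current in upstreams and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(materialization_order(ASSET_DEPENDENCIES))
    print(sorted(downstream_of("raw_attendance", ASSET_DEPENDENCIES)))
```

Dagster derives a graph of this kind from asset definitions, so a failed or late upstream load identifies exactly which downstream models and extracts are affected rather than breaking an unrelated schedule.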

📖 Background

KIPP's data infrastructure was previously a patchwork of Python scripts, cron jobs, stored procedures, Fivetran, and Selenium automation spread across multiple databases. Synchronous scheduling meant a slow pull from one system would cascade into downstream failures. A single data engineer spent more time firefighting than building.

Teamster replaced all of it with a unified, asset-based platform. The results:

  • ⚑ Pipeline development time dropped from weeks to days
  • 🎫 Data-related support tickets fell 30% year-over-year
  • πŸ§‘β€πŸ’» Analysts gained Git, SQL, and DevOps skills through shared PR workflows
  • πŸ”” Real-time Slack alerts replaced reactive debugging

"The visibility into the pipelines is a game changer. We know as soon as something fails and why."

Read the full story in the Dagster case study.

🚀 Get started

New to the project? Start here:

  1. Getting Started: account setup, Codespaces, local dev
  2. Architecture: how the code is organized
  3. Contributing: workflow and PR guidelines

📚 Reference

| Topic | Description |
| --- | --- |
| Automations | All schedules and sensors across every code location |
| Automation Conditions | How asset auto-materialization works |
| Adding an Integration | Step-by-step guide for new data sources |
| dbt Conventions | Model naming, contracts, and testing standards |
| IO Managers | How intermediate data is stored in GCS |
| Fiscal Year & Partitioning | Partition strategy for historical loads |
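
As a rough illustration of what fiscal-year partitioning for historical loads can look like, here is a minimal sketch. It rests on assumptions that this page does not state: that the fiscal year starts on July 1 and is labeled by the calendar year in which it ends. The function names are hypothetical; the linked reference page is the authority on the real strategy.

```python
from datetime import date

# Assumption: fiscal years start in July. This is not stated in the source.
FISCAL_YEAR_START_MONTH = 7


def fiscal_year(day, start_month=FISCAL_YEAR_START_MONTH):
    """Label a date with the calendar year in which its fiscal year ends."""
    if start_month == 1:
        # A January start means the fiscal year matches the calendar year.
        return day.year
    return day.year + 1 if day.month >= start_month else day.year


def fiscal_year_partitions(first_day, last_day, start_month=FISCAL_YEAR_START_MONTH):
    """List the fiscal-year partition keys needed to cover first_day..last_day."""
    first = fiscal_year(first_day, start_month)
    last = fiscal_year(last_day, start_month)
    return [str(year) for year in range(first, last + 1)]


if __name__ == "__main__":
    # A historical load spanning several school years maps to one key per fiscal year.
    print(fiscal_year_partitions(date(2021, 9, 1), date(2023, 8, 15)))
```

Keying historical loads by fiscal year keeps each backfill bounded to one school year at a time, which is one common reason to partition this way.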

🗺️ Guides & Troubleshooting

| Topic | Description |
| --- | --- |
| Dagster Guide | Tableau scheduling, backfills, branch deployments |
| Google Sheets & Forms | Adding and updating Google Sheets sources |
| Troubleshooting: Dagster | Pipeline failures, partitions, unsynced views |
| Troubleshooting: dbt | Contract violations, compilation errors, test failures |
| Troubleshooting: VS Code | Interpreter, secrets, Trunk, container issues |