Production Ready (Quality, performance, and the lessons learned shipping to 150 stores)
THE JOURNEY SO FAR: We chose dbt over custom scripts. Built observability with freshness checks and Elementary. Optimized with incremental models and snapshots. Scaled with macros and repeatable patterns. Now it's time to ship to production and learn what that really means.
The First Production Failure
2am on a Wednesday
The page went out: "Critical data quality failure in production." Ten thousand orders from our biggest store showed NULL for customer IDs. The relationships test failed. Marketing's morning campaign was blocked. Revenue reports were broken.
Root cause? The source system changed a column from customer_id to customerId (camelCase). Our staging model still looked for customer_id. It found nothing. We loaded nulls.
Lesson: Tests saved us from shipping bad data downstream, but we should have caught the schema change earlier. We added dbt source freshness checks and schema change alerts to Elementary that night.
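A source freshness check along these lines would have flagged the stalled load before the 2am page. This is a sketch of a `sources.yml` entry, assuming hypothetical source, schema, and column names (`store_raw`, `raw`, `updated_at`):

```yaml
# sources.yml (sketch -- names are illustrative, not our actual config)
sources:
  - name: store_raw
    schema: raw
    # Column dbt uses to measure how stale each table is
    loaded_at_field: updated_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

Running `dbt source freshness` on a schedule then warns at six hours of staleness and errors at twenty-four, independent of whether `dbt run` itself succeeds.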
Production teaches lessons that development can't. Here's what we learned.
Data Quality: Tests Are Not Optional
In development, tests feel like extra work. In production, they're the difference between catching problems at 2am versus explaining them to the CEO at 9am.
The Testing Hierarchy
not_null - Primary keys, foreign keys, required business fields. If it should always have a value, test it.
unique - Order IDs, customer IDs, any identifier. Duplicates cause double-counting and broken joins.
relationships - Referential integrity. Every order_items.order_id must exist in orders.order_id.
Custom Tests - Business rules. "Revenue can't be negative." "Order dates can't be in the future." Domain-specific validations.
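A custom business rule like "revenue can't be negative" can be expressed as a dbt singular test: a SQL file under `tests/` that selects failing rows. This is a sketch; the file name and columns are assumptions based on the mart described below:

```sql
-- tests/assert_no_negative_revenue.sql (hypothetical file name)
-- dbt treats any rows returned by this query as test failures
select
    store_id,
    order_date,
    total_revenue
from {{ ref('mart_daily_store_sales') }}
where total_revenue < 0
```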
Here's what our test configuration looks like for a critical mart:
```yaml
models:
  - name: mart_daily_store_sales
    columns:
      - name: order_date
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "<= current_date()"
      - name: store_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_stores')
              field: store_id
      - name: total_revenue
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```
These tests run after every dbt run. If any fail, the job fails. We catch bad data before it reaches dashboards.
Performance: The Optimization Journey
Our first full production run took six hours. After optimization, it takes 22 minutes. Here's how we got there.
BEFORE OPTIMIZATION: 6 hours (full refresh on every table, every run)
AFTER OPTIMIZATION: 22 minutes (incremental + selective + parallel)
Strategy 1: Incremental Models
We covered this in Episode 2, but it's worth repeating: incremental models were our biggest performance win. Fact tables like orders and order_items process only changes, not the full history.
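As a refresher, the pattern looks roughly like this. The model name, staging ref, and `updated_at` column are illustrative, not our exact code:

```sql
-- models/facts/fct_orders.sql (sketch)
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what's already loaded.
  -- {{ this }} refers to the existing table in the warehouse.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The first run builds the full table; every run after that scans only the new and changed rows, which is where the six-hours-to-minutes win comes from.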
Strategy 2: Selective Runs with State
In development, we rarely need to rebuild everything. The --select flag with state comparison is magic:
```shell
# Only run models that changed since last production run
dbt run --select state:modified+

# Only run a specific mart and its dependencies
dbt run --select +mart_daily_store_sales

# Only run models tagged 'daily'
dbt run --select tag:daily
```
For scheduled jobs, we run modified models on code deploys and tagged models on schedules. Daily jobs run tag:daily. Weekly aggregations run tag:weekly.
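Tags can be applied per-folder in `dbt_project.yml` so new models inherit the right schedule automatically. A sketch, assuming hypothetical project and folder names:

```yaml
# dbt_project.yml (sketch -- project and folder names are illustrative)
models:
  retail_analytics:
    marts:
      daily:
        +tags: ['daily']    # picked up by the nightly job
      weekly:
        +tags: ['weekly']   # picked up by the weekly aggregation job
```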
Strategy 3: Parallelism
dbt can run independent models in parallel. We configured threads: 8 in our profile. Models at the same layer (e.g., all staging models) run simultaneously, limited only by database capacity.
Watching eight staging models turn green at once never gets old.
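The thread count lives in the connection profile. A minimal sketch, assuming the dbt-mysql adapter and placeholder connection details:

```yaml
# profiles.yml (sketch -- host, schema, and credentials are placeholders)
retail_analytics:
  target: prod
  outputs:
    prod:
      type: mysql
      server: db.example.internal
      schema: analytics
      username: dbt_user
      password: "{{ env_var('DBT_DB_PASSWORD') }}"
      threads: 8   # up to 8 independent models run concurrently
```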
Strategy 4: Indexes on Large Tables
We add indexes in post_hook configurations on columns used for filtering and joining:
```sql
{{ config(
    materialized='table',
    schema=var('target_schema'),
    post_hook=[
        "alter table {{ this }} add index idx_store_date (store_id, order_date)",
        "alter table {{ this }} add index idx_status (order_status)"
    ]
) }}
```
Query performance on these tables improved 10x for common filters.
Troubleshooting: The Debug Toolkit
When things break in production (and they will), here's the toolkit that saves us:
dbt compile - Inspect the actual SQL dbt will run. Check for variable substitutions, macro expansions, Jinja logic. 90% of "it doesn't work" is "the SQL isn't what I thought."
dbt debug - Verify connections, profiles, and configurations. When a model won't run, start here to eliminate environmental issues.
Elementary dashboards - See which models failed, which tests failed, and trends over time. "This test started failing three days ago" is a powerful diagnostic.
Logs and lineage - logs/dbt.log contains every query dbt ran. The lineage graph in documentation shows impact: if stg_orders fails, what downstream models are blocked?
Manual query validation - When all else fails, copy the compiled SQL from target/compiled/, run it manually in MySQL, and see the actual error message.
Hard-Won Lessons
The MySQL Schema Enforcement Saga
We wanted to use dbt's contract: {enforced: true} to validate column types match our schema definitions. On MySQL, we hit limitations: not all constraints are supported, and we needed custom indexes and foreign keys.
After three failed attempts, we settled on contract: {enforced: false} plus ALTER TABLE post-hooks to enforce the schema we needed.
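The combination ended up looking roughly like this in the model config. The column names and constraint are illustrative, not our exact DDL:

```sql
-- mart model header (sketch)
{{ config(
    materialized='table',
    contract={'enforced': false},
    post_hook=[
        -- Enforce types and nullability ourselves via MySQL DDL
        "alter table {{ this }} modify column store_id integer not null",
        -- Add the foreign key dbt contracts couldn't give us on MySQL
        "alter table {{ this }} add constraint fk_store
           foreign key (store_id) references {{ ref('dim_stores') }} (store_id)"
    ]
) }}
```

dbt still documents and tests the schema; MySQL enforces it at the storage layer.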
Lesson: Don't fight your database's limitations. Use the tools dbt provides (contracts, tests, docs), but augment with database-specific solutions when needed. Pragmatism beats purity.
The Variable Naming Incident
We had source_schema and target_schema as variables. A developer added a source_db variable for the database name but didn't update the README. Three people spent four hours debugging why models wouldn't compile before discovering the missing variable.
Lesson: Document your variables in dbt_project.yml with defaults and descriptions. Make --vars usage obvious in your README. Future teammates (including future you) will thank you.
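In practice that means declaring every variable with a default and a comment, so a missing `--vars` override degrades gracefully instead of failing to compile. A sketch with illustrative defaults:

```yaml
# dbt_project.yml (sketch -- default values are illustrative)
vars:
  # Schema holding the raw per-store source tables (override with --vars)
  source_schema: raw
  # Schema dbt writes transformed models into
  target_schema: analytics
  # Database name for sources -- the variable our README forgot to mention
  source_db: store_data
```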
What We Built: A Production-Ready Framework
The Results
• 150 stores processed with a single dbt project
• 22-minute runs processing millions of daily transactions
• 200+ data quality tests protecting every critical model
• Zero manual intervention for routine runs (though we monitor everything)
• Complete lineage from source tables to executive dashboards
• Historical snapshots preserving customer and product changes over time
• Automatic alerts for freshness, test failures, and schema changes
More importantly, we built a team culture around data quality. Tests aren't optional. Documentation isn't an afterthought. Pull requests include SQL review. Everyone understands the lineage of their models.
New engineers onboard in days instead of weeks because the project is self-documenting. We add new business entities (promotions, returns, inventory) by following the same pattern every time.
The Philosophy That Emerged
What We'd Tell Ourselves a Year Ago
Start Simple - Don't build for 150 stores on day one. Build for three stores really well. Patterns that work at small scale guide scaling decisions.
Test Everything - If it matters, test it. Primary keys, foreign keys, business rules. Tests are documentation that runs. They're proof your data is correct.
Document as You Go - Future you won't remember why you made that choice. Present you should write it down in the model's description field.
Macros Save Time - But write them when you need them, not because they're cool. Premature abstraction is as bad as copy-paste.
Observability Isn't Optional - You can't fix what you can't see. Freshness checks, test monitoring, and alerting should launch with your first production run.
Incremental > Perfect - Ship the staging models. Ship basic tests. Ship documentation with TODOs. Iterating in production beats perfecting in development.
The End of the Beginning
This blog series documented our journey from retail data chaos to a production-grade dbt pipeline. We migrated 150 store databases into a unified analytics warehouse. We built observability, performance, quality, and scalability into every layer.
But honestly? We're still learning. Every new business requirement teaches us something. Every production incident makes the system stronger. Every new team member brings fresh perspectives.
dbt gave us the framework. We built the practices, patterns, and culture on top of it. That combination, good tools plus thoughtful practices, is what makes data engineering sustainable at scale.
Series Complete
Thanks for following along on this journey from chaos to production-ready data pipelines.
The dbt Migration Chronicles · Episode 4 of 4 — Series Finale
Written for data teams who learned these lessons the hard way, so you don't have to