Production Ready (Quality, performance, and the lessons learned shipping to 150 stores)
THE JOURNEY SO FAR: We chose dbt over custom scripts. Built observability with freshness checks and Elementary. Optimized with incremental models and snapshots. Scaled with macros and repeatable patterns. Now it's time to ship to production and learn what that really means.
The First Production Failure
2am on a Wednesday
The page went out: "Critical data quality failure in production." Ten thousand orders from our biggest store showed NULL for customer IDs. The relationships test failed. Marketing's morning campaign was blocked. Revenue reports were broken.
Root cause? The source system changed a column from customer_id to customerId (camelCase). Our staging model still looked for customer_id. It found nothing. We loaded nulls.
Lesson: Tests saved us from shipping bad data downstream, but we should have caught the schema change earlier. We added dbt source freshness checks and schema change alerts to Elementary that night.
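A source freshness check along these lines would have flagged the stalled load before the 2am page. This is a sketch of a `sources.yml` entry, assuming hypothetical source, schema, and column names (`store_raw`, `raw`, `updated_at`):

```yaml
# sources.yml (sketch -- names are illustrative, not our actual config)
sources:
  - name: store_raw
    schema: raw
    # Column dbt uses to measure how stale each table is
    loaded_at_field: updated_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

Running `dbt source freshness` on a schedule then warns at six hours of staleness and errors at twenty-four, independent of whether `dbt run` itself succeeds.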
Production teaches lessons that development can't. Here's what we learned.
Data Quality: Tests Are Not Optional
In development, tests feel like extra work. In production, they're the difference between catching problems at 2am versus explaining them to the CEO at 9am.
The Testing Hierarchy
not_null - Primary keys, foreign keys, required business fields. If it should always have a value, test it.
unique - Order IDs, customer IDs, any identifier. Duplicates cause double-counting and broken joins.
relationships - Referential integrity. Every order_items.order_id must exist in orders.order_id.
Custom Tests - Business rules. "Revenue can't be negative." "Order dates can't be in the future." Domain-specific validations.
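A custom business rule like "revenue can't be negative" can be expressed as a dbt singular test: a SQL file under `tests/` that selects failing rows. This is a sketch; the file name and columns are assumptions based on the mart described below:

```sql
-- tests/assert_no_negative_revenue.sql (hypothetical file name)
-- dbt treats any rows returned by this query as test failures
select
    store_id,
    order_date,
    total_revenue
from {{ ref('mart_daily_store_sales') }}
where total_revenue < 0
```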
Here's what our test configuration looks like for a critical mart:
```yaml
models:
  - name: mart_daily_store_sales
    columns:
      - name: order_date
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "<= current_date()"
      - name: store_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_stores')
              field: store_id
      - name: total_revenue
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```
These tests run after every dbt run. If any fail, the job fails. We catch bad data before it reaches dashboards.
Performance: The Optimization Journey
Our first full production run took six hours. After optimization, it takes 22 minutes. Here's how we got there.
BEFORE OPTIMIZATION: 6 hours (full refresh on every table, every run)
AFTER OPTIMIZATION: 22 minutes (incremental + selective + parallel)
Strategy 1: Incremental Models
We covered this in Episode 2, but it's worth repeating: incremental models were our biggest performance win. Fact tables like orders and order_items process only changes, not the full history.
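As a refresher, the pattern looks roughly like this. The model name, staging ref, and `updated_at` column are illustrative, not our exact code:

```sql
-- models/facts/fct_orders.sql (sketch)
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what's already loaded.
  -- {{ this }} refers to the existing table in the warehouse.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The first run builds the full table; every run after that scans only the new and changed rows, which is where the six-hours-to-minutes win comes from.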
Strategy 2: Selective Runs with State
In development, we rarely need to rebuild everything. The --select flag with state comparison is magic:
```shell
# Only run models that changed since last production run
dbt run --select state:modified+

# Only run a specific mart and its dependencies
dbt run --select +mart_daily_store_sales

# Only run models tagged 'daily'
dbt run --select tag:daily
```
For scheduled jobs, we run modified models on code deploys and tagged models on schedules. Daily jobs run tag:daily. Weekly aggregations run tag:weekly.
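Tags can be applied per-folder in `dbt_project.yml` so new models inherit the right schedule automatically. A sketch, assuming hypothetical project and folder names:

```yaml
# dbt_project.yml (sketch -- project and folder names are illustrative)
models:
  retail_analytics:
    marts:
      daily:
        +tags: ['daily']    # picked up by the nightly job
      weekly:
        +tags: ['weekly']   # picked up by the weekly aggregation job
```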
Strategy 3: Parallelism
dbt can run independent models in parallel. We configured threads: 8 in our profile. Models at the same layer (e.g., all staging models) run simultaneously, limited only by database capacity.
Watching eight staging models turn green at once never gets old.
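The thread count lives in the connection profile. A minimal sketch, assuming the dbt-mysql adapter and placeholder connection details:

```yaml
# profiles.yml (sketch -- host, schema, and credentials are placeholders)
retail_analytics:
  target: prod
  outputs:
    prod:
      type: mysql
      server: db.example.internal
      schema: analytics
      username: dbt_user
      password: "{{ env_var('DBT_DB_PASSWORD') }}"
      threads: 8   # up to 8 independent models run concurrently
```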
Strategy 4: Indexes on Large Tables
We add indexes in post_hook configurations on columns used for filtering and joining:
```sql
{{ config(
    materialized='table',
    schema=var('target_schema'),
    post_hook=[
        "alter table {{ this }} add index idx_store_date (store_id, order_date)",
        "alter table {{ this }} add index idx_status (order_status)"
    ]
) }}
```
Query performance on these tables improved 10x for common filters.
Troubleshooting: The Debug Toolkit
When things break in production (and they will), here's the toolkit that saves us:
dbt compile - Inspect the actual SQL dbt will run. Check for variable substitutions, macro expansions, Jinja logic. 90% of "it doesn't work" is "the SQL isn't what I thought."
dbt debug - Verify connections, profiles, and configurations. When a model won't run, start here to eliminate environmental issues.
Elementary dashboards - See which models failed, which tests failed, and trends over time. "This test started failing three days ago" is a powerful diagnostic.
Logs and lineage - logs/dbt.log contains every query dbt ran. The lineage graph in documentation shows impact: if stg_orders fails, what downstream models are blocked?
Manual query validation - When all else fails, copy the compiled SQL from target/compiled/, run it manually in MySQL, and see the actual error message.
Hard-Won Lessons
The MySQL Schema Enforcement Saga
We wanted to use dbt's contract: {enforced: true} to validate column types match our schema definitions. On MySQL, we hit limitations: not all constraints are supported, and we needed custom indexes and foreign keys.
After three failed attempts, we settled on contract: {enforced: false} plus ALTER TABLE post-hooks to enforce the schema we needed.
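The combination ended up looking roughly like this in the model config. The column names and constraint are illustrative, not our exact DDL:

```sql
-- mart model header (sketch)
{{ config(
    materialized='table',
    contract={'enforced': false},
    post_hook=[
        -- Enforce types and nullability ourselves via MySQL DDL
        "alter table {{ this }} modify column store_id integer not null",
        -- Add the foreign key dbt contracts couldn't give us on MySQL
        "alter table {{ this }} add constraint fk_store
           foreign key (store_id) references {{ ref('dim_stores') }} (store_id)"
    ]
) }}
```

dbt still documents and tests the schema; MySQL enforces it at the storage layer.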
Lesson: Don't fight your database's limitations. Use the tools dbt provides (contracts, tests, docs), but augment with database-specific solutions when needed. Pragmatism beats purity.
The Variable Naming Incident
We had source_schema and target_schema as variables. A developer added a source_db variable for the database name but didn't update the README. Three people spent four hours debugging why models wouldn't compile before discovering the missing variable.
Lesson: Document your variables in dbt_project.yml with defaults and descriptions. Make --vars usage obvious in your README. Future teammates (including future you) will thank you.
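In practice that means declaring every variable with a default and a comment, so a missing `--vars` override degrades gracefully instead of failing to compile. A sketch with illustrative defaults:

```yaml
# dbt_project.yml (sketch -- default values are illustrative)
vars:
  # Schema holding the raw per-store source tables (override with --vars)
  source_schema: raw
  # Schema dbt writes transformed models into
  target_schema: analytics
  # Database name for sources -- the variable our README forgot to mention
  source_db: store_data
```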
What We Built: A Production-Ready Framework
The Results
• 150 stores processed with a single dbt project
• 22-minute runs processing millions of daily transactions
• 200+ data quality tests protecting every critical model
• Zero manual intervention for routine runs (though we monitor everything)
• Complete lineage from source tables to executive dashboards
• Historical snapshots preserving customer and product changes over time
• Automatic alerts for freshness, test failures, and schema changes
More importantly, we built a team culture around data quality. Tests aren't optional. Documentation isn't an afterthought. Pull requests include SQL review. Everyone understands the lineage of their models.
New engineers onboard in days instead of weeks because the project is self-documenting. We add new business entities (promotions, returns, inventory) by following the same pattern every time.
The Philosophy That Emerged
What We'd Tell Ourselves a Year Ago
Start Simple - Don't build for 150 stores on day one. Build for three stores really well. Patterns that work at small scale guide scaling decisions.
Test Everything - If it matters, test it. Primary keys, foreign keys, business rules. Tests are documentation that runs. They're proof your data is correct.
Document as You Go - Future you won't remember why you made that choice. Present you should write it down in the model's description field.
Macros Save Time - But write them when you need them, not because they're cool. Premature abstraction is as bad as copy-paste.
Observability Isn't Optional - You can't fix what you can't see. Freshness checks, test monitoring, and alerting should launch with your first production run.
Incremental > Perfect - Ship the staging models. Ship basic tests. Ship documentation with TODOs. Iterating in production beats perfecting in development.
The End of the Beginning
This blog series documented our journey from retail data chaos to a production-grade dbt pipeline. We migrated 150 store databases into a unified analytics warehouse. We built observability, performance, quality, and scalability into every layer.
But honestly? We're still learning. Every new business requirement teaches us something. Every production incident makes the system stronger. Every new team member brings fresh perspectives.
dbt gave us the framework. We built the practices, patterns, and culture on top of it. That combination, good tools plus thoughtful practices, is what makes data engineering sustainable at scale.
Series Complete
Thanks for following along on this journey from chaos to production-ready data pipelines.
The dbt Migration Chronicles · Episode 4 of 4 — Series Finale
Written for data teams who learned these lessons the hard way, so you don't have to