Scaling from 15 to 150 Stores (When copy-paste becomes technical debt, macros become salvation)
Previously on the dbt Migration Chronicles: We built a pipeline with observability, incremental models for performance, and snapshots for history. Our 15-store deployment ran smoothly. Then the business asked us to scale to 50 stores by next quarter.
The Copy-Paste Nightmare
It started innocently enough. We had working code for adding foreign key constraints to our order_items table. When we needed the same thing for order_payments, someone copied the SQL. Then order_shipments needed it. Then customer_addresses.
By month three, we had the same 20-line block of SQL repeated in 47 different places.
The Breaking Point: MySQL changed how they recommend checking for existing constraints. We needed to update our pattern. That meant finding and fixing 47 scattered copies. Three developers spent two days on the update. Two bugs slipped through. One caused a production failure at 2am.
There had to be a better way.
Enter Macros: Write Once, Use Everywhere
dbt macros are functions for SQL. Write logic once, call it everywhere. No copy-paste. No drift. One source of truth.
Here's the macro that saved us hours of debugging:
{% macro retail_create_fk_if_not_exists(constraint_name, table_name, column_name, ref_table, ref_column) %}
    {% set check_query %}
        select count(*) as cnt
        from information_schema.KEY_COLUMN_USAGE
        where constraint_schema = '{{ target.schema }}'
          and constraint_name = '{{ constraint_name }}'
          and table_name = '{{ table_name }}'
    {% endset %}
    {% set result = run_query(check_query) %}
    {% if result and result.rows[0][0] == 0 %}
        alter table {{ target.schema }}.{{ table_name }}
            add constraint {{ constraint_name }}
            foreign key ({{ column_name }})
            references {{ target.schema }}.{{ ref_table }} ({{ ref_column }});
    {% endif %}
{% endmacro %}
Now instead of 20 lines of repeated SQL, we write:
{{ config(
    post_hook=[
        "{{ retail_create_fk_if_not_exists('fk_order_items_order', 'order_items', 'order_id', 'orders', 'order_id') }}"
    ]
) }}
Clean. Maintainable. When MySQL recommendations changed, we updated one macro and every model using it got the fix automatically.
Validation Macros: Catching Problems Early
We built custom validation macros to compare source and warehouse row counts. Before snapshots run, we verify that all orders from retail_store_ny made it into analytics_retail. Discrepancies trigger alerts, not silent data loss.
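Here's a sketch of what one of those validation macros might look like. The macro name and arguments are illustrative, not our exact production code; the idea is to compare counts and abort the run loudly on a mismatch:

```sql
{% macro retail_validate_row_count(source_schema, target_schema, table_name) %}
    {% set check_query %}
        select
            (select count(*) from {{ source_schema }}.{{ table_name }}) as source_count,
            (select count(*) from {{ target_schema }}.{{ table_name }}) as target_count
    {% endset %}
    {% set result = run_query(check_query) %}
    {% if result.rows[0][0] != result.rows[0][1] %}
        {{ exceptions.raise_compiler_error(
            table_name ~ ': source has ' ~ result.rows[0][0] ~
            ' rows, warehouse has ' ~ result.rows[0][1]
        ) }}
    {% endif %}
{% endmacro %}
```

A macro like this can run before the snapshot job via dbt run-operation, passing the store and warehouse schemas as --args.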
The DRY Principle in Action
Don't Repeat Yourself isn't just a coding philosophy; it's survival at scale. Every repeated block of code is a future bug waiting to happen. Macros turn repetition into reusability.
• Maintainability: Fix once, fix everywhere
• Consistency: Same logic produces same results
• Testability: Test the macro, trust all uses of it
• Readability: Descriptive names beat walls of SQL
Project Management: Organizing for Scale
With 15 stores, we could get away with informal processes. At 50 stores processing hundreds of millions of rows, we needed structure.
The Git Workflow That Saved Us
Every model change goes through pull requests. Every PR includes:
• The SQL change (obviously)
• Updated tests (not_null, unique, relationships)
• Documentation updates in schema.yml
• dbt compile output showing what SQL will actually run
We run dbt compile locally to catch syntax errors before committing. We run dbt list --select state:modified+ in CI to show which models our change affects. No surprises.
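In CI, that boils down to two commands after the usual setup. The workflow below is an illustration only (GitHub Actions syntax; the adapter, target name, and artifact path are assumptions, not our actual pipeline):

```yaml
# .github/workflows/dbt-ci.yml (illustrative)
name: dbt-ci
on: pull_request

jobs:
  compile-and-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-mysql
      - run: dbt deps
      # Fail fast on Jinja/SQL syntax errors
      - run: dbt compile --target ci
      # List every model this PR affects, using the last prod manifest as state
      - run: dbt list --select state:modified+ --state ./prod-artifacts
```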
The Tools That Keep Us Sane
dbt docs generate: We regenerate documentation weekly. Lineage graphs show how data flows from source to mart. New team members explore the docs before writing code.
dbt Cloud (optional): For teams without a data orchestrator like Airflow, dbt Cloud handles scheduling, monitoring, and job history. We use Airflow, but the point is: don't run dbt manually in production.
Branching strategy: Development branches deploy to a dev schema. Production branches deploy to the real warehouse. We test on synthetic data before touching live customer information.
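Concretely, that split lives in profiles.yml: one target per environment, each writing to its own schema. The snippet below is a sketch; connection fields are placeholders and vary by adapter:

```yaml
# profiles.yml (illustrative; connection details are placeholders)
retail:
  target: dev
  outputs:
    dev:
      type: mysql
      server: "{{ env_var('DBT_HOST') }}"
      schema: analytics_retail_dev   # development branches deploy here
      username: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
    prod:
      type: mysql
      server: "{{ env_var('DBT_HOST') }}"
      schema: analytics_retail       # production branches only
      username: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
```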
Extending to New Business Entities
Six months in, the product team launched a new feature: promotions and discount codes. Marketing needed analytics on promotion performance. We had three weeks to deliver.
We didn't panic. We had a pattern.
1. Analyze the source table
Examined the promotions table in a sample store database. Documented columns, keys, and relationships to orders.
2. Create staging model
Built stg_promotions to normalize column names and cast types. Stayed close to the source.
3. Build intermediate logic
Created int_orders_with_promotions to join orders with active promotions based on date ranges and discount codes.
4. Design the mart
Delivered mart_promotion_performance showing redemption rates, revenue impact, and customer segments for each promotion.
5. Add tests and docs
Wrote tests for uniqueness, null checks, and valid date ranges. Documented in schema.yml.
6. Deploy incrementally
Ran for one store first. Validated results with marketing. Then rolled out to all 50 stores using our existing variable pattern.
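The tests from step 5 live next to the models in schema.yml. A minimal version for the promotions staging model might look like this (the date-range check uses the dbt_utils package, assuming it's installed; column names follow the staging model):

```yaml
# models/staging/schema.yml (illustrative)
version: 2

models:
  - name: stg_promotions
    tests:
      # Valid date ranges, via the dbt_utils package
      - dbt_utils.expression_is_true:
          expression: "end_date >= start_date"
    columns:
      - name: promotion_id
        tests:
          - unique
          - not_null
      - name: promotion_code
        tests:
          - not_null
```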
The staging model looked like this:
-- models/staging/stg_promotions.sql

{{ config(materialized='view') }}

select
    id as promotion_id,
    code as promotion_code,
    start_date,
    end_date,
    discount_percent,
    created_at,
    updated_at
from {{ source('retail_source', 'promotions') }}
Clean. Simple. Following the exact pattern we'd established for orders and customers. No reinventing the wheel.
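The intermediate layer from step 3 follows the same conventions. A sketch of int_orders_with_promotions, assuming stg_orders exposes a promotion_code column (our guess at the join key, not confirmed by the project):

```sql
-- models/intermediate/int_orders_with_promotions.sql (illustrative sketch)
{{ config(materialized='view') }}

select
    o.order_id,
    o.order_date,
    o.customer_id,
    p.promotion_id,
    p.promotion_code,
    p.discount_percent
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_promotions') }} as p
    on o.promotion_code = p.promotion_code
    and o.order_date between p.start_date and p.end_date
```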
The Power of Parameterization
Every model we write uses var('target_schema') instead of hardcoded schema names. This means:
{{ config(schema=var('target_schema')) }}

-- Run for any store:
dbt run --target prod --vars "{source_schema: retail_store_la, target_schema: analytics_retail}"
We write the model once. We run it for 150 different stores just by changing the variables. Same code. Same quality. Same tests. Zero copy-paste.
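A minimal wrapper over that pattern looks like this (the store list is an example; in practice an orchestrator like Airflow parameterizes the runs). The echo makes it a dry run that prints each invocation; drop it to execute for real:

```shell
# Illustrative wrapper: one dbt run per store schema, same code every time.
stores="retail_store_ny retail_store_sf retail_store_la"

for store in $stores; do
  echo dbt run --target prod --vars "{source_schema: ${store}, target_schema: analytics_retail}"
done
```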
Lessons from Scaling
• Abstraction beats repetition: Macros centralize logic
• Patterns beat improvisation: Staging → Intermediate → Marts works every time
• Testing beats hoping: Automated tests catch regressions before production
• Documentation beats archaeology: Future you will thank present you
• Parameterization beats duplication: Variables make one codebase serve many tenants
Coming Up Next
We've built a scalable, maintainable pipeline. But scaling brings new challenges: performance optimization, data quality at scale, and troubleshooting complex failures. In our final episode, we'll cover production hardening, performance tuning, and the lessons we learned shipping dbt to 150 stores.
The dbt Migration Chronicles · Episode 3 of 4
Written for teams learning to scale without breaking things