Scaling from 15 to 150 Stores (When copy-paste becomes technical debt, macros become salvation)
Previously on the dbt Migration Chronicles: We built a pipeline with observability, incremental models for performance, and snapshots for history. Our 15-store deployment ran smoothly. Then the business asked us to scale to 50 stores by next quarter.
The Copy-Paste Nightmare
It started innocently enough. We had working code for adding foreign key constraints to our order_items table. When we needed the same thing for order_payments, someone copied the SQL. Then order_shipments needed it. Then customer_addresses.
By month three, we had the same 20-line block of SQL repeated in 47 different places.
The Breaking Point: MySQL changed how they recommend checking for existing constraints. We needed to update our pattern. That meant finding and fixing 47 scattered copies. Three developers spent two days on the update. Two bugs slipped through. One caused a production failure at 2am.
There had to be a better way.
Enter Macros: Write Once, Use Everywhere
dbt macros are functions for SQL. Write logic once, call it everywhere. No copy-paste. No drift. One source of truth.
Here's the macro that saved us hours of debugging:
{% macro retail_create_fk_if_not_exists(constraint_name, table_name, column_name, ref_table, ref_column) %}
    {% set check_query %}
        select count(*) as cnt
        from information_schema.KEY_COLUMN_USAGE
        where constraint_schema = '{{ target.schema }}'
          and constraint_name = '{{ constraint_name }}'
          and table_name = '{{ table_name }}'
    {% endset %}
    {% set result = run_query(check_query) %}
    {% if result and result.rows[0][0] == 0 %}
        alter table {{ target.schema }}.{{ table_name }}
            add constraint {{ constraint_name }}
            foreign key ({{ column_name }})
            references {{ target.schema }}.{{ ref_table }} ({{ ref_column }});
    {% endif %}
{% endmacro %}
Now instead of 20 lines of repeated SQL, we write:
{{ config(
    post_hook=[
        "{{ retail_create_fk_if_not_exists('fk_order_items_order', 'order_items', 'order_id', 'orders', 'order_id') }}"
    ]
) }}
Clean. Maintainable. When MySQL recommendations changed, we updated one macro and every model using it got the fix automatically.
Validation Macros: Catching Problems Early
We built custom validation macros to compare source and warehouse row counts. Before snapshots run, we verify that all orders from retail_store_ny made it into analytics_retail. Discrepancies trigger alerts, not silent data loss.
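Here's a sketch of what one of those validation macros might look like. The macro name and arguments are illustrative, not our exact production code; the idea is to compare counts and abort the run loudly on a mismatch:

```sql
{% macro retail_validate_row_count(source_schema, target_schema, table_name) %}
    {% set check_query %}
        select
            (select count(*) from {{ source_schema }}.{{ table_name }}) as source_count,
            (select count(*) from {{ target_schema }}.{{ table_name }}) as target_count
    {% endset %}
    {% set result = run_query(check_query) %}
    {% if result.rows[0][0] != result.rows[0][1] %}
        {{ exceptions.raise_compiler_error(
            table_name ~ ': source has ' ~ result.rows[0][0] ~
            ' rows, warehouse has ' ~ result.rows[0][1]
        ) }}
    {% endif %}
{% endmacro %}
```

A macro like this can run before the snapshot job via dbt run-operation, passing the store and warehouse schemas as --args.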
The DRY Principle in Action
Don't Repeat Yourself isn't just a coding philosophy; it's survival at scale. Every repeated block of code is a future bug waiting to happen. Macros turn repetition into reusability.
• Maintainability: Fix once, fix everywhere
• Consistency: Same logic produces same results
• Testability: Test the macro, trust all uses of it
• Readability: Descriptive names beat walls of SQL
Project Management: Organizing for Scale
With 15 stores, we could get away with informal processes. At 50 stores processing hundreds of millions of rows, we needed structure.
The Git Workflow That Saved Us
Every model change goes through pull requests. Every PR includes:
• The SQL change (obviously)
• Updated tests (not_null, unique, relationships)
• Documentation updates in schema.yml
• dbt compile output showing what SQL will actually run
We run dbt compile locally to catch syntax errors before committing. We run dbt list --select state:modified+ in CI to show which models our change affects. No surprises.
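In CI, that boils down to two commands after the usual setup. The workflow below is an illustration only (GitHub Actions syntax; the adapter, target name, and artifact path are assumptions, not our actual pipeline):

```yaml
# .github/workflows/dbt-ci.yml (illustrative)
name: dbt-ci
on: pull_request

jobs:
  compile-and-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-mysql
      - run: dbt deps
      # Fail fast on Jinja/SQL syntax errors
      - run: dbt compile --target ci
      # List every model this PR affects, using the last prod manifest as state
      - run: dbt list --select state:modified+ --state ./prod-artifacts
```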
The Tools That Keep Us Sane
dbt docs generate: We regenerate documentation weekly. Lineage graphs show how data flows from source to mart. New team members explore the docs before writing code.
dbt Cloud (optional): For teams without a data orchestrator like Airflow, dbt Cloud handles scheduling, monitoring, and job history. We use Airflow, but the point is: don't run dbt manually in production.
Branching strategy: Development branches deploy to a dev schema. Production branches deploy to the real warehouse. We test on synthetic data before touching live customer information.
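Concretely, that split lives in profiles.yml: one target per environment, each writing to its own schema. The snippet below is a sketch; connection fields are placeholders and vary by adapter:

```yaml
# profiles.yml (illustrative; connection details are placeholders)
retail:
  target: dev
  outputs:
    dev:
      type: mysql
      server: "{{ env_var('DBT_HOST') }}"
      schema: analytics_retail_dev   # development branches deploy here
      username: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
    prod:
      type: mysql
      server: "{{ env_var('DBT_HOST') }}"
      schema: analytics_retail       # production branches only
      username: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
```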
Extending to New Business Entities
Six months in, the product team launched a new feature: promotions and discount codes. Marketing needed analytics on promotion performance. We had three weeks to deliver.
We didn't panic. We had a pattern.
1. Analyze the source table
Examined the promotions table in a sample store database. Documented columns, keys, and relationships to orders.
2. Create staging model
Built stg_promotions to normalize column names and cast types. Stayed close to the source.
3. Build intermediate logic
Created int_orders_with_promotions to join orders with active promotions based on date ranges and discount codes.
4. Design the mart
Delivered mart_promotion_performance showing redemption rates, revenue impact, and customer segments for each promotion.
5. Add tests and docs
Wrote tests for uniqueness, null checks, and valid date ranges. Documented in schema.yml.
6. Deploy incrementally
Ran for one store first. Validated results with marketing. Then rolled out to all 50 stores using our existing variable pattern.
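The tests from step 5 live next to the models in schema.yml. A minimal version for the promotions staging model might look like this (the date-range check uses the dbt_utils package, assuming it's installed; column names follow the staging model):

```yaml
# models/staging/schema.yml (illustrative)
version: 2

models:
  - name: stg_promotions
    tests:
      # Valid date ranges, via the dbt_utils package
      - dbt_utils.expression_is_true:
          expression: "end_date >= start_date"
    columns:
      - name: promotion_id
        tests:
          - unique
          - not_null
      - name: promotion_code
        tests:
          - not_null
```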
The staging model looked like this:
-- models/staging/stg_promotions.sql

{{ config(materialized='view') }}

select
    id as promotion_id,
    code as promotion_code,
    start_date,
    end_date,
    discount_percent,
    created_at,
    updated_at
from {{ source('retail_source', 'promotions') }}
Clean. Simple. Following the exact pattern we'd established for orders and customers. No reinventing the wheel.
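The intermediate layer from step 3 follows the same conventions. A sketch of int_orders_with_promotions, assuming stg_orders exposes a promotion_code column (our guess at the join key, not confirmed by the project):

```sql
-- models/intermediate/int_orders_with_promotions.sql (illustrative sketch)
{{ config(materialized='view') }}

select
    o.order_id,
    o.order_date,
    o.customer_id,
    p.promotion_id,
    p.promotion_code,
    p.discount_percent
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_promotions') }} as p
    on o.promotion_code = p.promotion_code
    and o.order_date between p.start_date and p.end_date
```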
The Power of Parameterization
Every model we write uses var('target_schema') instead of hardcoded schema names. This means:
{{ config(schema=var('target_schema')) }}

-- Run for any store:
dbt run --target prod --vars "{source_schema: retail_store_la, target_schema: analytics_retail}"
We write the model once. We run it for 150 different stores just by changing the variables. Same code. Same quality. Same tests. Zero copy-paste.
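A minimal wrapper over that pattern looks like this (the store list is an example; in practice an orchestrator like Airflow parameterizes the runs). The echo makes it a dry run that prints each invocation; drop it to execute for real:

```shell
# Illustrative wrapper: one dbt run per store schema, same code every time.
stores="retail_store_ny retail_store_sf retail_store_la"

for store in $stores; do
  echo dbt run --target prod --vars "{source_schema: ${store}, target_schema: analytics_retail}"
done
```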
Lessons from Scaling
• Abstraction beats repetition: Macros centralize logic
• Patterns beat improvisation: Staging → Intermediate → Marts works every time
• Testing beats hoping: Automated tests catch regressions before production
• Documentation beats archaeology: Future you will thank present you
• Parameterization beats duplication: Variables make one codebase serve many tenants
Coming Up Next
We've built a scalable, maintainable pipeline. But scaling brings new challenges: performance optimization, data quality at scale, and troubleshooting complex failures. In our final episode, we'll cover production hardening, performance tuning, and the lessons we learned shipping dbt to 150 stores.
The dbt Migration Chronicles · Episode 3 of 4
Written for teams learning to scale without breaking things