Scalable data pipeline diagram highlighting dbt macros, reusable models, and multi-store analytics flow.

Scaling from 15 to 150 Stores (When copy-paste becomes technical debt, macros become salvation)

Data Engineering
April 08, 2026
8-10 min


Previously on the dbt migration chronicles: We built a pipeline with observability, incremental models for performance, and snapshots for history. Our 15-store deployment ran smoothly. Then the business asked us to scale to 50 stores by next quarter.

The Copy-Paste Nightmare

It started innocently enough. We had working code for adding foreign key constraints to our order_items table. When we needed the same thing for order_payments, someone copied the SQL. Then order_shipments needed it. Then customer_addresses.

By month three, we had the same 20-line block of SQL repeated in 47 different places.

The Breaking Point: MySQL changed its recommended way of checking for existing constraints. We needed to update our pattern. That meant finding and fixing 47 scattered copies. Three developers spent two days on the update. Two bugs slipped through. One caused a production failure at 2 a.m.

There had to be a better way.

Enter Macros: Write Once, Use Everywhere

dbt macros are functions for SQL. Write logic once, call it everywhere. No copy-paste. No drift. One source of truth.

Here's the macro that saved us hours of debugging:

sql
{% macro retail_create_fk_if_not_exists(
    constraint_name,
    table_name,
    column_name,
    ref_table,
    ref_column
) %}
    {% set check_query %}
        select count(*) as cnt
        from information_schema.KEY_COLUMN_USAGE
        where constraint_schema = '{{ target.schema }}'
          and constraint_name = '{{ constraint_name }}'
          and table_name = '{{ table_name }}'
    {% endset %}
    {# run_query returns an agate.Table; it is None at parse time,
       so guard before reading the first row #}
    {% set result = run_query(check_query) %}
    {% if result and result.rows[0][0] == 0 %}
        alter table {{ target.schema }}.{{ table_name }}
            add constraint {{ constraint_name }}
            foreign key ({{ column_name }})
            references {{ target.schema }}.{{ ref_table }} ({{ ref_column }});
    {% endif %}
{% endmacro %}

Now instead of 20 lines of repeated SQL, we write:

sql
{{ config(
    post_hook=[
        "{{ retail_create_fk_if_not_exists(
            'fk_order_items_order',
            'order_items',
            'order_id',
            'orders',
            'order_id'
        ) }}"
    ]
) }}

Clean. Maintainable. When MySQL recommendations changed, we updated one macro and every model using it got the fix automatically.

Validation Macros: Catching Problems Early

We built custom validation macros to compare source and warehouse row counts. Before snapshots run, we verify that all orders from retail_store_ny made it into analytics_retail. Discrepancies trigger alerts, not silent data loss.
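A minimal sketch of such a check, assuming a helper macro along these lines (the macro name, relation arguments, and error message are illustrative, not our exact production code):

sql
{% macro retail_assert_row_counts_match(source_relation, target_relation) %}
    {# Hypothetical validation macro: compare row counts between a
       source table and its warehouse copy, and fail loudly on drift #}
    {% set query %}
        select
            (select count(*) from {{ source_relation }}) as source_cnt,
            (select count(*) from {{ target_relation }}) as target_cnt
    {% endset %}
    {% set result = run_query(query) %}
    {% if result and result.rows[0][0] != result.rows[0][1] %}
        {{ exceptions.raise_compiler_error(
            "Row count mismatch: " ~ source_relation ~ " has " ~
            result.rows[0][0] ~ " rows, but " ~ target_relation ~
            " has " ~ result.rows[0][1]
        ) }}
    {% endif %}
{% endmacro %}

Failing the run via exceptions.raise_compiler_error is deliberate: a loud failure before snapshots run is far cheaper than silently snapshotting incomplete data.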

The DRY Principle in Action

Don't Repeat Yourself isn't just a coding philosophy; it's survival at scale. Every repeated block of code is a future bug waiting to happen. Macros turn repetition into reusability.

• Maintainability: Fix once, fix everywhere

• Consistency: Same logic produces same results

• Testability: Test the macro, trust all uses of it

• Readability: Descriptive names beat walls of SQL

Project Management: Organizing for Scale

With 15 stores, we could get away with informal processes. At 50 stores processing hundreds of millions of rows, we needed structure.

The Git Workflow That Saved Us

Every model change goes through pull requests. Every PR includes:

• The SQL change (obviously)

• Updated tests (not_null, unique, relationships)

• Documentation updates in schema.yml

• dbt compile output showing what SQL will actually run

We run dbt compile locally to catch syntax errors before committing. We run dbt list --select state:modified+ in CI to show which models our change affects. No surprises.
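Wired into CI, those two checks look roughly like this (the command flags are standard dbt CLI; the artifacts path and job structure are illustrative):

shell
# Compile the project to catch Jinja/SQL syntax errors early
dbt compile

# List every model downstream of what this branch modified,
# comparing against the production manifest (path is an example)
dbt list --select state:modified+ --state ./prod-artifacts

The state:modified+ selector needs a manifest.json from the last production run, which is why the CI job downloads production artifacts before it starts.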

The Tools That Keep Us Sane

dbt docs generate: We regenerate documentation weekly. Lineage graphs show how data flows from source to mart. New team members explore the docs before writing code.

dbt Cloud (optional): For teams without a data orchestrator like Airflow, dbt Cloud handles scheduling, monitoring, and job history. We use Airflow, but the point is: don't run dbt manually in production.

Branching strategy: Development branches deploy to a dev schema. Production branches deploy to the real warehouse. We test on synthetic data before touching live customer information.
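In profiles.yml terms, that separation looks roughly like this (a sketch assuming a community MySQL adapter; schema names and keys are examples, not our real config):

yaml
retail_dbt:
  target: dev            # default target for local development
  outputs:
    dev:
      type: mysql
      schema: analytics_retail_dev   # synthetic data only
      # ...connection details...
    prod:
      type: mysql
      schema: analytics_retail       # live warehouse
      # ...connection details...

Developers default to dev; only CI passes --target prod, so touching the real warehouse requires an explicit, reviewed deployment.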

Extending to New Business Entities

Six months in, the product team launched a new feature: promotions and discount codes. Marketing needed analytics on promotion performance. We had three weeks to deliver.

We didn't panic. We had a pattern.

1. Analyze the source table

Examined the promotions table in a sample store database. Documented columns, keys, and relationships to orders.

2. Create staging model

Built stg_promotions to normalize column names and cast types. Stayed close to the source.

3. Build intermediate logic

Created int_orders_with_promotions to join orders with active promotions based on date ranges and discount codes.

4. Design the mart

Delivered mart_promotion_performance showing redemption rates, revenue impact, and customer segments for each promotion.

5. Add tests and docs

Wrote tests for uniqueness, null checks, and valid date ranges. Documented in schema.yml.

6. Deploy incrementally

Ran for one store first. Validated results with marketing. Then rolled out to all 50 stores using our existing variable pattern.
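The tests from step 5 in schema.yml terms might look like this (column names follow the staging model below; the date-range check uses the dbt_utils package's expression_is_true test, assuming that package is installed):

yaml
version: 2

models:
  - name: stg_promotions
    columns:
      - name: promotion_id
        tests:
          - unique
          - not_null
      - name: promotion_code
        tests:
          - not_null
    tests:
      # a promotion must end on or after the day it starts
      - dbt_utils.expression_is_true:
          expression: "end_date >= start_date"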

The staging model looked like this:

sql
-- models/staging/stg_promotions.sql
{{ config(materialized='view') }}

select
    id as promotion_id,
    code as promotion_code,
    start_date,
    end_date,
    discount_percent,
    created_at,
    updated_at
from {{ source('retail_source', 'promotions') }}

Clean. Simple. Following the exact pattern we'd established for orders and customers. No reinventing the wheel.
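The intermediate model from step 3 followed the same conventions. A simplified sketch (the stg_orders model, its promotion_code column, and the join logic are illustrative assumptions):

sql
-- models/intermediate/int_orders_with_promotions.sql
{{ config(materialized='view') }}

select
    o.order_id,
    o.order_date,
    p.promotion_id,
    p.promotion_code,
    p.discount_percent
from {{ ref('stg_orders') }} o
left join {{ ref('stg_promotions') }} p
    on o.promotion_code = p.promotion_code
   and o.order_date between p.start_date and p.end_date

The left join matters: orders without a promotion still flow through to the mart, so redemption rates are computed against all orders, not just discounted ones.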

The Power of Parameterization

Every model we write uses var('target_schema') instead of hardcoded schema names. This means:

sql
{{ config(
    schema=var('target_schema')
) }}

shell
# Run for any store:
dbt run --target prod --vars "{
    source_schema: retail_store_la,
    target_schema: analytics_retail
}"

We write the model once. We run it for 150 different stores just by changing the variables. Same code. Same quality. Same tests. Zero copy-paste.
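Rolling out to every store then reduces to a loop over variables. A sketch (the store list and wrapper script are illustrative; only the dbt command itself comes from the pattern above):

shell
#!/usr/bin/env bash
# Hypothetical rollout wrapper: same project, one run per store schema
set -euo pipefail

for store in retail_store_ny retail_store_la retail_store_sf; do
    dbt run --target prod --vars "{
        source_schema: ${store},
        target_schema: analytics_retail
    }"
done

In practice we let the orchestrator (Airflow, per the previous section) drive this loop with one task per store, which gives retries and per-store visibility for free.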

Lessons from Scaling

• Abstraction beats repetition: Macros centralize logic

• Patterns beat improvisation: Staging → Intermediate → Marts works every time

• Testing beats hoping: Automated tests catch regressions before production

• Documentation beats archaeology: Future you will thank present you

• Parameterization beats duplication: Variables make one codebase serve many tenants

Coming Up Next

We've built a scalable, maintainable pipeline. But scaling brings new challenges: performance optimization, data quality at scale, and troubleshooting complex failures. In our final episode, we'll cover production hardening, performance tuning, and the lessons we learned shipping dbt to 150 stores.

Continue to Episode 4.

The dbt Migration Chronicles · Episode 3 of 4

Written for teams learning to scale without breaking things
