Keeping Your Data Fresh (The wake-up call at 3am that taught us about observability)
Previously on the dbt migration chronicles: We escaped retail data chaos by adopting dbt, structuring our pipeline into staging/intermediate/marts layers, and using variables to handle multiple store databases with a single project. But building the pipeline was just the beginning.
The 3am Wake-Up Call
It was a Tuesday when our VP of Sales noticed something wrong: Monday's revenue numbers looked too low, and the board meeting was in four hours. We dove into the data and discovered that the San Francisco store had not sent updates in 18 hours. The pipeline had run successfully on stale data, and nobody knew anything was wrong until it was too late.
That morning taught us a crucial lesson: a successful dbt run does not mean your data is fresh, accurate, or complete. You need observability.
What Actually Is Observability?
In data pipelines, observability means answering three questions at any moment:
- Is my source data fresh? When did each retail store last send updates? Are we processing today’s orders or yesterday’s?
- Did my transformations succeed? Not just did the job finish, but did it produce valid results that passed quality checks?
- How is my pipeline performing? Which models are slow? Which tests fail most often? Where should we optimize?
Source Freshness: Never Be Blindsided Again
dbt has a built-in solution for freshness. You define freshness expectations in your source configuration:
```yaml
sources:
  - name: retail_source
    schema: "{{ var('source_schema') }}"
    tables:
      - name: orders
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 30, period: minute}
          error_after: {count: 2, period: hour}
```
Now when we run dbt source freshness, we get clear answers: Is retail_store_sf current? Has retail_store_ny stopped sending data?
With Elementary (an open-source observability tool for dbt) configured, we get Slack alerts when stores go quiet. No more surprises. No more board meetings with stale numbers.
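Elementary ships as a dbt package, so setup starts in the project itself. A minimal sketch of the install looks like this (the version number is illustrative; check Elementary's docs for the current release):

```yaml
# packages.yml -- pull in the Elementary observability package
packages:
  - package: elementary-data/elementary
    version: 0.16.1   # illustrative; pin whatever version you actually test against
```

After `dbt deps` and a run that builds Elementary's models, its monitor command can push freshness and test failures to a Slack webhook configured in its profile; the alerting details vary by version, so treat this as the shape of the setup rather than a recipe.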
The Performance Problem
6 hours: time to rebuild all orders from 15 stores
Our initial dbt models worked beautifully for the first month. Then Black Friday happened. Orders exploded, and the orders table grew to millions of rows per store.
Our nightly full-refresh runs started missing their morning deadline. A full rebuild across 15 stores took around six hours.
We needed a smarter approach.
Incremental Models: Process Only What Changed
Full refreshes rebuild tables from scratch every time. Incremental models are smarter: they process only new or changed rows.
Here is the transformation that saved our pipeline:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert',
    schema=var('target_schema')
) }}

select
    id as order_id,
    customer_id,
    store_id,
    order_total_cents,
    order_status,
    updated_at
from {{ source('retail_source', 'orders') }}

{% if is_incremental() %}
where updated_at >= (select max(updated_at) from {{ this }})
{% endif %}
```
The magic is in the is_incremental() block. On the first run, it processes everything. On subsequent runs, it only grabs rows that changed since the last run.
How It Works
The unique_key identifies which rows to update.
The where clause filters to only recent changes.
The delete+insert strategy removes old versions of changed rows before inserting new ones.
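To make the strategy concrete: on an incremental run, dbt compiles the model into warehouse statements roughly like the ones below. This is a simplified sketch (table and schema names are illustrative, and the exact SQL varies by adapter):

```sql
-- 1. Stage only the rows that changed since the last run
create temporary table orders__tmp as
select id as order_id, customer_id, store_id, order_total_cents, order_status, updated_at
from retail_source.orders
where updated_at >= (select max(updated_at) from analytics.orders);

-- 2. delete+insert: remove the old versions of changed rows...
delete from analytics.orders
where order_id in (select order_id from orders__tmp);

-- 3. ...then insert the fresh versions
insert into analytics.orders
select * from orders__tmp;
```

The key point is that steps 2 and 3 only touch the changed rows, which is why the run time tracks the day's change volume rather than the table's total size.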
Result: Six-hour runs became 15-minute runs. Same accuracy, 96% less processing time.
The History Problem
Three months into production, marketing asked a question we could not answer: "When this customer made their November purchase, what loyalty tier were they in?"
We had their current tier and the November order, but we had lost the historical state. Customer records get updated. Addresses change. Loyalty tiers evolve. Products get repriced.
We needed to preserve history.
Snapshots: Time Travel for Your Data
dbt snapshots implement Type 2 Slowly Changing Dimensions (SCD): keep all versions of a record with timestamps showing when each version was valid.
```sql
{% snapshot customers_snapshot %}

{{ config(
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at',
    target_schema=var('target_schema')
) }}

select
    id as customer_id,
    email,
    loyalty_tier,
    city,
    updated_at
from {{ source('retail_source', 'customers') }}

{% endsnapshot %}
```
Now when we run dbt snapshot, dbt checks for changes and preserves history. Each customer record gets dbt_valid_from and dbt_valid_to timestamps.
The power of snapshots: we can now answer questions like "Show me all orders where the customer was in the Gold tier at the time of purchase" by joining orders to snapshot tables and filtering by validity ranges.
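A point-in-time join of that kind might look like the following sketch (schema names and the open-ended sentinel date are illustrative):

```sql
-- Orders placed while the customer was in the Gold tier,
-- using the snapshot's validity window instead of the current record
select
    o.order_id,
    o.order_total_cents,
    c.loyalty_tier
from analytics.orders o
join snapshots.customers_snapshot c
  on o.customer_id = c.customer_id
 and o.updated_at >= c.dbt_valid_from
 and o.updated_at <  coalesce(c.dbt_valid_to, '9999-12-31')  -- null means "still current"
where c.loyalty_tier = 'Gold'
```

The `coalesce` handles the current version of each customer, whose `dbt_valid_to` is null until the record changes again.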
Putting It All Together
Our production pipeline now combines all three techniques:
- Morning routine: dbt source freshness checks that overnight store feeds arrived. Alerts fire if any store is late.
- Transformation run: dbt run executes incremental models for facts (orders, order_items) and full refreshes for small dimensions.
- History preservation: dbt snapshot captures changes to customers and products before they are overwritten.
- Quality gates: dbt test validates uniqueness, referential integrity, and custom business rules while Elementary dashboards show health metrics and alert patterns.
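Strung together, the daily job is just a sequence of dbt commands. A minimal orchestration sketch, assuming a plain shell runner (scheduler and alerting wiring omitted):

```bash
#!/usr/bin/env bash
set -euo pipefail              # stop the job on the first failure

dbt source freshness           # 1. verify overnight store feeds arrived
dbt snapshot                   # 2. preserve history before records are overwritten
dbt run                        # 3. incremental facts, full-refresh small dimensions
dbt test                       # 4. quality gates before anyone reads the marts
```

Running `dbt snapshot` before `dbt run` matters: if a customer record changes, the snapshot must capture the old version before downstream models consume the new one.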
The result is a pipeline that is fast, reliable, and observable. We catch problems before they reach executives, process millions of rows in minutes, and answer historical questions that were previously impossible.
The next board meeting went differently. Revenue numbers were fresh, accurate, and ready an hour early. When the VP of Sales asked, "What was our customer retention rate for Gold tier members who joined in Q3?" we had the answer in under a minute. Historical snapshots made it possible.
Coming Up Next
As the project grows, we need strategies for reusable code, better project organization, and ways to extend to new business entities without chaos. In Episode 3, we'll explore macros, project management patterns, and the art of scaling dbt.
The dbt Migration Chronicles · Episode 2 of 4
Written for data teams who learned observability the hard way