
Streamlining ETL (Part 2) – Building AWS Glue Connections and ETL Pipelines for Snowflake


Cloud/DevOps
May 16, 2025
7-8 min


Introduction

In Part 1, we focused on securely setting up the AWS Glue and Snowflake integration, including IAM roles and Secrets Manager. Now it's time to move on to the core functionality—building the actual ETL pipeline using AWS Glue Studio and connecting it to Snowflake!

This blog will cover:

  • Creating a secure AWS Glue connection to Snowflake.
  • Building an end-to-end ETL job using Glue Studio.
  • Applying data transformations (mapping, type conversions).

Create AWS Glue Connection to Snowflake

  • Go to AWS Glue > Data Catalog > Connections > Create Connection.
  • Select Snowflake as your data source.
  • Click Next.
  • To find the Host and Port for the connection, go to Snowflake and run:

    USE ROLE ACCOUNTADMIN;
    SELECT SYSTEM$ALLOWLIST();

The first host listed in the output is your Host value; the Port is 443.
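If you prefer to pull the host out programmatically, the allowlist is a JSON array of entries. A minimal sketch—the account name and the exact entry shape are assumptions based on typical `SYSTEM$ALLOWLIST()` output:

```python
import json

# Illustrative sample of what SELECT SYSTEM$ALLOWLIST() returns: a JSON
# array of {"type", "host", "port"} objects. The account name is a
# made-up placeholder.
sample_allowlist = json.dumps([
    {"type": "SNOWFLAKE_DEPLOYMENT", "host": "myaccount.snowflakecomputing.com", "port": 443},
    {"type": "STAGE", "host": "sfc-stage.s3.amazonaws.com", "port": 443},
])

def deployment_host(allowlist_json: str):
    """Pick the Snowflake deployment host/port out of the allowlist JSON."""
    for entry in json.loads(allowlist_json):
        if entry["type"] == "SNOWFLAKE_DEPLOYMENT":
            return entry["host"], entry["port"]
    raise ValueError("no SNOWFLAKE_DEPLOYMENT entry found")

host, port = deployment_host(sample_allowlist)
print(host, port)  # myaccount.snowflakecomputing.com 443
```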

  • Add the Host and Port here.
  • Select the IAM role created in Part 1.
  • Select the secret created in Part 1.
  • Click Test Connection; if it succeeds, verify your Snowflake credentials and click Next.
  • Provide a name for the connection.
  • Review and create the connection.
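The console steps above can also be scripted with boto3's `glue.create_connection` API. The property keys below (HOST, PORT, SECRET_ID) are assumptions mirroring the console fields—check the Glue API reference for your connector version before relying on them:

```python
def snowflake_connection_input(name, host, secret_arn, port=443):
    """Build a ConnectionInput payload for a Snowflake Glue connection.

    Property keys are assumptions based on the console fields; verify
    against the Glue API reference.
    """
    return {
        "Name": name,
        "ConnectionType": "SNOWFLAKE",
        "ConnectionProperties": {
            "HOST": host,
            "PORT": str(port),
            "SECRET_ID": secret_arn,  # the secret created in Part 1
        },
    }

payload = snowflake_connection_input(
    "snowflake-conn",
    "myaccount.snowflakecomputing.com",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:snowflake-creds",
)

# To actually create the connection (requires AWS credentials):
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=payload)
```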

Build a Visual Glue Job

Why Visual ETL in AWS Glue Studio?

AWS Glue Studio simplifies ETL development using a drag-and-drop interface while still allowing custom code injections via auto-generated PySpark scripts. It’s ideal for:

  • Data Engineers who prefer no-code/low-code workflows.
  • Teams seeking faster pipeline prototyping.
  • Anyone who wants the flexibility of switching between visual + code-based ETL.

Steps

  1. Go to AWS Glue > ETL jobs > Visual ETL.
  2. Add S3 as the source, an Apply Mapping node for data transformation, and then Snowflake as the target.
  3. Configure all three nodes as described below.

Data Source – S3 Bucket

  • Provide the S3 URL of the file you want pushed from S3 to Snowflake.
  • Set the format to CSV.
  • On the left, select the IAM role created in Part 1.
  • After you select the IAM role, Glue automatically reads the file and displays a preview of the rows.

Transform – Change Schema

This step displays all the columns from the source file along with their inferred data types, and also allows you to modify these data types before loading the data into the target database.

Initially, all three columns were inferred as string.

I updated the schema for id and price, changing their data type to int.
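Under the hood, the Apply Mapping node works on (source column, source type, target column, target type) tuples. A rough pure-Python model of the string-to-int cast—the `name` column is an assumed third column for illustration, and the semantics mirror Spark's cast, where an unparseable value becomes null:

```python
# Mapping tuples in the shape Glue's ApplyMapping transform uses:
# (source column, source type, target column, target type).
mappings = [
    ("id", "string", "id", "int"),
    ("name", "string", "name", "string"),
    ("price", "string", "price", "int"),
]

def cast_value(value, target_type):
    """Rough model of Spark's cast: an unparseable int becomes None (null)."""
    if target_type == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            return None
    return value

row = {"id": "1", "name": "widget", "price": "abc"}
out = {tgt: cast_value(row[src], ttype) for src, _, tgt, ttype in mappings}
print(out)  # {'id': 1, 'name': 'widget', 'price': None}
```

Note that a bad value does not fail the job—it silently becomes null, which is worth checking for after the load.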


Snowflake Database Setup

  • Now go to Snowflake and create a table in a database whose columns match the definition of the data.
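As a sketch, here is DDL matching the mapped schema above (table and column names are illustrative). You can run it in a Snowflake worksheet, or programmatically via the snowflake-connector-python package:

```python
# Illustrative DDL matching the mapped schema (id/price cast to int).
ddl = """
CREATE TABLE IF NOT EXISTS products (
    id    INT,
    name  STRING,
    price INT
)
"""

# Programmatic option (requires `pip install snowflake-connector-python`
# and real credentials; shown for reference only):
# import snowflake.connector
# conn = snowflake.connector.connect(
#     account="myaccount", user="...", password="...",
#     warehouse="MYWH", database="MYDB", schema="PUBLIC",
# )
# conn.cursor().execute(ddl)
```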

Data Target – Snowflake Node

  • Select the Snowflake connection created in the previous step.
  • Enter the names of your Snowflake database, schema, and table.

Review & Edit Auto-Generated Python (PySpark) Script

Click the “Script” tab to view the auto-generated Glue script.
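The generated script always follows the same source → transform → sink shape. A trimmed sketch of that shape, assuming the S3 path, connection name, and table names from this example—the real script embeds your job's actual values, and the Snowflake `connection_options` keys here are assumptions to verify against your generated code:

```python
# Mapping tuples from the Apply Mapping node (this example's columns).
MAPPINGS = [
    ("id", "string", "id", "int"),
    ("name", "string", "name", "string"),
    ("price", "string", "price", "int"),
]

def run_job(argv):
    # Imports live inside the function so the sketch can be read (and
    # its mapping spec tested) without a Glue runtime installed.
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Source node: CSV file(s) in S3 (path is a placeholder).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-bucket/input/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Transform node: apply the schema changes made in the visual editor.
    mapped = ApplyMapping.apply(frame=source, mappings=MAPPINGS)

    # Sink node: the Snowflake connection and table from the earlier steps
    # (option keys are assumptions; compare with your generated script).
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="snowflake",
        connection_options={
            "connectionName": "snowflake-conn",
            "dbtable": "products",
            "sfDatabase": "MYDB",
            "sfSchema": "PUBLIC",
        },
    )
    job.commit()
```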


Customizing the Script (Optional)

For more advanced use cases, you can:

  • Add data cleansing logic (e.g., removing nulls).
  • Insert conditional logic (e.g., only load data where price > 0).
  • Integrate custom logging using AWS CloudWatch or external monitoring tools.
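For example, the null-dropping and price checks above can be expressed as a row predicate; in a Glue script, this would back a Filter transform (column names are from this post's example):

```python
def keep_row(row):
    """Predicate for a Filter step: drop rows with a null id and rows
    whose price is missing or not positive."""
    if row.get("id") is None:
        return False
    price = row.get("price")
    return price is not None and price > 0

rows = [
    {"id": 1, "name": "widget", "price": 10},
    {"id": 2, "name": "broken", "price": None},
    {"id": None, "name": "orphan", "price": 5},
    {"id": 3, "name": "freebie", "price": 0},
]
clean = [r for r in rows if keep_row(r)]
print(clean)  # [{'id': 1, 'name': 'widget', 'price': 10}]

# In the Glue script this plugs in via the Filter transform:
# from awsglue.transforms import Filter
# filtered = Filter.apply(frame=mapped, f=keep_row)
```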

Saving and Running

  • Give the job a name and save it.
  • Run the job.
  • Go to AWS Glue > ETL Jobs > Job run monitoring.
  • Watch your job run.

Verification

  • Once your job completes successfully, switch to Snowflake and query your table. You should see the ingested records from S3!

Conclusion

In this two-part series, we walked through the complete journey of securely integrating AWS Glue with Snowflake to build a scalable and production-ready ETL pipeline.

In Part 1, we focused on laying the groundwork. We discussed why each prerequisite is critical, including the role of IAM for enforcing least privilege access, and the significance of AWS Secrets Manager for secure credential management. These foundational steps ensure that your pipeline is not only functional but also aligned with security and governance best practices.

In Part 2, we brought everything together to create the actual Glue ETL job. Leveraging the IAM role and securely stored Snowflake credentials from Part 1, we established a Glue connection to Snowflake. From there, we designed a visual ETL job, configured our S3 source and Snowflake target, applied schema mappings, and customized Glue-generated PySpark scripts for flexibility.

By combining strong upfront setup (IAM, Secrets) with proper Glue job design, you’ve now built a secure, automated, and production-ready pipeline capable of transferring and transforming data seamlessly between AWS and Snowflake.
