To start with, I must say SageMaker Unified Studio (SUS for short from here on) can be confusing if you come from the traditional, individual AWS analytics services, because it wraps all the services you've already worked with:
- S3: Storage
- Lake Formation: Data Governance with fine-grained permissions
- Glue: For Spark workloads and data catalog management
- Redshift: Data warehouse
- Athena: Ad-hoc SQL queries
- SageMaker Notebook: Running Python scripts or connecting to Glue Interactive Sessions
- Bedrock: For Generative and Agentic AI components
- Amazon Q: For AI-assisted code generation (SQL and Python)
- DataZone: For business catalog, project management, and cross-domain data sharing
- EMR: For big data processing with Spark
Used separately, these components add operational and governance overhead for data, compute, and security, and each one lives in its own silo.
This is exactly the gap that lakehouse architecture sets out to fill.
Now with Unified Studio, you have all storage, compute (Athena, EMR, Redshift, and Glue), and governance (DataZone and Lake Formation) wrapped under one managed umbrella.
But here’s the interesting part—SUS provides a unified platform to implement lakehouse architecture seamlessly. And before you ask “why lakehouse?”—let me explain the problem this solves.
The Problem: Why We Need Lakehouse Architecture
You might be wondering—why are we even talking about lakehouse architecture? Because it solves a massive pain point you’ve probably experienced.
The Traditional Mess We’ve All Dealt With
Scenario 1: The Data Lake Approach
- You dump all your data into S3 (cheap storage ✅)
- But now you need to run analytics…
- Performance is terrible ❌
- No ACID transactions ❌
- Data quality enforcement? Good luck! ❌
- Result: You end up copying data to Redshift for actual analytics
Scenario 2: The Data Warehouse Approach
- You load everything into Redshift (great performance ✅)
- But storage costs skyrocket 💸
- Can’t handle semi-structured data well ❌
- ML teams want raw data in S3 anyway ❌
- Result: You end up maintaining both S3 and Redshift with duplicate data
The Real Problem:
- Data duplication everywhere—paying for the same data multiple times
- Complex ETL pipelines just to move data between systems
- Multiple permission models to manage across different services
- Inconsistent data across systems causing trust issues
- Slow time-to-insight because of all this overhead
- High operational costs for maintaining duplicate infrastructure
Sound familiar? This is exactly why SageMaker Unified Studio provides a unified platform to implement lakehouse architecture.
The Solution: What is Lakehouse Architecture?
Lakehouse architecture is an approach that combines the best of both worlds:
Data Lake + Data Warehouse = Lakehouse
According to AWS documentation:
“A data lakehouse is an architecture that unifies the scalability and cost-effectiveness of data lakes with the performance and reliability characteristics of data warehouses. This approach eliminates the traditional trade-offs between storing diverse data types and maintaining query performance for analytical workloads.”
Key Benefits of Lakehouse Architecture:
✅ Transactional consistency – ACID compliance ensures reliable concurrent operations
✅ Schema management – Flexible schema evolution without breaking existing queries
✅ Compute-storage separation – Independent scaling of processing and storage resources
✅ Open standards – Built on the Apache Iceberg open table format
✅ Single source of truth – Eliminates data silos and redundant storage costs
✅ Real-time and batch processing – Supports both streaming and historical analytics
✅ Direct file access – Enables both SQL queries and programmatic data access
✅ Unified governance – Consistent security and compliance across all data types
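To make a few of these benefits tangible (ACID writes, schema evolution, snapshot history), here's a minimal PySpark sketch against an Iceberg table. Treat it as illustrative only: the catalog name `lakehouse`, the `sales.orders` table, and the S3 bucket are all placeholders, and it assumes a Spark environment with the Iceberg and AWS Glue catalog dependencies available.

```python
# Minimal sketch: Iceberg table on S3 through the Glue Data Catalog.
# Catalog name, namespace, table, and bucket are all placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3://my-demo-bucket/warehouse/")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS lakehouse.sales")

# ACID write: readers never observe a half-finished commit
spark.sql("CREATE TABLE IF NOT EXISTS lakehouse.sales.orders (id BIGINT, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO lakehouse.sales.orders VALUES (1, 19.99), (2, 5.49)")

# Schema evolution: add a column without rewriting files or breaking old queries
spark.sql("ALTER TABLE lakehouse.sales.orders ADD COLUMN region STRING")

# Snapshot history: every committed change is queryable metadata
spark.sql("SELECT snapshot_id, committed_at FROM lakehouse.sales.orders.snapshots").show()
```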
This architectural approach is what SageMaker Unified Studio helps you implement without the complexity. Let’s see how.
How SageMaker Unified Studio Implements Lakehouse Architecture
SageMaker Unified Studio provides a unified platform that implements lakehouse architecture for you automatically. According to AWS documentation:
“The lakehouse architecture of Amazon SageMaker unifies data across Amazon S3 data lakes and Amazon Redshift data warehouses so you can work with your data in one place.”
Here’s how SUS implements this architecture:
1. Unified Data Access Through Single Catalog
Instead of managing separate connections to S3, Redshift, Aurora, DynamoDB, and other sources—you get one unified interface:
- AWS Glue Data Catalog serves as the single catalog where you discover and query all your data
- Apache Iceberg open table format provides interoperability across different analytics engines
- Multiple query engines (Athena, Redshift, Spark on EMR) can all access the same data without duplication
How it works:
“When you run a query, AWS Lake Formation checks your permissions while the query engine processes data directly from its original storage location, whether that’s Amazon S3 or Amazon Redshift.”
This means data stays where it is—no unnecessary movement or duplication.
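To sketch what this looks like from code, here's a hypothetical query against a Glue Data Catalog table using the awswrangler library. The `sales` database and `orders` table are placeholders; Lake Formation permissions apply transparently when the query runs.

```python
# Sketch: query a Glue Data Catalog table in place with Athena.
# Database and table names are placeholders; no data is copied anywhere.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region",
    database="sales",  # hypothetical Glue Data Catalog database
)
print(df.head())
```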
2. Two Types of Data Access
Managed Data Sources:
- Amazon S3 data lakes – Including Amazon S3 Tables with built-in Apache Iceberg support
- Amazon Redshift warehouse tables – Accessible as Iceberg tables through Redshift Spectrum
- Zero-ETL destinations – Near real-time data replication from:
  - SaaS sources (Salesforce, SAP, Zendesk)
  - Operational databases (Amazon Aurora, Amazon RDS for MySQL)
  - NoSQL databases (Amazon DynamoDB)
Federated Data Sources (Query in-place without moving data):
- Operational databases (PostgreSQL, MySQL, Microsoft SQL Server)
- AWS managed databases (Amazon Aurora, Amazon RDS, Amazon DynamoDB, Amazon DocumentDB)
- Third-party data warehouses (Snowflake, Google BigQuery)
When you connect federated sources in SUS, AWS automatically provisions the required infrastructure components (AWS Glue connection, Lambda functions) that act as bridges between the query engines and the federated data source.
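As a rough sketch of what querying a federated source can look like once SUS has provisioned the connector, here's a hedged boto3 example using the Athena API. The catalog name `postgres_catalog`, the workgroup, and the results bucket are all assumptions about your setup:

```python
import boto3

athena = boto3.client("athena")

# Sketch: query a federated PostgreSQL source in place through Athena.
# "postgres_catalog" stands in for a data catalog registered for the source
# (backed by the auto-provisioned Glue connection and Lambda connector).
response = athena.start_query_execution(
    QueryString='SELECT * FROM "postgres_catalog"."public"."customers" LIMIT 10',
    WorkGroup="primary",  # placeholder workgroup
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
print(response["QueryExecutionId"])
```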
3. Centralized Governance with Lake Formation
One permission model (AWS Lake Formation) that enforces access control consistently across:
- S3 data lakes
- Redshift data warehouses
- Federated sources
- All query engines (Athena, Redshift Query Editor v2, EMR, Glue)
Fine-grained control at table, column, row, and cell levels—defined once, enforced everywhere.
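As a minimal sketch of what "defined once" can look like with the Lake Formation API (the account ID, role ARN, database, and table names below are placeholders):

```python
import boto3

lf = boto3.client("lakeformation")

# Sketch: grant column-level SELECT on a catalog table to an analyst role.
# Every engine (Athena, Redshift, EMR, Glue) then enforces this same grant.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystRole"},  # placeholder ARN
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales",          # hypothetical database
            "Name": "orders",                 # hypothetical table
            "ColumnNames": ["id", "amount"],  # analysts see only these columns
        }
    },
    Permissions=["SELECT"],
)
```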
4. Project-Based Organization with DataZone
Amazon DataZone powers the project and domain management in SUS:
- Business catalog for data discovery and data product publishing
- Project boundaries for collaboration and permissions
- Cross-domain data sharing for governed data access across teams
- Domain management for organizing multiple projects
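For a flavor of what this looks like programmatically, here's a hedged sketch using the DataZone API to search the business catalog; the domain identifier is a placeholder for your own SUS domain:

```python
import boto3

dz = boto3.client("datazone")

# Sketch: discover published data products in the business catalog.
response = dz.search_listings(
    domainIdentifier="dzd_example123",  # placeholder for your SUS domain ID
    searchText="orders",
    maxResults=10,
)
for listing in response.get("items", []):
    print(listing)
```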
The Architecture Components:
Let me clarify what each component actually is:
- Lakehouse Architecture = The architectural pattern/approach (not a product)
- SageMaker Unified Studio = The unified platform that implements lakehouse architecture
- Athena, Redshift, Spark (EMR) = The query/compute engines that process your queries
- Glue Data Catalog = The unified metadata catalog (single source of truth for metadata)
- Lake Formation = The governance layer providing fine-grained permissions
- DataZone = The business catalog and project management layer
- Apache Iceberg = The open table format enabling cross-engine interoperability
Now that you understand how SUS implements lakehouse architecture, let’s see how it organizes your workflow.
The Three Core Sections of SUS
With lakehouse architecture providing unified data access, SUS organizes your workflow into three intuitive sections:
A. Discover
Your starting point for data exploration:
- Data Catalog (powered by AWS Glue Data Catalog) – Discover all available data across lakes, warehouses, and federated sources
- Business Catalog (powered by DataZone) – Find published data products and datasets shared across domains
- Bedrock Playground – Experiment with Generative AI models and prompts
This is where you explore what data is available across all your sources—all unified through the single catalog.
B. Build
This is where the action happens. SUS provides access to multiple analytical and ML tools:
- Query Editors:
  - Amazon Athena Query Editor – For serverless SQL queries across S3 and federated sources
  - Amazon Redshift Query Editor v2 – For high-performance queries on warehouse data
- Notebooks and Development:
  - JupyterLab Notebooks – For data science, ML development, and programmatic data access
  - SageMaker Training – For building and training ML models
  - SageMaker Inference – For deploying models
- Data Processing:
  - AWS Glue Visual ETL – For no-code/low-code data transformations
  - Amazon EMR – For big data processing with Apache Spark
- Orchestration:
  - Amazon MWAA (Airflow) – For workflow orchestration and scheduling
- AI Assistance:
  - Amazon Q Developer – For AI-assisted SQL and Python code generation
All these tools access your unified data seamlessly through different query engines, without requiring data movement.
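To underline the "same data, different engines" point, here's a hedged sketch hitting the same catalog-backed table through the Redshift Data API this time; the workgroup and database names are assumptions about your project environment:

```python
import boto3

rsd = boto3.client("redshift-data")

# Sketch: the same lakehouse table, queried through the Redshift engine.
response = rsd.execute_statement(
    WorkgroupName="sus-project-workgroup",  # hypothetical Redshift Serverless workgroup
    Database="dev",                         # placeholder database
    Sql="SELECT region, COUNT(*) AS orders FROM sales.orders GROUP BY region",
)
# Poll rsd.get_statement_result(Id=response["Id"]) for rows once the query finishes.
print(response["Id"])
```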
C. Govern
Making your curated, valuable data available to downstream consumers:
- Publish data products to the business catalog
- Share datasets across projects and domains
- Enforce permissions consistently through Lake Formation
- Track data lineage and usage
- Manage data quality and compliance
Lake Formation ensures consistent permissions across all access patterns and query engines, while DataZone manages the business metadata and sharing workflows.
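As a closing sketch of the governance flow, here's a hedged DataZone call that publishes a curated asset to the business catalog; every identifier below is a placeholder for your own domain and asset:

```python
import boto3

dz = boto3.client("datazone")

# Sketch: publish a curated asset so other projects and domains can discover it.
dz.create_listing_change_set(
    domainIdentifier="dzd_example123",  # placeholder domain ID
    entityIdentifier="asset-abc123",    # placeholder asset ID
    entityType="ASSET",
    action="PUBLISH",
)
```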
Understanding the Architecture
In the architecture diagram, the green boxes represent the core concepts of SageMaker Unified Studio and how they interconnect:
| From | To | Relationship | Cardinality | Meaning |
|---|---|---|---|---|
| Domain | Domain units | contains | 1:M | Organizational structure |
| Domain | Projects | consists of | 1:M | Projects belong to a domain |
| Projects | Users | include members | M:M | Users work in projects |
| Projects | Data | encapsulate | 1:M | Projects access data sources |
| Users | Assets | govern data via | M:M | Users create/manage assets |
| Data | Assets | is published into | M:M | Data curated into assets |
The green boxes together represent the DataZone-powered organizational framework that provides:
- ✅ Structure (Domain, Domain units)
- ✅ Collaboration (Projects, Users)
- ✅ Data management (Data, Assets)
- ✅ Governance (permissions flow through all six)
This is the foundation that enables lakehouse architecture implementation in SageMaker Unified Studio!
Notice how SUS provides the platform layer that implements lakehouse architecture, with the unified catalog at the center and multiple query engines accessing data from its original storage location.
🚀 Try It Yourself
Want hands-on experience? AWS has a practical workshop covering everything in this post:
👉 SageMaker Unified Studio Workshop
This workshop simulates a real-world scenario through the lens of different data professionals addressing actual business challenges. You'll experience the end-to-end process, from initial data analysis to deploying a GenAI-powered, tailored student engagement solution.
Read through the workshop and follow the screenshots to understand SUS and the workshop implementation.
💬 Feedback Welcome
What did you think?
- ✅ Helpful sections?
- 🤔 Confusing parts?
- 💡 Topics for next post?
Have you tried SageMaker Unified Studio? Share your experience in the comments!


