OneLake Architecture: The Single Data Lake Powering Microsoft Fabric
Content on this site is AI-assisted and personally reviewed by Hazem.
A European retailer running analytics on Azure had a familiar problem: their raw transaction data lived in an Azure Data Lake Storage Gen2 account, their cleansed data in a second ADLS account managed by the data engineering team, and their Power BI datasets lived in yet another import storage layer inside the Power BI service. When the data engineering team updated a schema, the downstream Power BI refresh broke. When a governance audit required data lineage, nobody could trace a Power BI measure back to its source ADLS path without manual documentation.
This is the problem that OneLake — the storage foundation of Microsoft Fabric — was architected to solve.
What OneLake Is
OneLake is a single logical data lake that spans an entire Microsoft Fabric tenant. Every Fabric workspace, every Lakehouse, every Data Warehouse, every KQL Database writes to and reads from the same underlying OneLake storage. There is no separate storage account to provision. There are no connection strings to configure. Storage is automatic, tenant-wide, and globally addressable through a consistent namespace.
Architecturally, OneLake is built on top of Azure Data Lake Storage Gen2. The underlying storage technology is not new — what is new is the abstraction layer Microsoft has placed above it. Where ADLS Gen2 requires explicit account provisioning, access key management, and service-to-service connections, OneLake provides a single namespace (onelake.dfs.fabric.microsoft.com) with Microsoft Entra ID authentication throughout.
The native file format in OneLake is Delta Parquet — the open-source Delta Lake table format (originated by Databricks, now governed by the Linux Foundation) with Parquet as the column-store encoding. Delta provides ACID transaction guarantees, time travel (table history and rollback), schema evolution, and efficient metadata operations on large datasets.
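Delta's ACID guarantees and time travel both rest on its transaction log. Each commit is a numbered JSON file in the table's _delta_log/ folder that lists actions such as adding or removing a Parquet data file, and the table at version N is whatever you get by replaying commits 0 to N. The sketch below models only that replay. Real commit files carry more fields (file size, statistics, metadata and protocol actions), checkpoints are ignored here, and the file names are hypothetical.

```python
import json


def active_files(commits, version=None):
    """Replay Delta-style commit actions up to `version` (inclusive) and
    return the set of data files that make up the table at that version."""
    last = len(commits) - 1 if version is None else version
    files = set()
    for commit in commits[: last + 1]:
        # Each commit holds one JSON action per line.
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other action types (commitInfo, metaData, protocol) are ignored here.
    return files


# Hypothetical log: each string stands in for one _delta_log/NNN.json file.
commits = [
    '{"add": {"path": "part-000.parquet"}}',  # version 0: first write
    '{"add": {"path": "part-001.parquet"}}',  # version 1: append
    # version 2: compaction replaces two small files with one larger file
    '{"remove": {"path": "part-000.parquet"}}\n'
    '{"remove": {"path": "part-001.parquet"}}\n'
    '{"add": {"path": "part-002.parquet"}}',
]

print(sorted(active_files(commits)))             # ['part-002.parquet']
print(sorted(active_files(commits, version=1)))  # ['part-000.parquet', 'part-001.parquet']
```

Reading an older version never needs a copy of the data: the old Parquet files stay on disk until they are vacuumed, so time travel is simply a shorter replay.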
The Namespace Structure
OneLake organises data hierarchically:
```
onelake.dfs.fabric.microsoft.com/
  <workspace>/
    <item>.<itemtype>/   (for example, Sales.Lakehouse)
      Tables/   ← Managed Delta tables
      Files/    ← Raw files, unmanaged data
```
Workspaces are the primary organisational unit in Fabric, analogous to a project or team boundary. A workspace contains a set of Fabric items (Lakehouses, Warehouses, Notebooks, Pipelines, Reports). Workspaces also map naturally to development environments: a common pattern is separate dev, test, and production workspaces, each occupying its own top-level folder in the single OneLake namespace.
Items are Fabric resources within a workspace. A Lakehouse is the most common item — it exposes two zones within OneLake: the Tables/ zone for managed Delta tables (queryable by SQL and Spark) and the Files/ zone for raw, unstructured, or semi-structured data in any format.
Every Delta table in the Tables/ zone is immediately readable by all Fabric compute engines (Spark notebooks, the SQL analytics endpoint, Power BI in Direct Lake mode) with no configuration. This is the core value proposition: write once, read everywhere.
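Because every item lives under the same host, addressing data is string assembly rather than connection configuration. A minimal sketch, assuming the https://onelake.dfs.fabric.microsoft.com/<workspace>/<item>.<itemtype>/... layout and the abfss://<workspace>@onelake.dfs.fabric.microsoft.com/... form that Spark reads. The workspace and item names (Retail-Dev, Silver, and so on) are hypothetical, and names containing spaces or special characters may need encoding or can be replaced by workspace and item GUIDs.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"
ZONES = ("Tables", "Files")


def _item_path(item, item_type, zone, path):
    """Build the '<item>.<itemtype>/<zone>[/<path>]' part shared by both URI forms."""
    if zone not in ZONES:
        raise ValueError(f"zone must be one of {ZONES}, got {zone!r}")
    suffix = "/" + path.strip("/") if path else ""
    return f"{item}.{item_type}/{zone}{suffix}"


def onelake_https(workspace, item, item_type, zone, path=""):
    """HTTPS URL for tools that speak the ADLS Gen2 DFS API."""
    return f"https://{ONELAKE_HOST}/{workspace}/{_item_path(item, item_type, zone, path)}"


def onelake_abfss(workspace, item, item_type, zone, path=""):
    """ABFS URI, the form Spark and other Hadoop-compatible readers take."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{_item_path(item, item_type, zone, path)}"


# The same table path in three hypothetical environment workspaces.
for env in ("Retail-Dev", "Retail-Test", "Retail-Prod"):
    print(onelake_abfss(env, "Silver", "Lakehouse", "Tables", "transactions"))
```

Promoting a notebook from dev to production then only changes the workspace segment; the rest of the path is identical.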
Shortcuts: Linking External Data Without Copying
One of OneLake’s most practically significant features is shortcuts. A shortcut is a logical pointer — a symbolic link in the OneLake namespace — that references data stored outside OneLake: in another OneLake workspace, in an Azure Data Lake Storage Gen2 account, in an Amazon S3 bucket, or in a Google Cloud Storage bucket.
From Fabric’s perspective, a shortcut looks and behaves like a native OneLake folder. A Spark notebook can read from Files/raw-transactions/ whether that path is a native OneLake path or a shortcut pointing to an S3 bucket in eu-west-1. The compute engine handles the connectivity transparently.
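One way to picture this is a lookup consulted before every read: if a path falls under a shortcut folder, the read is redirected to the external target; otherwise it is served from OneLake itself. This is a conceptual model only, not Fabric's implementation (which also handles credentials through cloud connections), and the bucket and storage-account names are placeholders.

```python
# Hypothetical shortcuts in one Lakehouse: OneLake folder -> external target.
SHORTCUTS = {
    "Files/raw-transactions": "s3://example-retail-bucket/transactions",
    "Files/reference/stores": "abfss://ref@examplestore.dfs.core.windows.net/stores",
}


def resolve(onelake_path, shortcuts=SHORTCUTS):
    """Return ('shortcut', external_uri) or ('native', onelake_path)."""
    path = onelake_path.strip("/")
    # Check longer prefixes first and match only on whole folder names,
    # so 'Files/raw-transactions-old' is not mistaken for the shortcut.
    for prefix in sorted(shortcuts, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ("shortcut", shortcuts[prefix] + path[len(prefix):])
    return ("native", path)


# The caller uses the same kind of path either way.
print(resolve("Files/raw-transactions/2024/01/tx.parquet"))
print(resolve("Files/landing/tx.csv"))
```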
This is meaningful for organisations with data in multiple clouds or with existing ADLS investments they cannot immediately migrate. An organisation using AWS S3 as their primary data lake can create shortcuts in OneLake, run Fabric Spark notebooks that read from S3, write outputs back to native OneLake Delta tables, and serve Power BI reports from those tables — all without copying the source data into Azure storage.
Shortcut caveats to understand:
Cross-cloud shortcuts create data transfer costs. When a Fabric Spark job reads from an S3 shortcut, data traverses from S3 to Azure. At scale, these egress costs (S3 charges for data leaving AWS) can become significant. Shortcuts work best for moderate-volume reference datasets or for organisations with multi-cloud data that cannot be consolidated. For high-volume, high-frequency processing, copying data into native OneLake is more economical.
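A back-of-the-envelope comparison makes the trade-off concrete. The sketch assumes each Spark run rescans the full shortcut dataset, while the copy approach pays egress once for the initial load and then only for new data each day. The rate is a placeholder, not a quoted AWS price, and OneLake storage and compute costs for the copied data are left out.

```python
def shortcut_egress_usd(gb_per_run, runs_per_day, usd_per_gb, days=30):
    """Each run rescans the full dataset across the cloud boundary."""
    return gb_per_run * runs_per_day * days * usd_per_gb


def copy_egress_usd(gb_initial, gb_new_per_day, usd_per_gb, days=30):
    """Data crosses once: the initial load, then only new data each day."""
    return (gb_initial + gb_new_per_day * days) * usd_per_gb


RATE = 0.05  # placeholder $/GB, not a quoted AWS price

via_shortcut = shortcut_egress_usd(gb_per_run=200, runs_per_day=4, usd_per_gb=RATE)
via_copy = copy_egress_usd(gb_initial=200, gb_new_per_day=5, usd_per_gb=RATE)
print(f"shortcut: ${via_shortcut:,.2f} per month")
print(f"copy:     ${via_copy:,.2f} in the first month")
```

With these illustrative inputs the shortcut comes to $1,200.00 a month against $17.50 for the copy: the shortcut multiplies the dataset size by the number of runs, while the copy pays for each gigabyte roughly once.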
OneLake vs AWS S3 with Lake Formation
For teams evaluating Fabric against an AWS-native data lake architecture, the comparison is instructive:
| Capability | OneLake (Microsoft Fabric) | AWS S3 + Lake Formation |
|---|---|---|
| Storage provisioning | Automatic, tenant-wide | Manual S3 bucket creation |
| Native table format | Delta Parquet (managed) | Iceberg, Hudi, or Delta (configured) |
| Unified namespace | Yes — all workloads share one namespace | No — each service addresses S3 separately |
| Fine-grained access control | Workspace roles + item permissions + RLS | Lake Formation column/row-level policies |
| Cross-account data linking | Shortcuts (S3, GCS, ADLS) | S3 Access Points, cross-account bucket policies |
| BI integration | Power BI Direct Lake (no import) | QuickSight SPICE (import required for speed) |
| Governance catalog | Microsoft Purview (native) | AWS Glue Data Catalog |
AWS Lake Formation offers more granular and battle-tested governance controls, particularly for column-level security on Iceberg tables. That said, OneLake’s simplicity advantage is real: for organisations without a dedicated platform engineering team, the zero-configuration storage model meaningfully reduces time-to-value.
Medallion Architecture in OneLake
The medallion architecture (Bronze → Silver → Gold) maps cleanly onto OneLake. Two layouts are common: a single Lakehouse with the three layers kept apart as separate sets of Delta tables, or three separate Lakehouses in the same workspace, one per medallion layer. The three-Lakehouse layout looks like this:
- Bronze Lakehouse: Raw ingested data, written as-is by a Data Factory Copy activity into the Files/ zone (streaming sources arrive through Eventstream, whose Lakehouse destination writes Delta tables). No schema enforcement. Append-only. Preserved for reprocessing.
- Silver Lakehouse: Cleansed, deduplicated, typed data in the Tables/ zone. Delta MERGE operations handle deduplication, and Delta’s schema evolution features manage schema changes.
- Gold Lakehouse: Aggregated tables with business logic applied; this is the layer that Power BI semantic models read from in Direct Lake mode.
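The Silver step's MERGE has upsert semantics: match incoming rows to existing ones on a key, replace the match if the incoming row is newer, and insert otherwise. The plain-Python sketch below mimics that logic on in-memory rows to show what the Bronze → Silver → Gold flow does to the data; in a Fabric notebook the same step would be a Delta MERGE run from Spark. Column names and values are hypothetical.

```python
from collections import defaultdict


def merge_into_silver(silver, bronze_batch, key="txn_id", order_by="ingested_at"):
    """Upsert like MERGE ... WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT:
    keep one row per key, letting a newer version replace an older one."""
    for row in bronze_batch:
        current = silver.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            silver[row[key]] = row
    return silver


def build_gold(silver):
    """Aggregate cleansed rows into a store-level sales table."""
    totals = defaultdict(float)
    for row in silver.values():
        totals[row["store"]] += row["amount"]
    return dict(totals)


bronze = [  # append-only raw records, duplicates included
    {"txn_id": 1, "store": "Lyon", "amount": 20.0, "ingested_at": 1},
    {"txn_id": 2, "store": "Porto", "amount": 35.0, "ingested_at": 1},
    {"txn_id": 1, "store": "Lyon", "amount": 25.0, "ingested_at": 2},  # correction
]

silver = merge_into_silver({}, bronze)
print(build_gold(silver))  # {'Lyon': 25.0, 'Porto': 35.0}
```

One difference worth knowing: Delta's MERGE fails when several source rows match the same target row, so a real Bronze batch is usually reduced to the latest row per key before merging. The loop above folds both steps together by processing rows one at a time. Because Bronze is kept intact, Silver and Gold can always be rebuilt by replaying it.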
The three-Lakehouse approach provides cleaner access control separation: data engineers have write access to Bronze and Silver; BI developers have read access to Gold; end users see only the Power BI semantic model layer. See our post on Data Engineering in Microsoft Fabric for the pipeline implementation detail.
Governance in OneLake
Sensitivity labels from Microsoft Purview carry through Fabric end to end. Labels are applied to Fabric items, and a label on an upstream item such as a Lakehouse is inherited by the downstream items built on it, including semantic models and Power BI reports. A “Highly Confidential” label on the Lakehouse holding a compensation dataset therefore follows that data into the reports that use it without manual relabelling, and where the label carries protection settings, those settings also apply to data exported from the report.
Once the Fabric tenant is registered as a data source, Microsoft Purview scans it, cataloguing Delta tables, inferring schemas, and mapping lineage from source data through transformations to Power BI reports. For organisations with existing Purview investments, Fabric extends that governance model rather than creating a parallel one.
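Downstream inheritance can be pictured as a walk over the lineage graph: starting from a labelled item, each downstream item is brought up to at least the same sensitivity. This is a simplified model, assuming one ordered list of labels, an acyclic lineage graph, and that a more sensitive existing label is never lowered. Microsoft documents the exact inheritance rules (for example, how manually applied labels are treated), and the item names here are hypothetical.

```python
# Hypothetical label order, least to most sensitive.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]

# Hypothetical lineage (must be acyclic): upstream item -> items built on it.
LINEAGE = {
    "HR.Lakehouse": ["Compensation.SemanticModel"],
    "Compensation.SemanticModel": ["PayReview.Report", "Headcount.Report"],
}


def rank(label):
    return LABEL_ORDER.index(label)


def propagate(labels, lineage, source):
    """Walk downstream from `source` so that every item ends up at least as
    sensitive as the item it is built on. Existing higher labels are kept."""
    labels = dict(labels)  # do not mutate the caller's mapping
    stack = [source]
    while stack:
        item = stack.pop()
        for child in lineage.get(item, []):
            if child not in labels or rank(labels[child]) < rank(labels[item]):
                labels[child] = labels[item]
            stack.append(child)
    return labels


start = {"HR.Lakehouse": "Highly Confidential", "Headcount.Report": "General"}
for item, label in propagate(start, LINEAGE, "HR.Lakehouse").items():
    print(f"{item}: {label}")
```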
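In this architecture, row-level security is applied at the Power BI semantic model layer, not at the OneLake storage layer. RLS roles are defined in the semantic model as DAX filter expressions and are enforced when users access reports. (The SQL analytics endpoint also supports T-SQL row-level security, but it applies only to queries made through SQL, not to Spark jobs reading the Delta files.) For data-layer enforcement, workspace roles and item-level permissions control which users can query OneLake directly.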
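Semantic-model RLS amounts to attaching a row filter to each role and applying the filters of a user's roles at query time. The sketch below imitates that in Python; the table, roles, and users are hypothetical, and each lambda stands in for a DAX filter expression such as `'Sales'[region] = "FR"`.

```python
# Hypothetical rows from a Gold table.
SALES = [
    {"region": "FR", "store": "Lyon", "amount": 25.0},
    {"region": "PT", "store": "Porto", "amount": 35.0},
    {"region": "FR", "store": "Paris", "amount": 50.0},
]

# One row filter per role, playing the part of the role's DAX filter.
ROLES = {
    "France managers": lambda row: row["region"] == "FR",
    "Portugal managers": lambda row: row["region"] == "PT",
}

USER_ROLES = {
    "luc@example.com": ["France managers"],
    "ana@example.com": ["Portugal managers"],
    "eva@example.com": ["France managers", "Portugal managers"],
}


def visible_rows(user, rows=SALES):
    """Rows allowed by any of the user's roles; no roles means no rows."""
    filters = [ROLES[name] for name in USER_ROLES.get(user, [])]
    return [row for row in rows if any(f(row) for f in filters)]


for user in USER_ROLES:
    print(user, [row["store"] for row in visible_rows(user)])
```

Two behaviours it mirrors: a user in several roles sees the union of the rows those roles allow, and a user in no role sees nothing. In the Power BI service, RLS filters apply to users with the Viewer role; workspace Admins, Members, and Contributors see unfiltered data.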
When Shortcuts Shine — and When They Create Costs
Shortcuts are the right tool when:
- You have existing ADLS Gen2 data you cannot migrate immediately
- You need to reference a small-to-medium volume reference dataset from another cloud
- You are running a proof-of-concept Fabric deployment against existing data before full commitment
Shortcuts create hidden costs when:
- Source data is in S3 or GCS and is accessed frequently at high volume (cross-cloud egress)
- The data behind the shortcut changes frequently, causing repeated full scans
- Governance requirements mandate data residency within Azure (shortcuts to non-Azure sources may not satisfy residency requirements)
For regulated industries, such as Canadian financial services under OSFI or European organisations under GDPR, validate that your shortcut configuration satisfies data residency requirements before relying on cross-cloud shortcuts in production.
Conclusion
OneLake is the architectural decision that makes Microsoft Fabric a coherent platform rather than a collection of rebranded Azure services. The unified namespace, Delta Parquet as the default format, and the shortcuts capability for non-disruptive external data integration represent a genuine advance over the previous generation of Azure analytics architecture.
Evaluating Microsoft Fabric for your organisation? Infra IT Consulting helps Canadian and international businesses assess, architect, and implement modern data platforms.
Related posts
- Data Governance in Microsoft Fabric: Purview Integration, Sensitivity Labels, and Access Control
- Microsoft Fabric Explained: What It Is, What It Replaces, and Who Actually Needs It
- Data Engineering in Microsoft Fabric: Spark Notebooks, Pipelines, and Lakehouse Patterns