Implementing a Data Mesh Architecture on AWS

By Infra IT Consulting · June 17, 2024 · 10 min read

Content on this site is AI-assisted and personally reviewed by Hazem.

The data mesh concept, introduced by Zhamak Dehghani in 2019, has moved from theoretical framework to practical implementation challenge for many mid-to-large organisations. The core idea — treating data as a product owned by the domain teams that understand it best, federated across a self-serve infrastructure — addresses real pathologies in centralised data lake architectures: bottlenecks at the central data team, domain knowledge loss in translation, and data that is technically available but practically unusable.

AWS provides the building blocks for a data mesh implementation, but the architecture requires deliberate design across account structure, access control, metadata management, and governance. This guide covers the patterns that work in production.

The Four Pillars Applied to AWS

Dehghani’s four principles map to specific AWS capabilities:

Domain-oriented ownership maps to AWS Organizations and multi-account structure. Each domain (Commerce, Logistics, Finance, Customer) owns at least one AWS account where its data products live.

Data as a product maps to well-defined S3 prefixes with stable schemas managed through the AWS Glue Data Catalog, with data contracts enforced through Glue Data Quality rules and published via AWS Data Exchange or internal catalogues.

Self-serve data infrastructure maps to AWS Lake Formation’s cross-account sharing capabilities, Infrastructure as Code templates (covered in Terraform for AWS Data Stacks), and shared Terraform modules that domain teams use to provision compliant data product infrastructure.

Federated computational governance maps to AWS Lake Formation Tag-Based Access Control (LF-TBAC), AWS Config rules, and Service Control Policies (SCPs) at the Organizations level.

Multi-Account Architecture

The foundational decision in an AWS data mesh is account structure. A common pattern for a mid-sized organisation:

AWS Organization
├── Management Account (billing, SCPs only)
├── Infrastructure OU
│   ├── Shared Services Account (Transit Gateway, DNS, Terraform state)
│   └── Security Account (CloudTrail, GuardDuty, Security Hub)
├── Data Platform OU
│   ├── Data Governance Account (Lake Formation admin, central catalog)
│   └── Data Platform Account (shared Athena, QuickSight, MWAA)
└── Domain OU
    ├── Commerce Account (owns commerce data products)
    ├── Logistics Account (owns logistics data products)
    ├── Finance Account (owns finance data products)
    └── Customer Account (owns customer data products)

Each domain account has its own S3 buckets, Glue Data Catalog databases, and IAM roles. The Data Governance account is the Lake Formation administrator for the entire organisation and coordinates cross-account data sharing.

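Critical SCP to enforce: Prevent principals in domain accounts from changing Lake Formation data lake settings, which control who the data lake administrators are and what default permissions new catalog resources receive. Without this, domain teams can bypass governance by reverting to IAM-only access control backed by S3 bucket policies. The ${data_governance_account_id} placeholder is a template variable that must be replaced with the Data Governance account ID: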

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLakeFormationAdminRemoval",
      "Effect": "Deny",
      "Action": [
        "lakeformation:PutDataLakeSettings"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalAccount": "${data_governance_account_id}"
        }
      }
    }
  ]
}
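
For teams that manage SCPs from Python rather than from a template, the following is a minimal sketch of rendering the policy above and attaching it to the Domain OU. The function names, the policy name, and the OU ID argument are hypothetical, and the attach step assumes it runs with credentials from the management account.

```python
import json


def build_lf_settings_scp(governance_account_id: str) -> dict:
    """Return the SCP document shown above with the account ID filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLakeFormationAdminRemoval",
                "Effect": "Deny",
                "Action": ["lakeformation:PutDataLakeSettings"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:PrincipalAccount": governance_account_id
                    }
                },
            }
        ],
    }


def attach_scp(governance_account_id: str, domain_ou_id: str) -> str:
    """Create the SCP and attach it to the Domain OU; returns the policy ID."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    org = boto3.client("organizations")
    policy = org.create_policy(
        Name="deny-lf-settings-changes",  # hypothetical policy name
        Description="Only the Data Governance account may change Lake Formation settings",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(build_lf_settings_scp(governance_account_id)),
    )
    policy_id = policy["Policy"]["PolicySummary"]["Id"]
    org.attach_policy(PolicyId=policy_id, TargetId=domain_ou_id)
    return policy_id
```

Keeping the document builder separate from the API call makes it possible to review or unit-test the rendered policy before anything is attached.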

Cross-Account Data Sharing with Lake Formation

Lake Formation’s RAM (Resource Access Manager) integration is the mechanism for cross-account data product sharing. A domain account grants access to a specific Glue database or table to a principal in another account via Lake Formation, and RAM propagates the share.

In the producing domain account (Commerce):

import boto3

lf = boto3.client('lakeformation')

# Grant SELECT on the orders table to the Data Platform account's Athena role
lf.grant_permissions(
    Principal={
        'DataLakePrincipalIdentifier': 'arn:aws:iam::111122223333:role/AthenaAnalystRole'
    },
    Resource={
        'Table': {
            'CatalogId': '444455556666',  # Commerce account ID
            'DatabaseName': 'commerce_products',
            'Name': 'orders'
        }
    },
    Permissions=['SELECT', 'DESCRIBE'],
    PermissionsWithGrantOption=[]
)

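In the consuming account, a data lake administrator creates a resource link in the local Glue Data Catalog that points to the shared database or table (and accepts the RAM resource share invitation if one is pending). Athena then queries through the resource link, so consumers do not need to know which account owns the underlying S3 data.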
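
A resource link is a Glue table entry whose TargetTable points at the shared table in the owning account. The sketch below shows one way the consuming account might create it with boto3; the helper names, the link name, and the local database name are hypothetical, while the account ID, database, and table come from the grant example above.

```python
def build_resource_link_input(link_name, owner_account_id, shared_database, shared_table):
    """Build the Glue TableInput for a resource link to a shared table."""
    return {
        "Name": link_name,
        "TargetTable": {
            "CatalogId": owner_account_id,   # account that owns the data product
            "DatabaseName": shared_database,
            "Name": shared_table,
        },
    }


def create_resource_link(local_database, table_input):
    """Create the resource link in the consuming account's Glue Data Catalog."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    glue = boto3.client("glue")
    glue.create_table(DatabaseName=local_database, TableInput=table_input)


# Link to the Commerce orders table shared in the example above
orders_link = build_resource_link_input(
    link_name="orders_link",            # hypothetical local name
    owner_account_id="444455556666",    # Commerce account ID
    shared_database="commerce_products",
    shared_table="orders",
)
```

Calling create_resource_link("shared_commerce", orders_link) would then register the link in a local database (here assumed to be named shared_commerce), after which Athena users query shared_commerce.orders_link like any local table.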

Defining Data Contracts

A data mesh without data contracts is just a distributed data swamp. Data contracts define the schema, SLA, and data quality commitments that a domain team makes to its consumers. On AWS, implement data contracts through a combination of:

Schema contracts enforced by AWS Glue Schema Registry. Register your Avro, JSON Schema, or Protobuf schema in the registry and configure producers to validate against it before writing:

import json

import boto3
from aws_schema_registry import SchemaRegistryClient
from aws_schema_registry.avro import AvroSchema

client = SchemaRegistryClient(boto3.client('glue'), registry_name='commerce-registry')

order_definition = json.dumps({
    "type": "record",
    "name": "Order",
    "namespace": "com.myorg.commerce",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "order_amount", "type": "double"},
        {"name": "currency", "type": {"type": "enum", "name": "Currency", "symbols": ["CAD", "USD", "GBP", "NGN", "ZAR"]}},
        # The logical type belongs inside the type object; as a field attribute it is ignored
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
    ]
})

# Parse locally first so a malformed schema fails before it reaches the registry
schema = AvroSchema(order_definition)

# Schema evolution: BACKWARD_ALL allows adding optional fields without breaking consumers
client.get_or_register_schema_version(
    definition=order_definition,
    schema_name='Order',
    data_format='AVRO',
    compatibility_mode='BACKWARD_ALL'
)

Quality contracts enforced by Glue Data Quality rules that run as part of the domain’s ETL pipeline. Quality results are published to the central Data Governance account’s S3 bucket, where they feed a cross-domain quality dashboard.
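
As an illustration of what such a quality contract might look like, the sketch below builds a small ruleset in DQDL (the Data Quality Definition Language used by Glue Data Quality) for the orders table and registers it against the catalog table. The specific rules, the helper names, and the ruleset name are assumptions chosen to match the Order schema above, not rules taken from a real pipeline.

```python
def build_orders_ruleset(currencies):
    """Return a DQDL ruleset string for the orders data product."""
    quoted = ", ".join(f'"{code}"' for code in currencies)
    rules = [
        'IsComplete "order_id"',               # no missing order IDs
        'IsUnique "order_id"',                 # no duplicate orders
        'IsComplete "customer_id"',
        'ColumnValues "order_amount" >= 0',    # amounts are never negative
        f'ColumnValues "currency" in [{quoted}]',
    ]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def register_ruleset(name, ruleset, database, table):
    """Attach the ruleset to a Glue Data Catalog table."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    glue = boto3.client("glue")
    glue.create_data_quality_ruleset(
        Name=name,
        Ruleset=ruleset,
        TargetTable={"DatabaseName": database, "TableName": table},
    )


orders_ruleset = build_orders_ruleset(["CAD", "USD", "GBP", "NGN", "ZAR"])
```

A call such as register_ruleset("orders-contract", orders_ruleset, "commerce_products", "orders") would store the ruleset; evaluating it and exporting results to the governance bucket would be a separate step in the ETL job.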

SLA contracts enforced through CloudWatch alarms on data freshness. A Lambda function runs on a schedule, checks the _metadata/last_updated.json file in each data product’s S3 prefix, and alarms if the timestamp exceeds the SLA window.
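
A minimal sketch of that freshness check follows. It assumes the metadata file holds a single ISO-8601 timestamp under a last_updated key, that the scheduled event carries bucket, prefix, and domain fields, and that the metric is published to a DataMesh/DataProducts namespace; all three are hypothetical conventions. A CloudWatch alarm on the published metric would then enforce the SLA window.

```python
import json
from datetime import datetime, timedelta, timezone


def minutes_stale(metadata_json: str, now: datetime) -> float:
    """Minutes elapsed since the data product was last updated."""
    last_updated = datetime.fromisoformat(json.loads(metadata_json)["last_updated"])
    return (now - last_updated).total_seconds() / 60


def breaches_sla(metadata_json: str, sla: timedelta, now: datetime) -> bool:
    """True when the product is staler than its SLA window allows."""
    return minutes_stale(metadata_json, now) > sla.total_seconds() / 60


def handler(event, context):
    """Scheduled Lambda entry point: read the marker file and publish staleness."""
    import boto3  # imported lazily so the pure functions above have no AWS dependency

    s3 = boto3.client("s3")
    cloudwatch = boto3.client("cloudwatch")
    key = f'{event["prefix"].rstrip("/")}/_metadata/last_updated.json'
    body = s3.get_object(Bucket=event["bucket"], Key=key)["Body"].read().decode("utf-8")
    staleness = minutes_stale(body, datetime.now(timezone.utc))
    cloudwatch.put_metric_data(
        Namespace="DataMesh/DataProducts",  # hypothetical shared namespace
        MetricData=[{
            "MetricName": "freshness_minutes",
            "Dimensions": [{"Name": "domain", "Value": event["domain"]}],
            "Value": staleness,
        }],
    )
    return {"freshness_minutes": staleness}
```

Publishing the raw staleness and letting the alarm threshold encode the SLA keeps the Lambda identical across data products; only the alarm configuration differs per contract.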

The Self-Serve Data Portal

One of the hardest operational challenges in a data mesh is discoverability. When data products are distributed across ten domain accounts, how does a Finance analyst find the customer data product they need?

AWS offers two paths:

Amazon DataZone (launched 2023) is a fully managed data cataloguing and governance service designed specifically for data mesh architectures. It provides a business-friendly data portal, workflow-based access requests, and native integration with Lake Formation for access control. For organisations starting fresh, DataZone is usually the better choice because it avoids building and operating a custom portal.

DIY with AWS Glue + Glue Data Catalog Resource Policies: For organisations already invested in Glue, use a central Glue Data Catalog in the Data Governance account as the authoritative registry of all data products. Each domain account pushes metadata to the central catalog via an API Gateway → Lambda → Glue Catalog pipeline when a new data product is published.

Regardless of the approach, the self-serve portal must answer these questions for every data product:

What data does this product contain?
Who owns it and how do I request access?
What is the schema and how often does it change?
What is the data quality SLA?
How fresh is the data right now?
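
One way to make those five questions enforceable is to require a small descriptor for every published data product and reject registrations that leave a question unanswered. The sketch below is a hypothetical shape for such a descriptor; the field names and example values are assumptions, not a DataZone or Glue format.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DataProductDescriptor:
    name: str
    description: str            # what data does this product contain?
    owner_team: str             # who owns it?
    access_request_url: str     # how do I request access?
    schema_name: str            # which registered schema describes it?
    schema_compatibility: str   # how is the schema allowed to change?
    quality_sla: str            # what is the data quality SLA?
    freshness_sla_minutes: int  # how stale may the data become?
    last_updated: str           # how fresh is the data right now?


def missing_fields(descriptor: DataProductDescriptor) -> List[str]:
    """Names of fields left empty, i.e. portal questions without an answer."""
    return [key for key, value in asdict(descriptor).items() if value in ("", None)]


orders = DataProductDescriptor(
    name="orders",
    description="One row per customer order placed through any sales channel",
    owner_team="commerce-data",
    access_request_url="",  # deliberately left blank to show the check
    schema_name="Order",
    schema_compatibility="BACKWARD_ALL",
    quality_sla="order_id complete and unique",
    freshness_sla_minutes=60,
    last_updated="2024-06-17T10:30:00+00:00",
)
```

Here missing_fields(orders) reports access_request_url, so a publishing pipeline could refuse the registration until the owner supplies a way to request access.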

Federated Governance with LF-Tags

Lake Formation Tag-Based Access Control (LF-TBAC) is the governance mechanism that allows central policy at scale without per-table permission management. Assign LF-tags to data resources and grant permissions to IAM principals based on tags rather than individual resources:

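# Assign tags to the orders table. The LF-tags and their allowed values must
# already be defined (for example with lf.create_lf_tag) before they can be assigned.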
lf.add_lf_tags_to_resource(
    Resource={
        'Table': {
            'DatabaseName': 'commerce_products',
            'Name': 'orders'
        }
    },
    LFTags=[
        {'TagKey': 'domain', 'TagValues': ['commerce']},
        {'TagKey': 'sensitivity', 'TagValues': ['confidential']},
        {'TagKey': 'region', 'TagValues': ['ca-central-1']},
    ]
)

# Grant access to all non-confidential commerce data to the analytics team
lf.grant_permissions(
    Principal={'DataLakePrincipalIdentifier': 'arn:aws:iam::111122223333:role/AnalyticsTeam'},
    Resource={
        'LFTagPolicy': {
            'ResourceType': 'TABLE',
            'Expression': [
                {'TagKey': 'domain', 'TagValues': ['commerce']},
                {'TagKey': 'sensitivity', 'TagValues': ['public', 'internal']}
            ]
        }
    },
    Permissions=['SELECT', 'DESCRIBE']
)

When a domain team adds a new table and tags it appropriately, the analytics team automatically gains access without any manual permission grant. This is the “federated” part of federated governance — the central team sets policy through tags, domain teams apply tags to their resources, and Lake Formation enforces access automatically.
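
To reason about which tables a tag-based grant covers, it helps to model the matching rule: every key in the expression must be present on the resource (logical AND), and the resource value for that key must be one of the listed values (logical OR). The following is a local simulation of that rule for planning and testing, not a Lake Formation API call; the second table and its tags are hypothetical.

```python
def matches_expression(resource_tags: dict, expression: list) -> bool:
    """True when the resource tags satisfy every condition in the expression."""
    return all(
        resource_tags.get(condition["TagKey"]) in condition["TagValues"]
        for condition in expression
    )


# The expression from the AnalyticsTeam grant above
analytics_grant = [
    {"TagKey": "domain", "TagValues": ["commerce"]},
    {"TagKey": "sensitivity", "TagValues": ["public", "internal"]},
]

# Tags assigned to the orders table above
orders_tags = {"domain": "commerce", "sensitivity": "confidential", "region": "ca-central-1"}

# A hypothetical second table tagged as internal
product_catalogue_tags = {"domain": "commerce", "sensitivity": "internal"}

orders_visible = matches_expression(orders_tags, analytics_grant)
catalogue_visible = matches_expression(product_catalogue_tags, analytics_grant)
```

Under this rule the orders table, tagged confidential, is not covered by the analytics grant, while the internal table is. Extra tags on a resource, such as region, do not affect the match.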

Observability Across Domains

A central concern in data mesh implementations is end-to-end observability. When a report breaks, which domain’s data product is at fault? Build cross-domain observability by:

Centralised CloudWatch cross-account dashboards: Use CloudWatch’s cross-account observability feature (available in AWS Organizations) to aggregate Glue job metrics and Lambda metrics from all domain accounts into a single operations dashboard in the Data Governance account.
Data product health metrics: Each domain publishes a standardised set of CloudWatch custom metrics — records_processed, freshness_minutes, quality_score — under the same metric namespace with a domain dimension.
Lineage tracking: Use Amazon DataZone or a custom solution (Apache Atlas hosted on EC2) to track data lineage across domains. When an order amount column changes in the Commerce account, downstream Finance reports that join on it need to be identified automatically.
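
The standardised health metrics could be published with a small shared helper so that every domain emits the same names and dimensions. This is a minimal sketch: the DataMesh/DataProducts namespace and the extra data_product dimension are assumptions, while the three metric names and the domain dimension come from the list above.

```python
def build_health_metrics(domain, data_product, records_processed, freshness_minutes, quality_score):
    """Build the CloudWatch MetricData entries for one data product run."""
    dimensions = [
        {"Name": "domain", "Value": domain},
        {"Name": "data_product", "Value": data_product},  # assumed extra dimension
    ]
    values = {
        "records_processed": records_processed,
        "freshness_minutes": freshness_minutes,
        "quality_score": quality_score,
    }
    return [
        {"MetricName": name, "Dimensions": dimensions, "Value": float(value)}
        for name, value in values.items()
    ]


def publish_health_metrics(metric_data, namespace="DataMesh/DataProducts"):
    """Send the metrics to CloudWatch in the domain account."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=metric_data)


orders_metrics = build_health_metrics("commerce", "orders", 1200, 15, 0.98)
```

Because all domains share one namespace and one set of metric names, the cross-account dashboard can chart any metric across domains by varying only the domain dimension.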

For a deeper dive into the analytics layer that sits on top of a data mesh, see our guide to Building Self-Service Analytics Platforms on AWS.

Common Implementation Pitfalls

Starting too distributed: Most organisations are not ready for a full data mesh on day one. Start with two or three domains, validate the infrastructure patterns, and expand. A premature mesh with ten domains and no mature contracts is worse than a well-run centralised lake.

Neglecting the data product developer experience: If standing up a new data product requires two weeks of tickets to the platform team for IAM roles, VPC endpoints, and catalog registrations, domain teams will route around the system. The self-serve platform must make publishing a new data product a single-day activity.

Confusing domain boundaries with organisational boundaries: Data mesh domains should align with business capabilities, not org chart lines. The Commerce domain owns order data even if the Order Management team, the Payments team, and the Fraud team all touch it. One domain team is the accountable publisher; others are consumers.

Our overview of Lakehouse Architecture on AWS provides complementary context for teams deciding whether a mesh or a centralised lakehouse is the right starting point.

Conclusion

Implementing a data mesh on AWS is an organisational transformation as much as a technical one. The AWS services — Lake Formation, Glue Schema Registry, S3, DataZone, Organizations SCPs — are mature and capable. The harder challenges are establishing data product ownership culture, defining and enforcing data contracts, and building a self-serve platform that makes domain teams genuinely autonomous.

Infra IT Consulting has guided organisations across Canada, the UK, and Africa through data mesh architecture design and implementation on AWS. Contact us to discuss whether a data mesh is the right architectural direction for your organisation and what a phased implementation roadmap looks like.
