Data Architecture & Strategy · multi-cloud · aws · strategy

Multi-Cloud Data Strategy: When It Makes Sense and When It Doesn't

By Infra IT Consulting · February 27, 2024 · 10 min read

Content on this site is AI-assisted and personally reviewed by Hazem.

Multi-cloud is one of the most debated topics in enterprise IT strategy — and also one of the most frequently misapplied. The pitch from analysts and vendors is straightforward: run workloads across AWS, Azure, and Google Cloud, avoid vendor lock-in, and negotiate better pricing through competition. In practice, the engineering reality is considerably more nuanced.

For data-intensive organisations, multi-cloud adds specific complications that generalised infrastructure teams often underestimate. Storage is not portable between clouds without egress costs. Query engines that work brilliantly in one cloud environment need re-tuning in another. Data sovereignty rules in Canada (PIPEDA, Quebec’s Law 25) and the UK (UK GDPR) create compliance surface area that multiplies with every cloud boundary you cross. This post gives you an honest framework for deciding whether a multi-cloud data strategy serves your organisation or just adds complexity.

The Genuine Case for Multi-Cloud

Multi-cloud is genuinely justified in a narrow set of circumstances. Recognising these cases prevents the architecture from becoming a solution looking for a problem.

Best-of-breed tooling that exists on only one cloud. Google BigQuery’s native ML capabilities (BigQuery ML) and Google’s Vertex AI have no equivalent in AWS at the same price point for certain workloads. Suppose your data science team genuinely needs those capabilities and your operational data lives on AWS. In that case a data-sharing pattern is architecturally honest: curated datasets are replicated to Google Cloud for ML training, while transactional and operational systems remain on AWS. Managed transfer options make this more tractable than it was three years ago. One example is Google Cloud’s Storage Transfer Service, which can copy directly from Amazon S3.
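Here is a minimal sketch of the replication leg, assuming Google Cloud’s Storage Transfer Service is the transfer mechanism. The function builds a TransferJob request body that copies only a curated prefix from an S3 bucket into a Cloud Storage bucket. The project, bucket names, prefix and IAM role ARN are hypothetical. The field names follow our reading of the service’s REST resource, so check them against the current API reference before use.

```python
# Sketch: request body for a Storage Transfer Service job that replicates only
# the curated zone of an S3 data lake into Cloud Storage for ML training.
# ASSUMPTIONS: project, bucket, prefix, and role ARN values are hypothetical;
# field names follow the TransferJob REST resource as we understand it.
import json


def curated_replication_job(
    project_id: str,
    s3_bucket: str,
    gcs_bucket: str,
    curated_prefix: str,
    aws_role_arn: str,
) -> dict:
    """Build a TransferJob body that copies one curated prefix and nothing else."""
    return {
        "description": f"Replicate {s3_bucket}/{curated_prefix} for ML training",
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            # Source: the AWS data lake, read through a role Google assumes.
            "awsS3DataSource": {"bucketName": s3_bucket, "roleArn": aws_role_arn},
            # Sink: the bucket that Vertex AI training jobs read from.
            "gcsDataSink": {"bucketName": gcs_bucket},
            # Only objects under the curated prefix are copied across the boundary.
            "objectConditions": {"includePrefixes": [curated_prefix]},
        },
    }


if __name__ == "__main__":
    job = curated_replication_job(
        project_id="example-ml-project",
        s3_bucket="example-datalake",
        gcs_bucket="example-ml-features",
        curated_prefix="curated/features/",
        aws_role_arn="arn:aws:iam::123456789012:role/example-sts-reader",
    )
    print(json.dumps(job, indent=2))
```

The body would be submitted through the service’s transferJobs.create method. Restricting includePrefixes is what keeps the replicated footprint bounded to curated data rather than the whole lake.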

Regulatory geographic requirements that conflict with a single provider’s region availability. If you must keep certain data in a specific jurisdiction and your primary cloud provider does not have a region in that jurisdiction, a second cloud provider may be necessary. This is increasingly rare as AWS, Azure, and GCP have expanded region coverage, but it is a legitimate driver.

Merger and acquisition scenarios. When two companies merge and each runs a mature data platform on a different cloud, forcing immediate consolidation is often more disruptive than running a managed multi-cloud environment for 18–24 months while a migration is planned properly.

Negotiating leverage for large contracts. Very large enterprises ($100M+ annual cloud spend) can achieve meaningful discounts by demonstrating credible multi-cloud capability. For organisations below that scale, the operational cost of maintaining multi-cloud typically exceeds any negotiated savings.

The Real Costs Multi-Cloud Proponents Understate

Egress costs destroy the economics of data movement. AWS charges up to $0.09 per GB for outbound data transfer to the internet. Moving 10 TB (10,240 GB) of data from AWS S3 to Azure Blob Storage therefore costs approximately $920 in egress fees alone. That is before you factor in transaction charges on the destination storage, the transformation work and the latency. Some organisations are considering replicating a data lake across clouds “for resilience.” For them, the fees scale with every byte that crosses the boundary. If 500 TB moves between clouds in a year, transfer fees alone come to north of $45,000 at the $0.09/GB rate. AWS’s volume tiers lower the per-GB price somewhat at high monthly volumes.
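The arithmetic in this paragraph can be reproduced with a short calculator. This sketch makes three assumptions:

- It uses the illustrative us-east-1 list prices for data transfer out to the internet: the first 10 TB per month at $0.09/GB, then $0.085, $0.07 and $0.05 for successive tiers.
- It ignores the monthly free allowance.
- It treats 1 TB as 1,024 GB.

Confirm current rates for your region before relying on the numbers.

```python
# Sketch: estimate AWS internet egress fees for a cross-cloud transfer.
# ASSUMPTION: the tier sizes and prices are illustrative us-east-1 list prices
# for data transfer out to the internet; verify against current AWS pricing.
from typing import List, Optional, Tuple

GB_PER_TB = 1024

# (tier size in GB, price per GB); None means "all remaining volume".
ASSUMED_TIERS: List[Tuple[Optional[int], float]] = [
    (10 * GB_PER_TB, 0.09),
    (40 * GB_PER_TB, 0.085),
    (100 * GB_PER_TB, 0.07),
    (None, 0.05),
]


def flat_egress_cost(tb: float, rate_per_gb: float = 0.09) -> float:
    """Cost if every GB is billed at a single flat rate."""
    return tb * GB_PER_TB * rate_per_gb


def tiered_egress_cost(tb_in_month: float, tiers=ASSUMED_TIERS) -> float:
    """Cost of one month's volume, filling each price tier in order."""
    remaining = tb_in_month * GB_PER_TB
    total = 0.0
    for size, price in tiers:
        billable = remaining if size is None else min(remaining, size)
        total += billable * price
        remaining -= billable
        if remaining <= 0:
            break
    return total


if __name__ == "__main__":
    print(f"10 TB at flat rate:  ${flat_egress_cost(10):,.2f}")
    print(f"500 TB at flat rate: ${flat_egress_cost(500):,.2f}")
    print(f"500 TB in one month, tiered: ${tiered_egress_cost(500):,.2f}")
```

At the flat $0.09 rate, 10 TB comes to about $922 and 500 TB to about $46,080. If the full 500 TB moved in a single month, the tiers would bring it down to roughly $29,500. Either way, cross-cloud synchronisation is a recurring line item rather than a one-off.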

Tooling fragmentation compounds over time. AWS Glue jobs, AWS Step Functions orchestration and Lake Formation access controls do not map one-to-one onto Azure Data Factory, Google Cloud Dataflow, or the governance services on those clouds. Teams that go multi-cloud often end up maintaining parallel toolchains for ETL, orchestration, data cataloguing and access management. This roughly doubles both the cognitive overhead and the onboarding cost for new engineers.

Security and compliance surface area multiplies. Each cloud provider has its own IAM model, its own encryption key management system, its own audit logging format, and its own compliance certification timeline. A compliance audit across two clouds is not twice the work — it is considerably more, because auditors need to verify that data governance controls are consistent across environments that express policies in fundamentally different ways.
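To make the point about differing policy models concrete, the sketch below takes one provider-neutral intent: read access to a curated dataset. It renders that intent into an AWS identity-based policy document and into a Cloud Storage IAM binding. The bucket and group names are hypothetical. The chosen S3 actions and Cloud Storage role are common read-only choices, not a reviewed policy.

```python
# Sketch: one governance intent expressed in two providers' IAM models.
# ASSUMPTIONS: bucket and group names are hypothetical; the S3 actions and the
# Cloud Storage role are typical read-only choices, not an audited policy.
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadGrant:
    """Provider-neutral statement an auditor can check against both clouds."""
    bucket: str
    group_email: str


def to_aws_identity_policy(grant: ReadGrant) -> dict:
    """IAM policy document to attach to the analysts' role on AWS.

    The principal is implied by where the policy is attached, so the group
    email does not appear in the document at all.
    """
    arn = f"arn:aws:s3:::{grant.bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                # ListBucket applies to the bucket ARN, GetObject to object ARNs.
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }


def to_gcs_binding(grant: ReadGrant) -> dict:
    """Binding to add to the bucket's IAM policy on Cloud Storage.

    Here the principal is explicit and the resource is implied by the bucket.
    """
    return {
        "role": "roles/storage.objectViewer",
        "members": [f"group:{grant.group_email}"],
    }


if __name__ == "__main__":
    grant = ReadGrant(bucket="example-curated", group_email="analysts@example.com")
    print(json.dumps(to_aws_identity_policy(grant), indent=2))
    print(json.dumps(to_gcs_binding(grant), indent=2))
```

Even in this tiny case, the two artefacts share no structure. One names resources and leaves the principal implicit; the other names the principal and leaves the resource implicit. Keeping a neutral record like ReadGrant under version control is one way to give auditors a single intent to trace both back to.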

Operational expertise is harder to acquire and retain. Deep expertise in AWS data services (Glue, Redshift, Kinesis, Lake Formation, DataZone) takes years to develop. Finding engineers who have genuine depth across two cloud providers’ data ecosystems is substantially harder and more expensive. This is particularly relevant for Canadian organisations competing in a tight talent market.

Primary Cloud with Selective Secondary Use: The Pragmatic Middle Ground

Most organisations that claim to be “multi-cloud” are actually running a primary cloud for 80–90% of their workloads and using a secondary cloud for a specific, bounded capability. This is architecturally sound because it limits the blast radius of the multi-cloud decision.

A pattern that works well for AWS-primary organisations:

AWS (primary — 90% of workload)
├── S3 (data lake)
├── Glue + Redshift (transformation and warehousing)
├── Kinesis (streaming)
├── SageMaker (primary ML training and inference)
└── Lake Formation (governance)

GCP (secondary — bounded use case)
└── Vertex AI (specific ML capability or Google-native data)
    ← Receives curated feature datasets from AWS via BigQuery Omni or direct transfer
    → Returns model artefacts to S3 for inference on SageMaker

The key discipline is keeping the secondary cloud’s scope explicitly bounded in your architecture decision record (ADR). If that boundary starts expanding — teams wanting to run analytics on GCP because it is convenient, not because it is necessary — the multi-cloud footprint will grow until you are genuinely operating two full data platforms.
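One way to keep that boundary enforceable rather than aspirational is to check the secondary cloud’s actual footprint against the ADR on a schedule. The sketch below assumes a hypothetical ADR identifier and a service inventory in a simple list-of-dicts form. In practice, that inventory might be derived from a billing or asset-inventory export. The allowed GCP names are the public API service names for Vertex AI and Cloud Storage.

```python
# Sketch: flag secondary-cloud usage that falls outside the ADR's boundary.
# ASSUMPTIONS: the ADR id, allowlist, and inventory format are hypothetical.
from typing import Dict, List

ADR_SECONDARY_SCOPE = {
    "adr_id": "ADR-0007",  # hypothetical ADR reference
    "cloud": "gcp",
    # Vertex AI and Cloud Storage only, matching the bounded use case.
    "allowed_services": {"aiplatform.googleapis.com", "storage.googleapis.com"},
}


def out_of_scope(inventory: List[Dict[str, str]], scope: dict) -> List[Dict[str, str]]:
    """Return secondary-cloud inventory entries that the ADR does not allow."""
    return [
        item
        for item in inventory
        if item["cloud"] == scope["cloud"]
        and item["service"] not in scope["allowed_services"]
    ]


# Hypothetical inventory; entries on the primary cloud are ignored by the check.
inventory = [
    {"cloud": "gcp", "service": "aiplatform.googleapis.com", "team": "ml"},
    {"cloud": "gcp", "service": "dataproc.googleapis.com", "team": "analytics"},
    {"cloud": "aws", "service": "glue", "team": "data-eng"},
]

if __name__ == "__main__":
    for item in out_of_scope(inventory, ADR_SECONDARY_SCOPE):
        print(f"{item['team']} uses {item['service']}, outside {ADR_SECONDARY_SCOPE['adr_id']}")
```

With the sample inventory, only the analytics team’s Dataproc usage is flagged. That is the kind of convenience-driven drift the ADR is meant to surface.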

Data Sovereignty and Canadian Compliance Considerations

For Canadian organisations, the data sovereignty question is particularly important. PIPEDA requires that organisations have appropriate safeguards for personal information transferred to third parties. Quebec’s Law 25, with most provisions in force since September 2023, requires a privacy impact assessment before personal information is communicated outside Quebec. That makes the location of every dataset an explicit compliance question.

Running data across two cloud providers does not automatically create a compliance problem, but it does require that your data governance documentation explicitly maps which data categories live where, under what controls, and under which cloud provider’s data processing agreements. AWS’s Data Processing Addendum and GCP’s equivalent have different coverage terms, different sub-processor lists, and different audit mechanisms. Your DPO (or external compliance counsel) needs to review both.
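A governance mapping of this kind can be kept as data and checked automatically. The sketch below is a minimal, hypothetical version. The dataset names, agreement labels and the single rule are illustrative assumptions: personal data outside a Canadian region needs an assessment on file. The real rule set should come from your DPO or counsel. The region codes are the providers’ Canadian regions as we understand them.

```python
# Sketch: a data-location register checked against one simple residency rule.
# ASSUMPTIONS: dataset names, agreement labels, and the rule are hypothetical.
from dataclasses import dataclass
from typing import List, Optional

CANADIAN_REGIONS = {
    ("aws", "ca-central-1"),             # Canada (Central), Montreal
    ("aws", "ca-west-1"),                # Canada West, Calgary
    ("gcp", "northamerica-northeast1"),  # Montreal
    ("gcp", "northamerica-northeast2"),  # Toronto
}


@dataclass(frozen=True)
class DatasetLocation:
    name: str
    category: str                         # e.g. "personal" or "non-personal"
    cloud: str
    region: str
    agreement: str                        # which provider's processing terms apply
    assessment_ref: Optional[str] = None  # impact assessment on file, if any


def residency_findings(register: List[DatasetLocation]) -> List[str]:
    """Flag personal data held outside Canada with no assessment recorded."""
    findings = []
    for ds in register:
        onshore = (ds.cloud, ds.region) in CANADIAN_REGIONS
        if ds.category == "personal" and not onshore and ds.assessment_ref is None:
            findings.append(f"{ds.name}: personal data in {ds.cloud}/{ds.region}, no assessment on file")
    return findings


register = [
    DatasetLocation("customer_profiles", "personal", "aws", "ca-central-1", "AWS DPA"),
    DatasetLocation("training_features", "personal", "gcp", "us-central1", "Google Cloud DPA"),
    DatasetLocation("clickstream_agg", "non-personal", "gcp", "us-central1", "Google Cloud DPA"),
]

if __name__ == "__main__":
    for finding in residency_findings(register):
        print(finding)
```

Keeping the agreement field next to each dataset also answers the auditor’s question of which provider’s terms apply, without a separate lookup.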

AWS Canada (Central) in Montreal and AWS Canada West in Calgary give you options for keeping Canadian personal data onshore with AWS. If your primary driver for multi-cloud is data residency, the better question is whether your cloud provider now has sufficient regional coverage to make multi-cloud unnecessary for that purpose.
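If the aim is to keep workloads onshore with AWS, one common guardrail is a service control policy (SCP). It denies requests outside approved regions using the aws:RequestedRegion condition key, and the sketch below generates such a policy. The list of exempted global services is an assumption and deliberately short, so it would need review. Services such as IAM are exempted because they are global rather than regional.

```python
# Sketch: a service control policy (SCP) that denies AWS API calls outside the
# two Canadian regions. ASSUMPTION: the exempted global services below are an
# illustrative, incomplete list; review it before attaching any SCP.
import json
from typing import List

CANADIAN_AWS_REGIONS = ["ca-central-1", "ca-west-1"]

# Hypothetical subset of global services that must not be region-restricted.
EXEMPT_GLOBAL_ACTIONS = ["iam:*", "organizations:*", "route53:*", "cloudfront:*", "support:*"]


def region_restriction_scp(regions: List[str], exempt_actions: List[str]) -> dict:
    """Build an SCP that denies every non-exempt action outside `regions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny covers everything except the exemptions.
                "NotAction": exempt_actions,
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_restriction_scp(CANADIAN_AWS_REGIONS, EXEMPT_GLOBAL_ACTIONS), indent=2))
```

An SCP constrains where resources can be created and called. It does not by itself document which data categories live where, so it complements the governance mapping rather than replacing it.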

When to Stay Single-Cloud

The default recommendation for organisations under $10M annual cloud spend, or with data engineering teams smaller than ten engineers, is to stay single-cloud. The benefits of deep integration between AWS services — the native IAM integration between Glue, Redshift, S3, and Lake Formation, the unified CloudWatch observability stack, the seamless VPC networking — are worth more than theoretical vendor independence.

Vendor lock-in at the data platform layer is often overstated as a risk. The real lock-in is not in which cloud stores your Parquet files — those are portable. The lock-in is in your team’s expertise, your operational runbooks, your CI/CD pipelines, and your monitoring configurations. Switching clouds is a major engineering project regardless of which open-source formats you use.

The better approach to managing vendor dependency is investing in open, portable data formats — Apache Parquet, Apache Iceberg, Delta Lake — so that your data assets can be migrated if the decision to switch clouds is ever made. Apache Iceberg with AWS Glue is particularly worth considering because Iceberg tables are readable by engines on any cloud without proprietary format conversion.
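As a sketch of what Iceberg with AWS Glue looks like in practice, the helper below builds the Spark configuration for an Iceberg catalog backed by the Glue Data Catalog. It uses the catalog and file-IO classes shipped in Iceberg’s AWS module. The catalog name and warehouse path are hypothetical. The Iceberg Spark runtime and AWS bundle JARs are assumed to be on Spark’s classpath.

```python
# Sketch: Spark settings for an Apache Iceberg catalog backed by AWS Glue.
# ASSUMPTIONS: catalog name and warehouse path are hypothetical; the Iceberg
# Spark runtime and AWS bundle JARs must already be on Spark's classpath.
from typing import Dict


def iceberg_glue_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Return Spark config key/value pairs for a Glue-backed Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (for example MERGE INTO and CALL).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # The Glue Data Catalog stores each table's pointer to its current metadata...
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # ...while Parquet data files and JSON/Avro metadata files live in S3.
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


if __name__ == "__main__":
    for key, value in iceberg_glue_conf("lake", "s3://example-datalake/warehouse/").items():
        print(f"{key}={value}")
```

With pyspark available, each pair can be passed to SparkSession.builder.config(key, value) before getOrCreate(). The portability argument rests on the files being open formats. An engine on another cloud that can read Iceberg metadata can read the table without a conversion step. It still needs network access to S3, though, and will incur the egress charges discussed earlier.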

This pairs directly with a well-designed Modern Data Stack, where the format and storage layer are decoupled from the compute layer — giving you portability at the data layer without paying the operational cost of running two cloud environments simultaneously.

A Decision Framework

Before committing to a multi-cloud data strategy, work through these questions with your architecture and business leadership:

What specific capability on Cloud B cannot be achieved on Cloud A at acceptable cost? If you cannot name it concretely, multi-cloud is not justified.
What is the annual egress cost of keeping datasets synchronised? Quantify it. If it exceeds $50,000/year, it needs explicit business justification.
How many additional full-time engineers does multi-cloud tooling require? At an average fully loaded cost of $150,000 per data engineer in Canada, two additional engineers to support multi-cloud tooling cost $300,000/year.
Do your compliance obligations require geographic distribution that your primary cloud cannot satisfy? If yes, document the specific regulation and the specific data category.
Does your engineering team have depth in both clouds, or are you proposing to build it? Building multi-cloud expertise from scratch while delivering data products is high-risk.
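The questions above can also be captured as a lightweight, repeatable review, so that every proposal is measured the same way. This sketch uses the $50,000 egress threshold and the $150,000 fully loaded engineer cost from the list. The field names and the pass/fail logic are hypothetical simplifications, not a substitute for the leadership discussion.

```python
# Sketch: the decision-framework questions as a repeatable review.
# Thresholds come from the text; field names and logic are a hypothetical
# simplification.
from dataclasses import dataclass
from typing import List, Optional

EGRESS_JUSTIFICATION_THRESHOLD = 50_000  # USD per year
FULLY_LOADED_ENGINEER_COST = 150_000     # USD per year, Canadian average


@dataclass
class MultiCloudProposal:
    capability_on_cloud_b: Optional[str]    # Q1: named concretely, or None
    annual_egress_usd: float                # Q2: quantified sync cost
    additional_engineers: int               # Q3: extra headcount
    residency_driven: bool                  # Q4: is geography the driver?
    regulation_and_category: Optional[str]  # Q4: documented regulation + data category
    team_has_depth_in_both: bool            # Q5: existing expertise


def review(p: MultiCloudProposal) -> List[str]:
    """Return the findings that leadership needs to resolve before approval."""
    findings = []
    if not p.capability_on_cloud_b:
        findings.append("No concrete capability named: multi-cloud is not justified.")
    if p.annual_egress_usd > EGRESS_JUSTIFICATION_THRESHOLD:
        findings.append(f"Egress of ${p.annual_egress_usd:,.0f}/year needs explicit business justification.")
    if p.additional_engineers > 0:
        cost = p.additional_engineers * FULLY_LOADED_ENGINEER_COST
        findings.append(f"{p.additional_engineers} additional engineer(s): ${cost:,.0f}/year.")
    if p.residency_driven and not p.regulation_and_category:
        findings.append("Residency cited, but the regulation and data category are not documented.")
    if not p.team_has_depth_in_both:
        findings.append("Team lacks depth in both clouds: building it while delivering is high-risk.")
    return findings


if __name__ == "__main__":
    proposal = MultiCloudProposal(
        capability_on_cloud_b=None,
        annual_egress_usd=46_080,
        additional_engineers=2,
        residency_driven=False,
        regulation_and_category=None,
        team_has_depth_in_both=False,
    )
    for finding in review(proposal):
        print("-", finding)
```

A proposal that returns an empty list has at least answered each question concretely. Any finding is a prompt for the discussion with business leadership, not an automatic rejection.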

Conclusion

Multi-cloud data strategy is the right answer for a specific set of organisations — those with genuine best-of-breed requirements on multiple platforms, M&A-driven cloud diversity, or specific regulatory constraints that one provider cannot address. For the majority of data organisations, a single-cloud strategy with portable open formats is more cost-effective, simpler to operate, and easier to staff.

If you are evaluating whether a multi-cloud approach is justified for your organisation’s data platform, or if you have inherited a multi-cloud environment and need help rationalising it, reach out to the Infra IT Consulting team. We help Canadian, UK, and African organisations make pragmatic architecture decisions that match their actual constraints and scale over time.


Related posts

Cloud-Native Analytics Strategy: A Roadmap for 2024 and Beyond

Lambda vs. Kappa Architecture: Which Fits Your Streaming Use Case?

Vector Databases on AWS: Enabling AI-Powered Search and RAG
