Lambda vs. Kappa Architecture: Which Fits Your Streaming Use Case?
When organisations move beyond batch ETL into real-time data processing, two architectural patterns dominate the conversation: Lambda and Kappa. Both solve the problem of combining historical and live data, but they do so with fundamentally different trade-offs in complexity, cost, and operational overhead. Choosing between them is not a matter of which is “better” — it is a matter of which fits your latency requirements, team capabilities, and data volume today and eighteen months from now.
This post breaks down both architectures in concrete terms, maps them to AWS services, and gives you a practical decision framework.
What Lambda Architecture Actually Means
Lambda architecture, popularised by Nathan Marz around 2012, splits data processing into three layers:
- Batch layer: Processes the complete historical dataset on a schedule (typically hours or days), producing accurate but latent views.
- Speed layer: Processes incoming events in near-real-time, producing approximate but current views.
- Serving layer: Merges outputs from both layers to answer queries.
On AWS, a canonical Lambda implementation looks like this:
    Raw events → Amazon Kinesis Data Streams
          ↓                       ↓
    Amazon Managed Flink    Amazon S3 (raw zone)
    (speed layer)                 ↓
          ↓                 AWS Glue batch job (batch layer)
    Amazon DynamoDB               ↓
    (hot results)           Amazon Redshift (batch results)
                     ↓
    Query merges both (serving layer via Athena or application logic)
The batch layer reprocesses everything from S3 with AWS Glue or EMR, giving you correction capability when you fix bugs. The speed layer running on Amazon Managed Service for Apache Flink gives you sub-second latency for recent data. The serving layer — often Amazon Redshift or a custom API — merges results.
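The serving-layer merge can be sketched in plain Python. This is a minimal illustration, not an AWS API: it assumes the batch view is authoritative up to a known cutoff timestamp and that the speed layer emits per-window increments. The names `merge_views`, `batch_view`, `speed_view`, and `batch_cutoff` are hypothetical.

```python
from datetime import datetime, timezone


def merge_views(batch_view, speed_view, batch_cutoff):
    """Combine batch totals with speed-layer increments newer than the cutoff.

    batch_view:   {key: total} computed by the batch layer up to batch_cutoff.
    speed_view:   list of (key, window_end, amount) tuples from the speed layer.
    batch_cutoff: datetime up to which the batch view is authoritative.
    """
    merged = dict(batch_view)
    for key, window_end, amount in speed_view:
        # Windows ending at or before the cutoff are already in the batch view,
        # so adding them again would double-count.
        if window_end > batch_cutoff:
            merged[key] = merged.get(key, 0) + amount
    return merged


cutoff = datetime(2024, 1, 2, tzinfo=timezone.utc)
batch = {"cust-1": 100, "cust-2": 40}
speed = [
    ("cust-1", datetime(2024, 1, 1, 23, 55, tzinfo=timezone.utc), 10),  # covered by batch
    ("cust-1", datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc), 5),
    ("cust-3", datetime(2024, 1, 2, 0, 10, tzinfo=timezone.utc), 7),
]
result = merge_views(batch, speed, cutoff)
```

Here `result` keeps the batch totals and adds only the two speed-layer windows that end after the cutoff. In a real deployment the cutoff would come from the last successful batch run.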
This design has a significant upside: the batch layer is always the source of truth. If your streaming code has a bug that corrupts speed layer results, you simply fix the code and wait for the next batch run to overwrite the damage. That guarantee is operationally valuable in regulated industries where data accuracy is non-negotiable.
The downside is equally significant: you are maintaining two separate codebases for the same business logic. The Glue job that calculates daily revenue and the Flink application that calculates streaming revenue must agree exactly, or your serving layer will produce inconsistent results. Keeping them in sync across every schema change and business rule update is real engineering work.
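One way to catch drift between the two codebases is a reconciliation check over a period that both layers have fully processed. The sketch below is an assumption about how such a check might look; `find_mismatches`, the tolerance value, and the sample figures are all hypothetical.

```python
def find_mismatches(batch_totals, speed_totals, tolerance=0.01):
    """Return keys whose batch and speed totals differ by more than tolerance.

    Both arguments are {key: total} for the same closed time period.
    A key missing from one side is treated as a total of 0.0 on that side.
    """
    mismatches = {}
    for key in sorted(set(batch_totals) | set(speed_totals)):
        batch_value = batch_totals.get(key, 0.0)
        speed_value = speed_totals.get(key, 0.0)
        if abs(batch_value - speed_value) > tolerance:
            mismatches[key] = (batch_value, speed_value)
    return mismatches


# Illustrative figures only.
batch_daily_revenue = {"2024-01-01": 1200.00, "2024-01-02": 980.50}
speed_daily_revenue = {"2024-01-01": 1200.00, "2024-01-02": 975.50}
drift = find_mismatches(batch_daily_revenue, speed_daily_revenue)
```

A non-empty `drift` means the Glue job and the Flink application disagree for that day, which is the signal to compare their logic before the serving layer exposes the inconsistency.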
What Kappa Architecture Fixes — and What It Introduces
Kappa architecture, proposed by Jay Kreps (Kafka’s co-creator) in 2014, eliminates the batch layer entirely. There is only one processing layer: a streaming pipeline that reads from a replayable log. When you need to reprocess historical data, you spin up a new version of the streaming job, point it at the beginning of the log, let it catch up, and then cut over.
On AWS, the foundation is Amazon Kinesis Data Streams or Amazon MSK (Managed Streaming for Apache Kafka) configured with an extended retention window. Kinesis retains records for 24 hours by default and supports up to 365 days of retention; MSK can retain data for as long as your storage allows. Your streaming job, typically an Amazon Managed Service for Apache Flink application, reads from the oldest retained record during reprocessing and from the current position in production.
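    # Example: Flink job reading from Kinesis with a configurable start position.
    # parse_order, RevenueAggregator and kinesis_sink are assumed to be defined elsewhere.
    from pyflink.common import Time
    from pyflink.common.serialization import SimpleStringSchema
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors.kinesis import FlinkKinesisConsumer
    from pyflink.datastream.window import TumblingEventTimeWindows

    env = StreamExecutionEnvironment.get_execution_environment()
    kinesis_consumer = FlinkKinesisConsumer(
        "orders-stream",
        SimpleStringSchema(),
        {
            "aws.region": "ca-central-1",
            "flink.stream.initpos": "TRIM_HORIZON",  # replay from the oldest retained record
            # Use "LATEST" for production reads
        },
    )
    stream = env.add_source(kinesis_consumer)

    # Event-time windows only fire once timestamps and watermarks are assigned
    # (assign_timestamps_and_watermarks); that step is omitted here for brevity.
    (
        stream
        .map(parse_order)
        .key_by(lambda o: o["customer_id"])
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .aggregate(RevenueAggregator())
        .sink_to(kinesis_sink)  # assumes a KinesisStreamsSink; SinkFunction-based sinks use add_sink
    )
    env.execute("orders-revenue")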
The appeal is clear: one codebase, one deployment pipeline, one set of business logic to maintain. When your schema changes, you update one job and replay.
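Replay only works if the stream retains enough history, so the retention window is worth setting explicitly. The sketch below assumes a client that behaves like `boto3.client("kinesis")` and uses its `describe_stream_summary` and `increase_stream_retention_period` calls; the helper name `ensure_retention` and the 90-day figure in the usage note are hypothetical.

```python
MAX_RETENTION_DAYS = 365  # Kinesis Data Streams upper limit


def ensure_retention(kinesis_client, stream_name, target_days):
    """Raise a stream's retention to target_days if it is currently shorter.

    kinesis_client is expected to behave like boto3.client("kinesis").
    Returns the retention in hours once the function has finished.
    """
    if not 1 <= target_days <= MAX_RETENTION_DAYS:
        raise ValueError("target_days must be between 1 and 365")
    target_hours = target_days * 24

    summary = kinesis_client.describe_stream_summary(StreamName=stream_name)
    current_hours = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]

    # Skip the call when the stream already retains at least this long.
    if current_hours >= target_hours:
        return current_hours

    kinesis_client.increase_stream_retention_period(
        StreamName=stream_name,
        RetentionPeriodHours=target_hours,
    )
    return target_hours
```

With boto3 installed and credentials configured, a call might look like `ensure_retention(boto3.client("kinesis", region_name="ca-central-1"), "orders-stream", 90)`. Passing the client in keeps the helper easy to exercise against a stub.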
The challenge Kappa introduces is state management during replay. If your streaming job maintains stateful aggregations — session windows, running totals, complex joins — replaying from the beginning means rebuilding all that state from scratch. A Flink job processing 12 months of Kinesis data might take hours to catch up, during which you need to run it in parallel with the production job. Managing that handover without dropping events or double-counting requires careful checkpoint design and infrastructure headroom.
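The catch-up time can be estimated with simple arithmetic: the replay job has to clear the backlog while new events keep arriving, so the effective drain rate is the replay throughput minus the live ingest rate. The function below is a back-of-the-envelope sketch; `catch_up_hours` and every rate in the example are illustrative assumptions, not measurements.

```python
def catch_up_hours(backlog_events, replay_rate_per_sec, ingest_rate_per_sec):
    """Estimate hours until a replaying job reaches the head of the log.

    The job must process the backlog plus everything that arrives while it
    replays, so time = backlog / (replay rate - ingest rate).
    """
    if replay_rate_per_sec <= ingest_rate_per_sec:
        raise ValueError("replay must be faster than ingest or it never catches up")
    seconds = backlog_events / (replay_rate_per_sec - ingest_rate_per_sec)
    return seconds / 3600


# Assumed workload: 1,000 events/s ingested, 30 days retained,
# replay job sustaining 50,000 events/s.
ingest_rate = 1_000
backlog = ingest_rate * 86_400 * 30
hours = catch_up_hours(backlog, replay_rate_per_sec=50_000, ingest_rate_per_sec=ingest_rate)
```

Under these assumed rates the replay takes roughly 14.7 hours, and the replay job and the production job both need capacity for that whole period.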
Comparing the Two on AWS: A Practical Matrix
| Dimension | Lambda | Kappa |
|---|---|---|
| Code duplication | High — two codebases | None — single streaming job |
| Reprocessing correctness | Excellent — full batch rerun | Good — depends on retention window |
| Latency | Near-real-time + batch | Near-real-time only |
| Historical analysis | Native via batch layer | Requires replay or separate S3 sink |
| Operational complexity | High — two pipelines | Medium — one pipeline, complex replay |
| AWS cost pattern | Glue + Flink + Redshift | Flink + extended Kinesis/MSK retention |
One pattern that blurs the line is using Amazon S3 as both a raw event store and a replay source. Every event lands in Kinesis and is also archived to S3, typically through Amazon Data Firehose (formerly Kinesis Data Firehose). Your Flink job processes the stream for real-time views. When you need to reprocess history, the same job reads the archived files from S3 through Flink's file source rather than replaying Kinesis. This hybrid approach avoids extended Kinesis retention costs while preserving Kappa's single-codebase advantage, though it adds the work of keeping the S3 archive and the Kinesis stream consistent, particularly at the handover point between archived and live data.
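The handover between archived and live data is where duplicates tend to appear, because the same event can sit in both the S3 archive and the stream. A minimal sketch of stitching the two sources, assuming every event carries a unique `event_id` field (an assumption, as is the helper name `stitch_replay`):

```python
from itertools import chain


def stitch_replay(archived_events, live_events):
    """Yield archived events, then live events, skipping repeated event IDs.

    Both inputs are iterables of dicts with an "event_id" key. Events that
    appear in both sources are emitted once, from whichever source comes first.
    """
    seen_ids = set()
    for event in chain(archived_events, live_events):
        event_id = event["event_id"]
        if event_id in seen_ids:
            continue
        # This set grows without bound; a real job would keep IDs only
        # for the overlap window around the handover.
        seen_ids.add(event_id)
        yield event


archive = [{"event_id": "e1", "amount": 10}, {"event_id": "e2", "amount": 20}]
live = [{"event_id": "e2", "amount": 20}, {"event_id": "e3", "amount": 30}]
stitched = list(stitch_replay(archive, live))
```

In this example `e2` is present in both sources and is emitted only once. In Flink the same idea would normally live in keyed state with a time-to-live rather than an in-memory set.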
When Lambda Architecture Is Still the Right Call
Lambda architecture is not obsolete. It remains the better choice in three scenarios:
High-accuracy requirements with long correction windows. Financial services organisations that need to reconcile streaming revenue figures against end-of-day batch settlements benefit from having a batch layer that can recompute everything cleanly. The cost of running two codebases is lower than the cost of an error in a regulated report.
Queries that span years of history. Kappa's replay is fine for reprocessing, but it cannot reach data that predates your Kinesis retention window. If analysts frequently query historical aggregates that far back, a batch layer on Redshift or Athena answers them from stored tables, which is far faster than replaying events for each query.
Legacy environments with established batch infrastructure. If your organisation already runs mature Glue or Spark pipelines and is adding a real-time capability on top, Lambda is often the pragmatic path. You are extending existing infrastructure rather than rebuilding it.
For teams designing event-driven systems from the ground up, the patterns described in Event-Driven Data Architecture provide useful complementary context on how events flow between services before they reach your processing layer.
When Kappa Architecture Wins
Kappa is the right call when your team has strong streaming engineering capability and your use cases are genuinely latency-sensitive:
- IoT and telemetry platforms where data older than a few hours loses its operational value: Kappa's simplicity and single deployment pipeline reduce the blast radius of changes.
- Real-time personalisation where recommendations need to reflect the last five minutes of user behaviour: the batch layer's latency is a product liability, not a technical detail.
- Organisations with a small data engineering team: maintaining one Flink application is substantially less operational overhead than keeping a Glue batch layer and a Flink speed layer aligned.
The architecture also pairs well with Delta Lake on AWS, where Delta’s ACID transaction support and time-travel capabilities give you some of the reprocessing safety net that Lambda’s batch layer traditionally provided — without duplicating your processing code.
A Decision Framework for Canadian and International Organisations
Before choosing an architecture, answer these five questions:
- What is your acceptable query latency? If business users need dashboards updated within 30 seconds, Kappa. If hourly is fine, the batch layer can answer most queries and the speed layer can stay thin, which favours Lambda.
- How long do you need to query raw history? If ad-hoc historical queries spanning multiple years are common, Lambda’s batch layer on Redshift or Athena will outperform Kappa replay.
- How large is your data engineering team? Teams smaller than five engineers should lean toward Kappa to avoid the maintenance cost of two parallel codebases.
- How often does your business logic change? High churn in calculation logic makes Lambda painful — every change requires two deployments. Kappa’s single codebase is safer.
- What are your regulatory correction requirements? If you must be able to recompute any historical period with corrected logic and prove the output, Lambda’s batch layer provides that guarantee more naturally.
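The five questions can be turned into a rough tally. The sketch below encodes only the leanings stated in the list (the 30-second and hourly latency marks, the five-engineer threshold) and gives each question equal weight. That weighting is an assumption made for illustration, as are the names `Requirements` and `tally`; treat the output as a conversation starter, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    dashboard_latency_seconds: int  # freshness business users will accept
    multi_year_adhoc_queries: bool  # are multi-year ad-hoc queries common?
    team_size: int                  # number of data engineers
    logic_changes_often: bool       # high churn in calculation logic
    must_prove_recomputation: bool  # regulatory recompute-and-prove requirement


def tally(req):
    """Return {"lambda": n, "kappa": m}, one point per question that leans one way."""
    points = {"lambda": 0, "kappa": 0}
    if req.dashboard_latency_seconds <= 30:
        points["kappa"] += 1
    elif req.dashboard_latency_seconds >= 3600:
        points["lambda"] += 1
    if req.multi_year_adhoc_queries:
        points["lambda"] += 1
    if req.team_size < 5:
        points["kappa"] += 1
    if req.logic_changes_often:
        points["kappa"] += 1
    if req.must_prove_recomputation:
        points["lambda"] += 1
    return points


small_team_points = tally(Requirements(30, False, 3, True, False))
regulated_points = tally(Requirements(3600, True, 12, False, True))
```

The first profile (a three-person team with fast dashboards and changing logic) scores entirely for Kappa; the second (hourly freshness, multi-year queries, regulatory recomputation) scores entirely for Lambda. Mixed scores are the cases that deserve a closer look.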
Conclusion
Neither Lambda nor Kappa architecture is universally superior. Lambda’s strength — a reliable, accurate batch layer — is also its weakness: two codebases, two deployments, two sets of bugs. Kappa’s strength — simplicity and a single codebase — depends on a replayable log with sufficient retention and a team capable of managing stateful streaming at scale.
For most Canadian data teams building on AWS today, Kappa with Amazon MSK or Kinesis Data Streams and Amazon Managed Flink is the starting point worth evaluating first. Reserve Lambda for environments where regulatory accuracy requirements or historical query patterns genuinely justify the added complexity.
If you are working through this decision for your organisation and want to pressure-test your architecture before you build, reach out to the Infra IT Consulting team. We help data teams in Canada, the UK, and Africa design streaming pipelines that match their actual requirements — not the architecture that was trending when the decision was last revisited.