Infra IT Consulting logo Infra ITC
Data Architecture & Strategy event-drivenkafkakinesis

Event-Driven Data Architecture: Why It's the Future of Pipelines

By Infra IT Consulting Β· Β· 9 min read

The dominant paradigm in data engineering for the past two decades has been batch processing: collect data over a period, process it in bulk, move the results downstream. Batch pipelines are straightforward to reason about, relatively easy to debug, and well-supported by mature tooling. They are also fundamentally limited. A batch pipeline that runs every hour cannot power real-time fraud detection. A nightly pipeline cannot feed a live operations dashboard. A daily warehouse refresh cannot support personalisation that needs to respond to what a user just did.

Event-driven data architecture is not a replacement for batch processing β€” it is an architectural expansion that enables a class of use cases batch alone cannot serve, and it changes the fundamental model of how data moves through a system.

What Makes an Architecture β€œEvent-Driven”

In a traditional batch architecture, data moves because a scheduler triggers a job at a fixed time. The pipeline asks: β€œwhat changed since I last ran?” This polling model is simple but introduces inherent latency (the batch interval) and resource waste (polling even when nothing has changed).

In an event-driven architecture, data moves because something happened. An order was placed. A user logged in. A sensor recorded a reading. A payment was processed. Each of these occurrences produces an event β€” a structured record describing what happened, when, and the relevant attributes β€” and that event is immediately published to a messaging system. Downstream consumers receive and process events as they arrive.

The key properties of this model:

  • Low latency. Events are processed within seconds or milliseconds of occurring, not at the next batch window
  • Decoupling. The system that produces an event does not know who consumes it. Producers and consumers evolve independently
  • Scalability. Event streams scale horizontally β€” adding consumers does not increase load on producers
  • Resilience. Events are persisted in the messaging layer. If a consumer fails, it replays from where it stopped when it recovers

On AWS, three services implement the messaging layer for event-driven architectures:

  • Amazon Kinesis Data Streams β€” managed streaming service, lowest latency, tight AWS integration, up to 7-day retention, priced per shard
  • Amazon MSK (Managed Streaming for Apache Kafka) β€” fully managed Kafka, highest throughput and ecosystem breadth, longer retention, richer consumer group management
  • Amazon EventBridge β€” serverless event bus optimised for routing events between AWS services and SaaS applications; not suited for high-throughput data pipelines but excellent for orchestration events

Choosing Between Kinesis and MSK

The choice between Kinesis and MSK is one of the most common architectural decisions in event-driven data engineering on AWS.

Choose Kinesis when:

  • Your team has limited Kafka operational experience and prefers a fully managed service with minimal configuration
  • Your consumers are primarily AWS-native (Lambda, Kinesis Data Firehose, Glue, Flink on EMR)
  • Your throughput requirements fit within Kinesis shard limits (1 MB/s write and 2 MB/s read per shard)
  • You need tight integration with other AWS services without custom connector development

Choose MSK when:

  • You have existing Kafka expertise or Kafka-based applications
  • You need advanced consumer group management, Kafka Streams, or Kafka Connect with third-party connectors
  • Your retention requirements exceed Kinesis’s 7-day maximum (MSK with tiered storage supports indefinite retention in S3)
  • You need cross-region replication with MirrorMaker 2
  • Your throughput significantly exceeds what is economical with Kinesis sharding

For most greenfield AWS data engineering projects without existing Kafka expertise, Kinesis is the right default choice. For organisations running existing Kafka infrastructure or needing the broader ecosystem, MSK is the natural fit.

Designing an Event-Driven Data Pipeline

A well-designed event-driven data pipeline on AWS follows this pattern:

Event Sources (application, IoT, CDC)
    ↓
Amazon Kinesis Data Streams / MSK
    ↓
Event Processing Layer
    β”œβ”€β”€ Real-time: AWS Lambda (per-event transforms)
    β”œβ”€β”€ Micro-batch: Amazon Kinesis Data Analytics (Flink)
    └── Near-real-time batch: Kinesis Data Firehose β†’ S3
    ↓
Storage and Serving
    β”œβ”€β”€ Real-time: Amazon DynamoDB / ElastiCache / Timestream
    β”œβ”€β”€ Analytical: Amazon S3 (Parquet) β†’ Athena / Redshift
    └── Dashboards: Amazon QuickSight

A concrete example: an e-commerce platform where every user action β€” product view, add to cart, purchase, return β€” produces an event. These events are published to Kinesis with a partition key of user_id (ensuring all events for a user go to the same shard and arrive in order).

Lambda consumer for real-time personalisation: processes each event as it arrives, updates a user session record in DynamoDB, and triggers a recommendation refresh.

Kinesis Data Analytics (Flink) for fraud detection: runs a sliding window query across the stream, flagging users with more than 5 payment failures in the last 10 minutes.

Kinesis Data Firehose for analytics: buffers events and writes them as Parquet files to S3 every 5 minutes, partitioned by event type and date. This feeds the analytical lakehouse described in lakehouse architecture on AWS.

Event Schema Design and the Schema Registry

Events must have a well-defined, versioned schema. Without schema enforcement, a producer that changes an event structure silently breaks all downstream consumers β€” a problem that is painful to debug and expensive to fix.

AWS Glue Schema Registry provides schema enforcement for Kinesis and MSK. Producers register their event schema (in Avro, JSON Schema, or Protobuf), and the Schema Registry enforces that every event conforms to the registered schema before it enters the stream. Schema evolution is managed through compatibility modes:

  • BACKWARD compatible β€” new schema can read data produced with the previous schema (safest for consumers)
  • FORWARD compatible β€” previous schema can read data produced with the new schema (safest for producers)
  • FULL compatible β€” both backward and forward compatible

A schema registry for event-driven pipelines is the equivalent of data contracts for batch pipelines. The data contracts engineering post covers the conceptual framework; Schema Registry is the AWS implementation.

An example Kinesis producer with Schema Registry validation in Python:

import boto3
import json
from aws_schema_registry import SchemaRegistryClient, DataAndSchema
from aws_schema_registry.avro import AvroSchema

# Initialise clients
kinesis = boto3.client('kinesis', region_name='ca-central-1')
schema_registry = SchemaRegistryClient(
    boto3.client('glue', region_name='ca-central-1'),
    registry_name='ecommerce-events'
)

ORDER_PLACED_SCHEMA = AvroSchema({
    "type": "record",
    "name": "OrderPlaced",
    "namespace": "com.yourcompany.ecommerce",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "total_cad", "type": "double"},
        {"name": "items", "type": {"type": "array", "items": "string"}},
        {"name": "placed_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
    ]
})

def publish_order_placed(order: dict):
    # Schema Registry validates and serialises the event
    data_and_schema = DataAndSchema(data=order, schema=ORDER_PLACED_SCHEMA)
    serialised = schema_registry.serialise(
        data_and_schema,
        schema_name='OrderPlaced'
    )

    kinesis.put_record(
        StreamName='ecommerce-events',
        Data=serialised,
        PartitionKey=order['customer_id']  # Route by customer for ordering
    )

Handling Late-Arriving Events and Out-of-Order Data

One of the practical challenges of event-driven architectures is that events do not always arrive in the order they occurred. Mobile app events may be buffered and sent in bulk when connectivity resumes. IoT sensors may experience network delays. A payment processor webhook may be retried minutes after the original attempt.

Designing for late arrivals:

  • Event time vs. processing time. Always record and process on event_time (when the event occurred) rather than processing_time (when it arrived). Store both in your event schema.
  • Watermarking in Flink. Amazon Kinesis Data Analytics (Flink) supports watermarks β€” a mechanism for declaring β€œwe have seen all events up to time T minus some latency allowance.” Events arriving after the watermark are late arrivals, which can be routed to a side output for handling rather than dropped.
  • Idempotent consumers. Consumers should handle duplicate events gracefully. At-least-once delivery (the Kinesis guarantee) means the same event may be processed twice if a consumer checkpoint fails. Design upserts rather than inserts in downstream stores.

Event-Driven Architecture and the Modern Data Stack

Event-driven architecture does not replace the batch-oriented modern data stack; it extends it. Most organisations need both. The batch layer (dbt, Redshift, Athena, daily Glue jobs) handles the majority of analytical workloads where same-day or next-day freshness is acceptable. The streaming layer (Kinesis, Flink, Lambda) handles the minority of workloads that genuinely require near-real-time processing.

A mature data architecture has clear criteria for which workloads belong in which layer. The post on the modern data stack covers the batch layer in detail; the streaming layer described in this post complements it for latency-sensitive use cases.

The event-driven model also aligns naturally with the event-driven data architecture pattern, where domain events are the canonical communication mechanism between services β€” not just for analytics, but for operational integration. When every business event is published to a durable stream, the analytical data lake is a subscriber just like any operational system, consuming events to build analytical state in parallel with operational processing.

Conclusion

Event-driven data architecture on AWS β€” built on Kinesis or MSK, with Lambda for per-event processing, Flink for streaming aggregations, and Firehose for analytical delivery β€” enables a class of use cases that batch pipelines cannot serve: real-time fraud detection, live operational dashboards, personalisation at millisecond latency, and CDC-driven data lake updates.

The key to success is disciplined schema management via AWS Glue Schema Registry, event-time processing with proper watermarking, and idempotent consumers that handle redelivery gracefully. These are engineering disciplines, not afterthoughts, and they require expertise to implement correctly in production.

If your organisation is evaluating a move to event-driven architecture, or needs to build streaming data pipelines on AWS that are reliable enough for production workloads, contact Infra IT Consulting for an architecture review and implementation partnership.

Related posts