AWS Data Engineering kinesisstreamingreal-time

Real-Time Data Streaming with Amazon Kinesis: Architecture Patterns

By Infra IT Consulting · January 29, 2024 · 10 min read

Content on this site is AI-assisted and personally reviewed by Hazem. Learn more

Batch pipelines process data that has already happened. Streaming pipelines process data as it happens. For fraud detection, real-time personalisation, operational dashboards, and IoT telemetry, the difference between a batch job that runs every hour and a streaming pipeline that processes events in seconds is the difference between useful and irrelevant.

Amazon Kinesis is AWS’s native streaming data platform. It is not a single service — it is a family of services, each addressing a distinct part of the streaming problem. Using the right Kinesis component for each part of your architecture is the first and most consequential decision in building a real-time pipeline on AWS.

The Kinesis Service Family

Amazon Kinesis Data Streams (KDS) is the core streaming backbone. Producers write records to a stream made up of shards. Each shard provides 1 MB/s of write throughput and 2 MB/s of read throughput, with data retained for 24 hours by default (up to 365 days with extended retention). Consumers read from shards using the Kinesis Client Library (KCL), AWS Lambda, or Kinesis Data Analytics.

Amazon Kinesis Data Firehose is a fully managed delivery service. Producers write to a Firehose delivery stream, and Firehose buffers the data and delivers it to a destination — S3, Redshift, OpenSearch, or Splunk — without requiring any consumer code. Firehose also supports inline transformation via AWS Lambda before delivery. It is not a real-time stream in the KDS sense; it has a minimum buffer time of 60 seconds.

Amazon Kinesis Data Analytics (now called Amazon Managed Service for Apache Flink) provides a managed Apache Flink environment for stateful stream processing — windowed aggregations, stream-to-stream joins, complex event detection, and ML inference over streaming data.

Understanding which service to use for which job is the foundation of every Kinesis architecture.

Pattern 1: Ingest-and-Land (Firehose to S3)

The simplest and most widely deployed Kinesis pattern delivers event data to a data lake on Amazon S3 with no custom consumer code.

Architecture:

Producers (application events, clickstreams, IoT sensors)
  → Kinesis Data Firehose
    → (optional) Lambda transformation
      → S3 raw zone (Parquet, partitioned by event date/hour)

Firehose handles buffering (up to 128 MB or 900 seconds, whichever comes first), format conversion (JSON to Parquet using the Glue Data Catalog schema), and compression (GZIP or Snappy). The Parquet conversion feature is particularly valuable — it eliminates the need for a separate ETL job to convert raw JSON to columnar format, and the output is immediately queryable via Amazon Athena.

Firehose’s dynamic partitioning feature (enabled via Firehose configuration) allows you to partition the S3 output based on values within the records themselves — for example, partitioning by $.customer_region or $.event_type extracted from each record. This produces Hive-style S3 prefixes (region=EU/event_type=purchase/year=2024/...) that Athena and Glue crawlers recognise automatically.

When to use this pattern: Ingestion pipelines where near-real-time latency (1-15 minutes) is acceptable and you do not need stateful stream processing. Log aggregation, clickstream ingestion, IoT telemetry — this pattern handles them all with minimal operational overhead.

Pattern 2: Real-Time Processing with Kinesis Data Streams and Lambda

When you need to act on events within seconds — not minutes — Kinesis Data Streams with Lambda consumers is the go-to pattern for low-latency, stateless record processing.

Architecture:

Producers
  → Kinesis Data Streams (N shards)
    → AWS Lambda (event source mapping, batch window 0)
      → DynamoDB / ElastiCache / SNS / SQS (real-time actions)
      → S3 (raw landing for batch processing)

Lambda’s Kinesis event source mapping processes batches of records from each shard. Key configuration parameters:

Batch size: 1–10,000 records per invocation. Larger batches are more efficient but add latency.
Batch window: 0–300 seconds. Setting to 0 means Lambda is invoked as soon as records are available.
Bisect on error: If enabled, a failing batch is split in half and retried, isolating poison-pill records.
Destination on failure: Failed batches after all retries can be sent to an SQS dead-letter queue for investigation.

The Lambda consumer pattern works best for stateless, per-record processing: enriching an event with a DynamoDB lookup, triggering a notification, writing a real-time metric to CloudWatch. For anything stateful — sliding window aggregations, session detection, out-of-order event handling — you need Managed Service for Apache Flink.

Shard count planning: Kinesis Data Streams shard capacity is a frequent pain point. Under-shard and you hit ProvisionedThroughputExceededException errors during traffic spikes. Over-shard and you pay for idle capacity (each shard costs ~$0.015/hour). Use Kinesis’s enhanced fan-out for multiple consumers, and monitor GetRecords.IteratorAgeMilliseconds — if this metric grows, your consumer is falling behind and you need more shards.

Pattern 3: Stateful Stream Processing with Amazon Managed Service for Apache Flink

For complex streaming analytics — fraud scoring, real-time sessionisation, windowed aggregations, stream-to-stream joins — the managed Apache Flink service provides a stateful, fault-tolerant stream processing engine without cluster management overhead.

Example use case: Detecting unusual transaction patterns in real time.

Kinesis Data Streams (transactions)
  → Managed Service for Apache Flink (Flink application)
    [Tumbling window: count and sum transactions per card_id per 5 minutes]
    [Join against reference stream: merchant category codes]
    [Detect: > 3 transactions in 5 min + total > $500 + international merchant]
  → Kinesis Data Streams (fraud alerts)
    → Lambda (trigger case creation in CRM)
    → Firehose → S3 (audit log for compliance)

Apache Flink’s stateful processing uses RocksDB-backed checkpoints stored in S3. If the Flink application fails or is updated, it restores state from the last checkpoint — ensuring exactly-once processing semantics and no data loss.

Flink’s SQL API is particularly accessible for data engineers familiar with SQL:

CREATE TABLE transactions (
  transaction_id STRING,
  card_id STRING,
  amount DECIMAL(10,2),
  merchant_country STRING,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kinesis',
  'stream' = 'transactions-stream',
  'aws.region' = 'ca-central-1',
  'format' = 'json'
);

SELECT
  card_id,
  COUNT(*) AS txn_count,
  SUM(amount) AS total_amount,
  TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start
FROM transactions
WHERE merchant_country <> 'CA'
GROUP BY card_id, TUMBLE(event_time, INTERVAL '5' MINUTE)
HAVING COUNT(*) > 3 AND SUM(amount) > 500;

Pattern 4: Lambda Architecture (Batch + Streaming)

The Lambda Architecture (not to be confused with AWS Lambda) combines a streaming speed layer with a batch accuracy layer. For use cases where real-time approximations are valuable but batch-computed accuracy is eventually required — ad impression counting, recommendation model features, financial position tracking — this pattern balances latency and correctness.

Architecture:

Producers
  ├→ Kinesis Data Streams → Flink (speed layer: approximate real-time counts)
  │     → Redis / DynamoDB (real-time dashboard)
  └→ Kinesis Data Firehose → S3 raw zone (batch layer: exact counts)
        → AWS Glue ETL (hourly batch job)
          → S3 curated zone → Athena / Redshift (accurate reporting)

The speed layer serves low-latency queries with approximate or eventually-consistent results. The batch layer periodically overwrites with authoritative values. The serving layer merges both.

This pattern pairs naturally with real-time streaming ingestion feeding an S3 data lake, where Firehose provides the raw landing for batch reprocessing while the speed layer handles operational decisions.

Kinesis Limits and Gotchas

Shard-level throughput limits are the most common operational issue. Each Kinesis Data Streams shard supports 1,000 records/second or 1 MB/s for writes. If a producer sends a burst of events that exceeds shard capacity, Kinesis returns ProvisionedThroughputExceededException. Implement exponential backoff in your producer code and consider using the Kinesis Producer Library (KPL), which handles aggregation and retry automatically.

Hot shards occur when a partition key is skewed — for example, using customer_id as a partition key when one customer generates 80% of your event volume. Use composite partition keys or random suffix hashing to distribute load evenly across shards.

Firehose buffer delays mean Firehose is not a real-time service. The minimum buffer size is 1 MB or 60 seconds. If your downstream system needs data in under 60 seconds, use Kinesis Data Streams with a Lambda consumer instead of Firehose.

Enhanced fan-out costs. Standard shard consumers share 2 MB/s read throughput across all consumers on a shard. Enhanced fan-out dedicates 2 MB/s per consumer per shard but costs $0.015 per shard-hour for the consumer plus $0.013 per GB retrieved. For applications with multiple independent consumers on the same stream, enhanced fan-out eliminates consumer interference but adds cost.

Connecting Streaming to Your Broader Data Platform

Kinesis does not operate in isolation. In a complete data platform, streaming pipelines feed the same S3-based data lake as batch pipelines, registered in the same AWS Glue Data Catalog and governed by the same Lake Formation policies. The streaming layer adds recency; the batch layer adds accuracy and historical depth. Athena queries the combined dataset without caring whether the data arrived via Firehose or a nightly Glue ETL job.

Conclusion

Amazon Kinesis provides a complete toolkit for real-time data streaming on AWS: Firehose for zero-code delivery to S3 and Redshift, Kinesis Data Streams with Lambda for low-latency stateless processing, and Managed Service for Apache Flink for stateful, complex stream processing. The right architecture combines these services based on your latency requirements, processing complexity, and operational preferences — not one-size-fits-all. Ready to build or optimise your AWS data infrastructure? Contact the Infra IT Consulting team for a free consultation.

AWS Data Engineering

Talk to our team →

Real-Time Data Streaming with Amazon Kinesis: Architecture Patterns

The Kinesis Service Family

Pattern 1: Ingest-and-Land (Firehose to S3)

Pattern 2: Real-Time Processing with Kinesis Data Streams and Lambda

Pattern 3: Stateful Stream Processing with Amazon Managed Service for Apache Flink

Pattern 4: Lambda Architecture (Batch + Streaming)

Kinesis Limits and Gotchas

Connecting Streaming to Your Broader Data Platform

Conclusion

Related posts

Apache Iceberg with AWS Glue: The Modern Table Format Explained

7 Proven Ways to Cut AWS Data Pipeline Costs Without Losing Performance

Running Apache Airflow on AWS with MWAA

The Kinesis Service Family

Pattern 1: Ingest-and-Land (Firehose to S3)

Pattern 2: Real-Time Processing with Kinesis Data Streams and Lambda

Pattern 3: Stateful Stream Processing with Amazon Managed Service for Apache Flink

Pattern 4: Lambda Architecture (Batch + Streaming)

Kinesis Limits and Gotchas

Connecting Streaming to Your Broader Data Platform

Conclusion

Related posts

Apache Iceberg with AWS Glue: The Modern Table Format Explained

7 Proven Ways to Cut AWS Data Pipeline Costs Without Losing Performance

Running Apache Airflow on AWS with MWAA

We value your privacy