EMR Serverless vs. EMR on EC2: A Cost and Performance Comparison

By Infra IT Consulting · 9 min read

Amazon EMR has been the go-to choice for large-scale Apache Spark workloads on AWS for nearly a decade. In 2022, AWS introduced EMR Serverless, a deployment model that eliminates cluster management entirely and charges only for the compute used during job execution. For teams already running EMR on EC2, the question is whether to migrate. For teams starting fresh, it is which model to choose.

The honest answer depends on your job patterns, team expertise, and cost profile. This comparison covers the real trade-offs, not a simplistic "serverless is always better" narrative, based on the workload characteristics that actually determine which model wins.

What Changes Between the Two Models

EMR on EC2 provisions a cluster of EC2 instances under your control. You choose the instance types, cluster size, and software configuration. The cluster can be transient (spun up for one job, terminated afterward) or long-running (persistent, serving many jobs across multiple days). EMR manages the Hadoop/Spark layer, but you manage the cluster lifecycle: start, stop, resize, patch.

EMR Serverless abstracts the cluster entirely. You create an EMR Serverless application, submit jobs to it, and AWS provisions compute resources automatically when a job starts, scales them during execution, and releases them when the job completes. There are no instances to choose, no cluster to manage, and no idle costs when no jobs are running.

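Both run the same Apache Spark engine (same versions, same configuration parameters, same PySpark/Scala APIs). Code written for one generally runs on the other without modification, provided it does not depend on cluster-local resources such as HDFS paths or bootstrap-installed libraries. The difference is almost entirely in the deployment and cost model.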

Startup Latency: The First Trade-Off

EMR Serverless has a meaningful cold start. When a job is submitted to an EMR Serverless application with no pre-initialised capacity, it takes 60-120 seconds for the first workers to become available. For jobs with total execution times measured in minutes, this startup overhead is a significant percentage of total wall-clock time.

Pre-initialised capacity addresses this: you can configure an EMR Serverless application to maintain a warm pool of workers at a fixed cost, reducing startup latency to under 10 seconds. Pre-initialised capacity is charged continuously whether or not jobs are running, which partially erodes the cost advantage of serverless for low-frequency workloads.
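To make the warm-pool trade-off concrete, here is a minimal Python sketch of what a pool costs per month. It assumes pre-initialised workers are billed at the same per-vCPU and per-GB rates quoted in the cost section of this article; the pool size (three workers of 4 vCPUs and 16 GB) and the 720-hour month are hypothetical values chosen for illustration.

```python
# Rates quoted in this article for ca-central-1 (as of 2024).
VCPU_HOUR = 0.052624
GB_HOUR = 0.0057785


def worker_hourly_cost(vcpus: int, memory_gb: int) -> float:
    """Hourly cost of one worker at the quoted Serverless rates."""
    return vcpus * VCPU_HOUR + memory_gb * GB_HOUR


def warm_pool_monthly_cost(workers: int, vcpus: int, memory_gb: int,
                           hours_per_month: float = 720.0) -> float:
    """Cost of keeping `workers` pre-initialised for a whole month.

    Assumption: the pool is billed continuously at the standard rates,
    whether or not any job is running.
    """
    return workers * worker_hourly_cost(vcpus, memory_gb) * hours_per_month


if __name__ == "__main__":
    # Hypothetical pool: 3 workers of 4 vCPUs / 16 GB each.
    print(f"${warm_pool_monthly_cost(3, 4, 16):,.2f} per month")
```

A figure like this is worth comparing against the number of cold starts the pool actually saves each month before enabling it.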

EMR on EC2 with a long-running cluster has no per-job startup latency: the cluster is already up and jobs begin immediately. For interactive Spark shells or Jupyter notebooks running on EMR, the persistent cluster is usually the most practical option.

For transient EMR on EC2 clusters (spun up per job), cluster startup takes 3-7 minutes. EMR Serverless with cold start wins here: 60-120 seconds beats 3-7 minutes for true per-job provisioning.
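How much these startup times matter depends on how long the job itself runs. The sketch below computes the share of wall-clock time spent waiting for capacity, using the latency ranges quoted above; the dictionary keys and function names are illustrative, not part of any AWS API.

```python
# Startup latency ranges quoted in this article, in seconds (low, high).
STARTUP_SECONDS = {
    "serverless_cold": (60, 120),
    "serverless_warm": (0, 10),    # "under 10 seconds" with pre-initialised capacity
    "ec2_transient": (180, 420),   # 3-7 minutes
    "ec2_long_running": (0, 0),    # cluster is already up
}


def startup_share(job_seconds: float, startup_seconds: float) -> float:
    """Fraction of total wall-clock time spent waiting for capacity."""
    return startup_seconds / (startup_seconds + job_seconds)


def startup_share_by_mode(job_seconds: float) -> dict:
    """Best-case and worst-case startup share for each deployment mode."""
    return {
        mode: (startup_share(job_seconds, low), startup_share(job_seconds, high))
        for mode, (low, high) in STARTUP_SECONDS.items()
    }


if __name__ == "__main__":
    # A 5-minute job: cold start is a large share; for a 2-hour job it is not.
    for mode, (best, worst) in startup_share_by_mode(300).items():
        print(f"{mode:18s} {best:6.1%} - {worst:6.1%}")
```

For a 5-minute job, a cold start accounts for roughly 17-29% of wall-clock time, while a transient EC2 cluster can spend more time starting than running.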

Cost Model: Where Each Wins

EMR Serverless charges per vCPU-second and per GB-RAM-second consumed during job execution. As of 2024, pricing in ca-central-1:

  • $0.052624 per vCPU-hour
  • $0.0057785 per GB-RAM-hour

A Spark job using 10 executors, each with 4 vCPUs and 16 GB RAM, running for 30 minutes costs:

vCPU: 40 vCPUs × 0.5 hours × $0.052624 = $1.05
RAM: 160 GB × 0.5 hours × $0.0057785 = $0.46
Driver (1 vCPU, 4 GB): ~$0.04, negligible
Total: ~$1.51 (excluding the driver)
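The same arithmetic can be wrapped in a small function so it is easy to rerun for other job shapes. This is a sketch using only the rates quoted above; it ignores billing granularity, storage, and any other charges, and the function name is illustrative.

```python
# Rates quoted in this article for ca-central-1 (as of 2024).
VCPU_HOUR = 0.052624
GB_HOUR = 0.0057785


def serverless_job_cost(executors: int, vcpus_per_executor: int,
                        gb_per_executor: int, hours: float,
                        driver_vcpus: int = 1, driver_gb: int = 4) -> float:
    """Estimated EMR Serverless cost for one job run, driver included."""
    total_vcpus = executors * vcpus_per_executor + driver_vcpus
    total_gb = executors * gb_per_executor + driver_gb
    return hours * (total_vcpus * VCPU_HOUR + total_gb * GB_HOUR)


if __name__ == "__main__":
    executors_only = serverless_job_cost(10, 4, 16, 0.5, driver_vcpus=0, driver_gb=0)
    with_driver = serverless_job_cost(10, 4, 16, 0.5)
    print(f"executors only: ${executors_only:.2f}")  # matches the ~$1.51 above
    print(f"with driver:    ${with_driver:.2f}")
```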

EMR on EC2 with equivalent compute (m5.xlarge = 4 vCPUs, 16 GB RAM, ~$0.192/hour On-Demand):

EC2: 10 instances × 0.5 hours × $0.192 = $0.96
EMR premium: 10 instances × 0.5 hours × ~$0.048 = $0.24
Total: ~$1.20

On-Demand EMR on EC2 is slightly cheaper for this single job. With Spot Instances (typically a 60-70% discount on the EC2 rate for interruptible workloads), the gap widens: the EC2 portion falls to roughly $0.29-$0.38, and because the EMR premium is not discounted on Spot, the job costs about $0.53-$0.62 in total, well under half the cost of EMR Serverless.
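A short sketch of the EC2-side arithmetic, separating the EC2 charge from the EMR premium. It assumes the Spot discount applies only to the EC2 rate and that the EMR premium is charged in full on Spot capacity; it also leaves out the primary node and any storage, as the comparison above does.

```python
def emr_ec2_job_cost(instances: int, hours: float,
                     ec2_rate: float = 0.192,   # m5.xlarge On-Demand, as quoted above
                     emr_rate: float = 0.048,   # EMR premium per instance-hour
                     spot_discount: float = 0.0) -> float:
    """Estimated EMR on EC2 cost for one job run.

    Assumption: `spot_discount` (0.0-1.0) reduces the EC2 rate only;
    the EMR premium is not discounted.
    """
    ec2_cost = instances * hours * ec2_rate * (1.0 - spot_discount)
    emr_cost = instances * hours * emr_rate
    return ec2_cost + emr_cost


if __name__ == "__main__":
    print(f"On-Demand:        ${emr_ec2_job_cost(10, 0.5):.2f}")
    print(f"Spot (60% off):   ${emr_ec2_job_cost(10, 0.5, spot_discount=0.6):.2f}")
    print(f"Spot (70% off):   ${emr_ec2_job_cost(10, 0.5, spot_discount=0.7):.2f}")
```

Actual Spot discounts vary by instance type, region, and time, so the discount is best treated as an input rather than a constant.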

Spot Instances on EMR on EC2 are the cheapest option for interruption-tolerant, restartable batch workloads. EMR Serverless wins on cost when:

  1. Jobs are infrequent enough that a long-running cluster would be idle most of the time
  2. Spot availability in your region or AZ is unreliable for your instance type
  3. You want cost predictability without building Spot interruption handling

For teams with sustained, high-throughput Spark workloads running most of the day, Reserved Instance pricing on EMR on EC2 is almost always cheaper than EMR Serverless. At the rates above, Serverless charges about $0.30/hour for 4 vCPUs and 16 GB, while an On-Demand m5.xlarge with the EMR premium already costs $0.24/hour; a 1-year Reserved Instance discount on the EC2 portion lowers that further.
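One way to frame the long-running cluster decision is as a break-even utilisation: the fraction of the day the cluster must be busy before its always-on cost drops below paying Serverless rates for only the busy hours. The sketch below uses the rates quoted in this article; the Reserved Instance discount in the usage example is a hypothetical input, not a quoted AWS price.

```python
VCPU_HOUR = 0.052624      # Serverless rates quoted in this article
GB_HOUR = 0.0057785


def serverless_equivalent_rate(vcpus: int, memory_gb: int) -> float:
    """Hourly Serverless cost for the same vCPU/RAM as one EC2 instance."""
    return vcpus * VCPU_HOUR + memory_gb * GB_HOUR


def break_even_utilisation(cluster_rate: float, serverless_rate: float) -> float:
    """Busy fraction above which an always-on cluster is cheaper.

    The cluster costs `cluster_rate` every hour; Serverless costs
    `serverless_rate` only during busy hours. Capped at 1.0, meaning
    the cluster never wins if it is more expensive even when fully busy.
    """
    return min(1.0, cluster_rate / serverless_rate)


if __name__ == "__main__":
    serverless = serverless_equivalent_rate(4, 16)   # m5.xlarge shape
    on_demand = 0.192 + 0.048                        # EC2 + EMR premium, as above
    reserved = 0.192 * (1 - 0.35) + 0.048            # hypothetical 35% RI discount
    print(f"On-Demand break-even: {break_even_utilisation(on_demand, serverless):.0%}")
    print(f"Reserved break-even:  {break_even_utilisation(reserved, serverless):.0%}")
```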

Performance Tuning Depth

Apache Spark performance is sensitive to dozens of configuration parameters: executor memory overhead, shuffle partition count, broadcast join threshold, off-heap memory settings, adaptive query execution thresholds, and more. EMR on EC2 gives you complete control over all of these, whether in spark-defaults.conf, at the cluster level, or per-job via --conf flags.

EMR Serverless supports per-job Spark configuration overrides via the job's sparkSubmitParameters, covering the most common tuning knobs. However, some low-level settings, particularly around YARN resource manager tuning, OS-level memory configuration, and network topology settings, are not accessible in Serverless.
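As an illustration of per-job overrides, the sketch below builds the request that would be passed to the `start_job_run` call of boto3's `emr-serverless` client. It only constructs the dictionary, so it runs without AWS credentials; the application ID, role ARN, S3 path, and the specific configuration values are placeholders, not recommendations.

```python
def spark_submit_parameters(conf: dict) -> str:
    """Render Spark settings as a single string of --conf flags."""
    return " ".join(f"--conf {key}={value}" for key, value in sorted(conf.items()))


def job_run_request(application_id: str, execution_role_arn: str,
                    entry_point: str, conf: dict) -> dict:
    """Build keyword arguments for an EMR Serverless Spark job run."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": spark_submit_parameters(conf),
            }
        },
    }


if __name__ == "__main__":
    request = job_run_request(
        application_id="00example",                                   # placeholder
        execution_role_arn="arn:aws:iam::123456789012:role/example",  # placeholder
        entry_point="s3://example-bucket/jobs/etl.py",                # placeholder
        conf={
            "spark.executor.cores": "4",
            "spark.executor.memory": "14g",
            "spark.sql.shuffle.partitions": "400",
        },
    )
    print(request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
    # To submit: boto3.client("emr-serverless").start_job_run(**request)
```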

For the majority of Spark workloads, the accessible configuration surface in EMR Serverless is sufficient. For teams actively tuning large-scale joins, skewed data handling, and memory-intensive ML workloads, the deeper configuration access in EMR on EC2 makes a measurable difference.

A concrete example: shuffle performance in large joins. Spark's adaptive query execution (AQE) dynamically coalesces small shuffle partitions and converts sort-merge joins to broadcast joins when the smaller relation fits in memory. AQE is available in both deployment models. But if the default broadcast threshold is still insufficient (for example, when a "small" dimension table is 8 GB and you want to force a broadcast), you can set spark.sql.autoBroadcastJoinThreshold=8589934592 in EMR on EC2. In EMR Serverless, per-application default Spark configurations are set on the application and apply to all jobs; per-job overrides can replace them, but changing the application-level defaults means changing the application itself rather than editing a file on a cluster.
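The threshold value in this example is simply 8 GiB expressed in bytes. The sketch below derives it and packages the related settings as a dictionary of Spark configuration. The 8 GiB ceiling in the code reflects Spark's own limit on broadcast relations as I understand it; treat it as an assumption and check it against the Spark version you run.

```python
GIB = 1024 ** 3

# Assumption: Spark will not broadcast a relation of 8 GiB or more,
# so a larger threshold has no practical effect.
MAX_BROADCAST_BYTES = 8 * GIB


def broadcast_conf(threshold_gib: float) -> dict:
    """Spark settings that enable AQE and raise the auto-broadcast threshold."""
    threshold_bytes = int(threshold_gib * GIB)
    if threshold_bytes > MAX_BROADCAST_BYTES:
        raise ValueError("threshold exceeds the assumed 8 GiB broadcast ceiling")
    return {
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.autoBroadcastJoinThreshold": str(threshold_bytes),
    }


if __name__ == "__main__":
    for key, value in broadcast_conf(8).items():
        print(f"--conf {key}={value}")
```

Broadcasting a multi-gigabyte table also requires the driver and every executor to hold it in memory, so a threshold this large is only sensible with correspondingly large workers.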

Custom Software and Library Management

EMR on EC2 supports bootstrap actions: shell scripts that run on every cluster node before Spark starts. Bootstrap actions install custom Python packages, native libraries (GEOS for geospatial work, for example), or modified Hadoop configurations that are not part of the standard EMR distribution.

EMR Serverless supports custom images: Docker containers you build and publish to Amazon ECR, which EMR Serverless uses as the execution environment. This provides equivalent flexibility to bootstrap actions but requires building and maintaining Docker images. The custom image approach is more reproducible (the image is versioned and immutable) but adds a container build step to your deployment pipeline.

For teams with complex library dependencies (geospatial processing, proprietary ML frameworks, custom Spark connectors), the custom image approach in EMR Serverless is workable but requires additional tooling investment compared to bootstrap scripts.

Operational Simplicity: The Serverless Advantage

The EMR Serverless advantage is most clear in operational overhead. EMR on EC2 clusters require:

  • Instance type selection and sizing
  • Cluster auto-scaling policy configuration
  • Spot interruption handling (if using Spot)
  • EMR version patching and upgrades
  • Security group and VPC configuration
  • Monitoring and alerting for cluster health

EMR Serverless eliminates most of this. You define an application (specifying the EMR release version and, optionally, a custom image, pre-initialised capacity, or VPC settings for private data sources), submit jobs, and AWS handles the rest. No nodes to patch, no cluster health to monitor, no Spot interruption logic to write.
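For a sense of how small an application definition is, the sketch below builds the keyword arguments for the `create_application` call of boto3's `emr-serverless` client. It only constructs the dictionary and does not call AWS. The application name, release label, image URI, worker sizes, and idle timeout are placeholder values for illustration.

```python
from typing import Optional


def application_definition(name: str, release_label: str,
                           image_uri: Optional[str] = None,
                           warm_executors: int = 0) -> dict:
    """Build keyword arguments for an EMR Serverless Spark application."""
    request = {
        "name": name,
        "releaseLabel": release_label,
        "type": "SPARK",
        # Stop the application after it has been idle, so nothing is billed.
        "autoStopConfiguration": {"enabled": True, "idleTimeoutMinutes": 15},
    }
    if image_uri is not None:
        # Optional custom image published to Amazon ECR.
        request["imageConfiguration"] = {"imageUri": image_uri}
    if warm_executors > 0:
        # Optional pre-initialised capacity (billed while the application runs).
        worker = {"cpu": "4vCPU", "memory": "16GB"}
        request["initialCapacity"] = {
            "DRIVER": {"workerCount": 1, "workerConfiguration": worker},
            "EXECUTOR": {"workerCount": warm_executors, "workerConfiguration": worker},
        }
    return request


if __name__ == "__main__":
    print(application_definition("nightly-etl", "emr-6.15.0"))
    # To create: boto3.client("emr-serverless").create_application(**definition)
```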

For teams without a dedicated infrastructure engineering function (common in smaller Canadian and African data teams), this operational simplicity is a genuine advantage. The alternative is either a fragile set of shell scripts around EMR cluster management or adopting an orchestration tool like Apache Airflow (via Amazon MWAA) to manage cluster lifecycle.

This connects to the broader infrastructure-as-code discussion in our guide on Terraform for AWS Data Stacks: even EMR on EC2 clusters managed via Terraform require meaningful ongoing operational investment compared to a Serverless application.

A Decision Framework

Factor                      | EMR Serverless    | EMR on EC2
Job frequency               | Low / irregular   | High / continuous
Startup latency tolerance   | >2 min acceptable | <2 min required
Spot usage                  | Not needed        | Willing to handle interruptions
Sustained workloads         | Not optimal       | Reserved Instances for cost
Library dependencies        | Standard + Docker | Complex / native
Spark tuning depth          | Standard needs    | Deep performance work
Team infra expertise        | Limited           | Strong
Compliance (data residency) | Same VPC support  | Full VPC control

For teams building or evaluating their Spark deployment model alongside broader ETL tool decisions, see our comparison of AWS Glue and Apache Spark ETL options. Glue is worth considering before EMR for many workloads.

Start with EMR Serverless if: You are building a new data platform, jobs run once or a few times per day, your library requirements are standard, and your team prefers managed services.

Start with EMR on EC2 if: You already have significant Spark tuning expertise, workloads run continuously throughout the day, Spot Instances are important for cost targets, or you need specific bootstrap configurations that custom images do not cleanly support.

Run both if: Your platform has a mix of scheduled batch jobs (EMR Serverless) and interactive exploration or long-running streaming workloads (EMR on EC2 persistent cluster).
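The three recommendations above can be expressed as a small rule-based helper. This is a deliberately simplified sketch: the field names are invented for illustration, and a real decision would also weigh the cost and latency figures discussed earlier.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Illustrative workload traits, mirroring the recommendations above."""
    scheduled_batch: bool = False           # jobs run once or a few times per day
    interactive_or_streaming: bool = False  # notebooks or long-running streams
    runs_continuously: bool = False         # busy throughout the day
    spot_is_important: bool = False         # Spot needed to hit cost targets
    deep_spark_tuning: bool = False         # team actively tunes low-level settings
    needs_bootstrap_actions: bool = False   # setup custom images do not cleanly cover


def recommend(workload: Workload) -> str:
    """Return 'EMR Serverless', 'EMR on EC2', or 'both'."""
    if workload.scheduled_batch and workload.interactive_or_streaming:
        return "both"
    favours_ec2 = (
        workload.interactive_or_streaming
        or workload.runs_continuously
        or workload.spot_is_important
        or workload.deep_spark_tuning
        or workload.needs_bootstrap_actions
    )
    return "EMR on EC2" if favours_ec2 else "EMR Serverless"


if __name__ == "__main__":
    print(recommend(Workload(scheduled_batch=True)))
    print(recommend(Workload(runs_continuously=True, spot_is_important=True)))
    print(recommend(Workload(scheduled_batch=True, interactive_or_streaming=True)))
```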

Conclusion

EMR Serverless and EMR on EC2 are complementary tools, not a binary choice. Serverless wins on operational simplicity and is cost-competitive for irregular workloads, but Spot-enabled EMR on EC2 is substantially cheaper for sustained high-throughput processing. Deep Spark tuning, complex library requirements, and very low latency requirements favour EC2. Most teams will find EMR Serverless sufficient for batch ETL and benefit from its managed simplicity, while reserving EMR on EC2 for the workloads where full cluster control genuinely pays off. Ready to build or optimise your AWS data infrastructure? Contact the Infra IT Consulting team for a free consultation.
