Cloud Migration & Cost Optimization teradataredshiftmigration

Teradata to Amazon Redshift Migration: What No One Tells You

By Infra IT Consulting · March 3, 2024 · 10 min read

Content on this site is AI-assisted and personally reviewed by Hazem. Learn more

Teradata migrations are in a class of their own. The platform is deeply embedded in enterprise data warehousing, its SQL dialect is idiosyncratic, and its customers tend to have 10–20 years of accumulated DDL, stored procedures, macros, and business logic that is extraordinarily difficult to untangle. When those customers decide to migrate to Amazon Redshift, they frequently discover that the migration is twice as long and three times as complex as their initial estimate.

This post covers the technical realities that migration guides typically gloss over — SQL compatibility gaps, data distribution strategy, performance tuning differences, and the organisational factors that determine whether a Teradata-to-Redshift migration succeeds or stalls.

The SQL Compatibility Gap Is Larger Than It Looks

Teradata SQL and Amazon Redshift SQL are both ANSI SQL derivatives, but they diverge in numerous ways that matter at scale. AWS’s Schema Conversion Tool (SCT) handles the most common transformations automatically, but it cannot resolve all of them, and the ones it cannot resolve are typically the most business-critical.

Teradata-specific syntax that requires manual rewriting:

SEL shorthand for SELECT — trivial, but pervasive in legacy code
QUALIFY clause for row filtering on window functions — Redshift requires a subquery
COMPRESS column attribute — Teradata’s value compression has no direct Redshift equivalent; use Redshift column encoding instead
MULTISET and SET table modifiers — Teradata distinguishes between tables that allow duplicates and those that do not at the DDL level; Redshift has no equivalent concept
VOLATILE TABLE — Teradata’s session-scoped temporary tables map to Redshift CREATE TEMP TABLE, but Teradata volatile tables support indexes; Redshift temp tables do not
PERIOD data type for temporal data — requires conversion to two timestamp columns in Redshift
Teradata macros — these are parameterised SQL templates with no Redshift equivalent; must be rewritten as stored procedures or application-layer logic

Here is a representative example of a QUALIFY rewrite:

-- Teradata: use QUALIFY to filter on window function result
SELECT
    customer_id,
    order_date,
    order_amount,
    RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
QUALIFY rn = 1;

-- Redshift equivalent: wrap in subquery
SELECT customer_id, order_date, order_amount
FROM (
    SELECT
        customer_id,
        order_date,
        order_amount,
        RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
) ranked
WHERE rn = 1;

This rewrite is mechanical, but multiplied across thousands of queries and hundreds of stored procedures, it is months of work. SCT automates much of this, but it requires human review of every converted object — automated conversions introduce subtle semantic differences that testing must catch.

Distribution Keys: The Most Consequential Decision

Teradata uses its own Primary Index concept to distribute rows across AMPs (Access Module Processors). Rows with the same Primary Index value land on the same AMP, which is Teradata’s mechanism for collocating joined data and avoiding redistribution cost. The Primary Index is defined at table creation and fundamentally affects query performance.

Amazon Redshift uses a Distribution Style (DISTKEY) for the same purpose. The mapping is not automatic and not always obvious:

Teradata Unique Primary Index (UPI) → Redshift DISTKEY on the same column(s) — works well when the column has high cardinality and joins frequently occur on it
Teradata Non-Unique Primary Index (NUPI) → Redshift DISTKEY — same approach, but watch for data skew if cardinality is low
Teradata Partitioned Primary Index (PPI) → Redshift DISTKEY + SORTKEY — PPI is Teradata’s partition mechanism; Redshift handles this with sort keys and zone maps

The critical difference: Teradata’s Primary Index is also its physical clustering mechanism. In Redshift, distribution and sort are independent concerns. A table that joins frequently on customer_id should have DISTKEY(customer_id), while its sort key might be order_date for range scan efficiency on time-based queries. Incorrectly mapping Teradata Primary Indexes to Redshift distribution styles is the single most common cause of post-migration performance problems.

A practical approach: export Teradata query logs (DBQL) for 30 days, identify the most resource-intensive queries by CPU seconds consumed, and derive distribution key choices from the join patterns in those queries. Do not simply copy Primary Index columns to DISTKEY without this analysis.

The BTEQ and FastExport Extraction Problem

Most Teradata installations use BTEQ scripts, FastExport, or TPT (Teradata Parallel Transporter) for bulk data extraction. These tools have no equivalents in AWS, and the extraction strategy must be redesigned.

For full table migrations, the recommended approach is:

Use Teradata JDBC through AWS Database Migration Service (DMS) for ongoing replication of tables up to ~1 TB
Use Teradata TPT export to S3 via an intermediary Linux host with the Teradata tools installed, then load from S3 to Redshift using COPY
For very large tables (>10 TB), partition the export by date range and parallelize the extraction

The Redshift COPY command is the correct bulk load mechanism — never use row-by-row INSERT for bulk loading. A well-structured COPY from Parquet files on S3 can ingest 100+ GB per hour per Redshift slice:

COPY my_schema.large_table
FROM 's3://migration-bucket/teradata-export/large_table/'
IAM_ROLE 'arn:aws:iam::123456789:role/RedshiftS3Role'
FORMAT AS PARQUET
REGION 'ca-central-1';

For guidance on broader migration orchestration, see our AWS DMS Database Migration post, which covers DMS configuration for heterogeneous migrations in detail.

Workload Management: From Teradata TASM to Redshift WLM

Teradata’s Workload Management (TASM) is a sophisticated system for classifying queries, assigning them to workload groups, and controlling resource allocation. Heavy users of TASM have complex classification rules based on user, application, query characteristics, and time of day.

Redshift Workload Management (WLM) achieves similar goals but with a different model. Redshift WLM uses query queues with memory allocation percentages and concurrency limits. Automatic WLM (the default) uses machine learning to adjust queue behaviour dynamically.

Migration teams that try to replicate their Teradata TASM configuration exactly in Redshift WLM almost always over-engineer it. Start with Automatic WLM and a small number of queues — typically one for ETL jobs, one for interactive queries, and one for superuser maintenance. Add complexity only when you observe specific resource contention problems under real workloads.

Hidden Costs That Derail Migration Business Cases

The migration business case typically compares Teradata licence and hardware costs against Redshift pricing. Teradata at scale costs $500,000–$2,000,000+ CAD per year in licence and appliance maintenance. Redshift with Reserved Instances for a comparable workload typically runs $80,000–$200,000/year — a compelling saving on paper.

What the business case often misses:

Data egress from Redshift. If downstream applications, BI tools, or data science platforms pull data out of Redshift to other regions or to the internet, egress costs ($0.09/GB) can add up quickly at scale.

Redshift Spectrum costs. If you use Redshift Spectrum to query data on S3, you pay $5 per TB scanned. This is cheap for occasional queries but expensive if you route high-volume ETL reads through Spectrum.

Query rewrite and testing labour. The SQL conversion and testing effort is almost always underestimated. Budget 0.5–1 engineering day per converted object for review and testing. A Teradata installation with 2,000 objects (tables, views, macros, procedures) represents 1,000–2,000 days of engineering effort at that rate — spread across a 12–18 month programme.

Performance tuning post-migration. Redshift does not automatically deliver Teradata-equivalent performance on migrated workloads. Budget 2–3 months of dedicated query tuning after cutover. See Redshift Cost Tuning for specific optimisation techniques.

Conclusion

A Teradata-to-Redshift migration is one of the most technically demanding programmes a data engineering team can undertake. The economics are compelling — often 60–80% cost reduction after accounting for all Teradata costs — but the technical complexity and organisational effort required to get there are routinely underestimated. Success requires a realistic SQL conversion assessment, a rigorous distribution key strategy based on actual query patterns, and a testing programme that validates not just query correctness but query performance.

Infra IT Consulting has supported Teradata migration programmes across Canada, the UK, and Africa. If you are in the planning stages or have a stalled migration, contact us for a technical assessment and realistic programme plan.

Cloud Migration & Cost Optimization

Talk to our team →

Teradata to Amazon Redshift Migration: What No One Tells You

The SQL Compatibility Gap Is Larger Than It Looks

Distribution Keys: The Most Consequential Decision

The BTEQ and FastExport Extraction Problem

Workload Management: From Teradata TASM to Redshift WLM

Hidden Costs That Derail Migration Business Cases

Conclusion

Related posts

Managing S3 Storage Costs: Lifecycle Policies and Intelligent-Tiering

AWS Reserved Instances vs. Savings Plans for Data Teams: A FinOps Decision Guide

AWS Cost Optimisation for Data Teams: 10 Tactics That Work

The SQL Compatibility Gap Is Larger Than It Looks

Distribution Keys: The Most Consequential Decision

The BTEQ and FastExport Extraction Problem

Workload Management: From Teradata TASM to Redshift WLM

Hidden Costs That Derail Migration Business Cases

Conclusion

Related posts

Managing S3 Storage Costs: Lifecycle Policies and Intelligent-Tiering

AWS Reserved Instances vs. Savings Plans for Data Teams: A FinOps Decision Guide

AWS Cost Optimisation for Data Teams: 10 Tactics That Work

We value your privacy