Infra IT Consulting logo Infra ITC
Cloud Migration & Cost Optimization migrationchecklistaws

The AWS Data Migration Checklist: 50 Things to Verify Before Go-Live

By Infra IT Consulting · · 11 min read

Migration projects fail at go-live more often than at any other point. The weeks of discovery, architecture design, and pipeline development create momentum and confidence — and then the cutover weekend arrives with a critical step nobody thought to verify until it is too late. A systematic checklist is not a substitute for good architecture or experienced engineers, but it is a reliable mechanism for ensuring that the work leading up to cutover did not leave unverified assumptions that will surface as incidents.

This checklist is organised into eight categories and is intended to be completed in the 2-4 weeks before production cutover, not on the day. Items that surface late in this process should be resolved before scheduling a go-live date.

Category 1: Data Completeness and Integrity (Items 1-8)

1. Row count validation across all migrated tables

For every table in scope, compare source row counts to target row counts. Discrepancies of any magnitude require investigation before proceeding. Use a dedicated validation query, not a manual estimate.

-- Source system (SQL Server example)
SELECT TABLE_SCHEMA, TABLE_NAME, SUM(row_count) AS row_count
FROM sys.dm_db_partition_stats
WHERE index_id IN (0, 1)
GROUP BY TABLE_SCHEMA, TABLE_NAME
ORDER BY TABLE_SCHEMA, TABLE_NAME;

-- Target system (Redshift)
SELECT schemaname, tablename, tbl_rows
FROM svv_table_info
ORDER BY schemaname, tablename;

2. Checksum validation on high-priority tables

Row counts match but data is wrong is a real scenario — caused by bugs in transformation logic, encoding issues in character data, or timestamp timezone handling errors. For your top 10-20 most critical tables, compute and compare a numeric checksum (SUM of a stable numeric column, or MD5 of a sorted row sample).

3. Referential integrity verification

Confirm that foreign key relationships are intact in the target. AWS DMS does not enforce foreign keys during loads — it loads tables in parallel, which can result in child records without parent records in the target if the load sequence was not managed correctly.

4. NULL and default value comparison

Source systems handle NULLs and defaults differently. Oracle stores empty strings as NULL; PostgreSQL does not. SQL Server DATETIME has a different range and precision than PostgreSQL TIMESTAMP WITH TIME ZONE. Verify that these differences were handled correctly in schema conversion.

5. Data type mapping validation

Review the AWS Schema Conversion Tool (SCT) conversion report for every table. For each column flagged as requiring manual review, confirm that the manual resolution was implemented and tested. Pay particular attention to: CLOB/BLOB vs. VARCHAR/BYTEA conversions, TIMESTAMP WITH TIME ZONE handling, and NUMERIC precision for financial values.

6. Historical data completeness

Confirm that the full historical data range was loaded, not just current data. Check the min and max date values in date-keyed tables against the expected historical depth. If you are migrating a 5-year-old database, there should be records from 5 years ago.

7. Duplicate detection

DMS replication can produce duplicates under certain failure and retry conditions. Run duplicate checks on primary key columns for all tables:

-- Detect duplicates after DMS load
SELECT primary_key_column, COUNT(*) AS duplicate_count
FROM target_schema.target_table
GROUP BY primary_key_column
HAVING COUNT(*) > 1;

8. Character encoding validation

Confirm that special characters (accented characters in French-language Canadian data, currency symbols, special business characters) survived the migration without corruption. Spot-check records with known special character content from the source.

Category 2: Performance Baseline (Items 9-16)

9. Execute the top 20 most expensive queries on the target

Pull the most expensive queries from your source system’s query monitoring (SQL Server DMVs, Oracle AWR, PostgreSQL pg_stat_statements). Run each query against the migrated target. Any query that runs more than 2x slower on the target requires investigation before go-live.

10. Verify Redshift distribution and sort key effectiveness

-- Check table skew (data distribution across nodes)
SELECT trim(name) AS table_name,
       max(num_values) AS max_slice_rows,
       min(num_values) AS min_slice_rows,
       CASE WHEN min(num_values) = 0 THEN 'SKEWED'
            WHEN max(num_values) / nullif(min(num_values), 0) > 4 THEN 'SKEWED'
            ELSE 'OK' END AS distribution_health
FROM svv_diskusage
GROUP BY name
ORDER BY max_slice_rows DESC
LIMIT 20;

Tables with skew ratios above 4:1 will have poor query performance due to uneven compute utilisation. Investigate and correct distribution key choices before go-live.

11. BI dashboard load time validation

Open each business-critical dashboard in your BI tool (QuickSight, Tableau, Power BI) connected to the migrated target. Confirm that all dashboards load within acceptable time thresholds. Dashboard queries often expose missing indexes or materialised views that were present in the source but not recreated in the target.

12. Concurrent query performance test

Simulate realistic concurrent user load against the target. Use AWS Load Testing tools or a simple concurrent execution script. A system that performs acceptably under single-user testing can fail badly under 20 concurrent analysts running dashboards simultaneously.

13. ETL pipeline performance validation

Run all migrated ETL pipelines against the target and measure execution time. Pipeline runtimes should be within 150% of source system runtimes. Significant regressions indicate missing indexes, suboptimal query plans, or WLM configuration issues.

14. Verify Redshift query plan for key queries

-- Review query execution plan
EXPLAIN SELECT SUM(gross_revenue)
FROM analytics.fact_sales f
JOIN analytics.dim_customer c ON f.customer_key = c.customer_key
WHERE c.country_code = 'CA'
  AND f.date_key BETWEEN 20240101 AND 20240131;

Look for DS_BCAST_INNER (broadcast join) on large tables — this indicates a distribution key mismatch. Look for missing sort key usage on commonly filtered columns.

15. Network latency measurement to application tier

Measure round-trip latency from application servers to the new AWS RDS or Redshift endpoint. Applications that were co-located with on-premises databases and are now talking to AWS over Direct Connect or internet may experience latency changes that affect responsiveness.

16. Peak load simulation

Using production query patterns (exported from source monitoring), replay the peak load period against the target. End-of-month processing, quarterly reports, and daily batch windows represent peak load scenarios that must be verified at realistic scale.

Category 3: Security and Access Control (Items 17-24)

17. IAM role and policy audit

Verify that every IAM role used by migrated pipelines has the minimum permissions required. IAM roles should follow least-privilege — if a Glue job only reads from S3 and writes to Redshift, its role should have no permissions beyond those specific actions on those specific resources.

18. AWS Secrets Manager rotation configured

Database credentials stored in AWS Secrets Manager should have automatic rotation enabled. Confirm rotation is configured and test it: trigger a manual rotation and verify that pipelines using the secret continue to function after rotation.

19. Network Security Group (NSG) and VPC validation

Confirm that RDS and Redshift instances are in private subnets with no public accessibility. Verify that security group rules allow inbound connections only from authorised sources (application servers, ETL clusters, VPN ranges) and block all other traffic. Test from an unauthorised IP that the connection is refused.

20. Encryption at rest verification

Confirm that all RDS instances, Redshift clusters, and S3 buckets have encryption at rest enabled. For Canadian data sovereignty requirements, verify that AWS KMS keys are regional (ca-central-1) and customer-managed where required by your data classification policy.

21. Encryption in transit validation

Verify SSL/TLS connections are enforced for all database connections. For RDS PostgreSQL, the rds.force_ssl parameter should be set to 1. For Redshift, SSL mode should be verify-full in JDBC/ODBC connection strings.

22. Data classification and Lake Formation permissions

If you have implemented AWS Lake Formation for data lake access control, verify that the permission hierarchy matches your intended access control design. Test that analyst users can access the tables they should and cannot access tables they should not.

23. CloudTrail logging confirmed active

AWS CloudTrail should be logging all management API calls and data events for the migrated services. Verify that the CloudTrail trail is active and delivering logs to S3 in the correct account.

24. VPC Flow Logs enabled

Enable VPC Flow Logs for the VPC hosting migrated databases. Flow logs are essential for post-incident investigation if unexpected access patterns occur after cutover.

Category 4: Replication and CDC Validation (Items 25-30)

25. CDC lag measurement under production-equivalent load

With DMS CDC replication active, apply production-equivalent write load to the source and measure replication lag. Replication lag should be under 60 seconds under typical load and under 5 minutes at peak. Sustained lag above these thresholds indicates that the DMS replication instance is undersized.

26. Transaction boundary integrity test

Apply a multi-statement transaction to the source and verify that all statements arrive on the target atomically. DMS should replicate transactions as atomic units — but this should be explicitly tested for critical transaction types.

27. DDL replication handling

If your migration requires ongoing DDL replication (schema changes applied to the source should propagate to the target), test this explicitly. Apply an ALTER TABLE ADD COLUMN to the source and verify it appears on the target without breaking subsequent DML replication.

28. Binary log / CDC prerequisite verification (pre-cutover)

Confirm that source database CDC prerequisites remain configured throughout the migration period. MySQL: binlog_format=ROW, binlog_retention_hours >= 24. Oracle: supplemental logging enabled. SQL Server: CDC or transaction log backup with minimum 24-hour retention.

29. DMS task error monitoring

Review DMS task logs for any errors or warnings accumulated since replication began. Investigate any table-level errors that may indicate partial replication failures on specific objects.

30. Replication lag monitoring alarm active

Confirm that a CloudWatch Alarm is configured on the DMS CDCLatencySource metric. The alarm should fire if replication lag exceeds 5 minutes, routing to an SNS topic that pages the migration team.

Category 5: Application and Integration (Items 31-36)

31. Connection string verification in all applications

Inventory every application, ETL tool, and reporting tool that connects to the migrated databases. Confirm that each has been updated to use the new AWS endpoint, is using the correct port, and is connecting via SSL. Use AWS Systems Manager Parameter Store to manage connection strings centrally, reducing the risk of missed updates.

32. Connection pool configuration validation

Application connection pools configured for on-premises latency may behave differently with AWS endpoints. Validate connection pool settings (minimum connections, maximum connections, connection timeout, idle timeout) against the new endpoint’s characteristics. RDS Proxy can be used to manage connection pooling between applications and RDS instances.

33. Application smoke test on the migrated target

Execute a full functional smoke test of each application against the migrated database before cutover. This should cover create, read, update, and delete operations for the primary user flows.

34. API and downstream system notification plan

Identify every internal API and downstream system that consumes data from the migrated databases. Confirm that each team has been notified of the cutover date and has a point of contact for the migration team during the cutover window.

35. ODBC/JDBC driver compatibility

Confirm that the JDBC and ODBC drivers used by BI tools and ETL platforms are compatible with the target database version. Particularly important for heterogeneous migrations (e.g., Oracle to PostgreSQL) where driver behaviour differs.

36. Stored procedure and function validation

Execute the stored procedures and functions used by applications against the migrated target. Compare output to source system for a representative set of inputs. For heterogeneous migrations, this is where schema conversion errors most commonly surface.

Category 6: Monitoring and Alerting (Items 37-42)

37. CloudWatch dashboards configured for migrated services

Create CloudWatch dashboards showing: RDS/Redshift CPU and memory utilisation, database connections, query latency, storage consumption, replication lag (if applicable), and ETL pipeline execution status.

38. Critical alarms configured and tested

Configure and test CloudWatch Alarms for: database CPU > 80% for 10 minutes, storage usage > 80%, replication lag > 5 minutes, ETL pipeline failure, and target-specific metrics (Redshift WLM queue depth, RDS read replica lag).

39. AWS Health notifications configured

Subscribe to AWS Health notifications for the services and regions used by the migrated workload. AWS Health events (maintenance windows, service impairments) that affect your services should page your on-call team.

40. Log retention and alerting for database error logs

Configure CloudWatch Logs agent on RDS to stream PostgreSQL error logs. Set up a CloudWatch Logs metric filter and alarm for ERROR-level messages that would indicate post-cutover issues.

41. Query performance monitoring query available

Have the monitoring queries ready to execute during and after cutover to assess real-time performance. For Redshift, svl_query_summary and svv_query_state. For RDS PostgreSQL, pg_stat_activity and pg_stat_statements.

42. On-call rotation and escalation path documented

Confirm that the migration team has an on-call rotation for the 72 hours post-cutover, with a documented escalation path to AWS Support (confirm your support plan tier covers 24/7 access to engineers for production incidents).

Category 7: Rollback Readiness (Items 43-47)

43. Reverse replication configured and tested

Configure a DMS task replicating from the AWS target back to the on-premises source. Start it and confirm it is running without errors before cutover. This is your rollback data-synchronisation mechanism.

44. Rollback decision criteria documented

Define in writing: what specific conditions trigger a rollback decision, who has authority to call the rollback, and at what point after cutover rollback is no longer operationally feasible. Share this with all stakeholders before the cutover window.

45. Application rollback tested

Test the full rollback sequence: switch applications back to the on-premises connection strings and verify application functionality. This test should be done in staging before the production cutover date.

46. Communication plan for rollback scenario

Prepare pre-written communications for a rollback scenario: internal team notification, stakeholder notification, and if applicable, external user notification. Having these drafted in advance prevents communication delays during a stressful rollback event.

47. Source system freeze plan

Define and communicate the plan for preventing writes to the source system during the cutover window. This may involve application maintenance mode, database read-only mode, or scheduling during a period of naturally low write volume.

Category 8: Post-Cutover Verification (Items 48-50)

48. Data freshness check 24 hours post-cutover

24 hours after cutover, verify that all scheduled ETL pipelines have completed successfully and that data freshness metrics are within their SLA windows. The data freshness and SLA monitoring practices should be active from the first pipeline run on the new system.

49. Cost monitoring review 72 hours post-cutover

Review the AWS Cost Explorer forecast 72 hours after cutover. Unexpected cost increases indicate misconfigured services — a Glue job running continuously, an S3 Transfer Acceleration endpoint left enabled, or a DMS replication instance that was not terminated after CDC cutover.

50. 30-day retrospective scheduled

Schedule a formal retrospective 30 days post-cutover with the migration team. Review what went according to plan, what was discovered during cutover, any post-cutover incidents, and the lessons applicable to subsequent migration waves. For a multi-wave programme, the retrospective output directly improves the execution of every subsequent wave.

The Checklist Is Not the Work — It Is the Verification

Completing this checklist does not make a migration successful. The work that makes it successful is the discovery, architecture, and pipeline engineering that precedes the checklist. But organisations that attempt go-live without a systematic verification process consistently discover items 10, 23, and 36 (and others) as post-cutover incidents rather than as pre-cutover findings.

For teams planning an AWS data migration, Infra IT Consulting provides end-to-end migration programme management, including pre-cutover verification support, on-site (or virtual) go-live assistance, and post-cutover hypercare. Our migration experience spans on-premises to AWS migrations, legacy ETL modernisation, and heterogeneous database platform conversions. Contact us to discuss how we can de-risk your migration go-live.

Related posts