On-Premises to AWS Data Migration: A Practical Roadmap
Migrating data infrastructure from on-premises to AWS is the largest data engineering project most organisations undertake. It touches every operational database, every ETL pipeline, every reporting layer, and every downstream consumer β and it must be executed without disrupting the business operations those systems support. Done well, the result is a modern, elastic platform that reduces operational burden and total cost. Done poorly, it is a multi-year recovery project.
This post provides a practical roadmap based on real migration programmes β not a theoretical framework. The goal is to give data engineering leaders and CTOs a clear picture of what the work actually involves so they can plan, resource, and de-risk their migration accordingly.
Phase 1: Discovery and Assessment (Weeks 1-6)
The most critical and most underestimated phase is discovery. Organisations consistently undercount their data assets in the early stages of planning. A typical mid-size enterprise expects to migrate 50 pipelines and discovers, during systematic assessment, that 180 exist β many of them undocumented scripts running on CRON jobs nobody has touched in three years.
A rigorous assessment produces a complete inventory of:
Source systems: Every database instance with its engine, version, size (tables, rows, total storage), and replication topology. Include Oracle RAC clusters, SQL Server Always On groups, and any DB2 or Sybase systems that may have been forgotten. Use AWS Database Migration Service (DMS) Fleet Advisor to automate this discovery β it installs lightweight data collectors on source servers and produces a detailed assessment report.
ETL pipelines: Every scheduled job, Informatica workflow, SSIS package, custom Python/Perl script, and stored procedure that moves or transforms data. The inventory should capture: source, target, schedule, dependencies, business owner, and last modification date. Pipes with no documented owner and a last modification date over two years ago should be flagged for validation β they may be able to be retired rather than migrated.
Data volumes and change rates: Knowing that a database is 10 TB is not sufficient. You need daily change volume to plan replication windows, compression ratios to estimate S3 storage costs, and read/write patterns to design optimal AWS storage configurations.
Downstream consumers: Every BI tool, application, and API that reads from the systems being migrated. These consumers represent your rollback risk β if the migrated system does not meet their latency or data quality expectations, they revert to the old system.
Phase 2: Architecture Design (Weeks 4-10)
Architecture design overlaps with discovery β you refine the design as discovery reveals complexity. The key architectural decisions:
Target database mapping: Not every on-premises database should migrate to the same AWS service. A common mapping:
| Source | Recommended AWS Target | Notes |
|---|---|---|
| Oracle OLTP | Amazon Aurora PostgreSQL | Significant schema conversion work |
| SQL Server OLTP | Amazon RDS SQL Server or Aurora PostgreSQL | RDS SQL Server has licence cost; Aurora requires app changes |
| Teradata / Netezza | Amazon Redshift | Schema and query conversion needed |
| Hadoop / Hive | AWS Glue + S3 + Athena or Amazon EMR | Depends on workload type |
| DB2 | Amazon RDS PostgreSQL | Schema conversion via AWS SCT |
The AWS Schema Conversion Tool (AWS SCT) automates a significant portion of schema conversion for heterogeneous migrations. It converts DDL, stored procedures, functions, and triggers, and flags any objects that require manual intervention. On typical Oracle-to-PostgreSQL migrations, SCT can automate 60-80% of conversions β the remaining 20-40% requires engineering effort and is where most schedule risk lives.
Network architecture: Large-scale migrations require dedicated bandwidth. AWS Direct Connect provides a private, low-latency connection from your data centre to AWS with consistent throughput. A 10 Gbps Direct Connect circuit can transfer approximately 100 TB per day at full utilisation β sufficient for most initial loads. For purely one-time data transfer of very large datasets (petabyte scale), AWS Snowball Edge devices are faster and cheaper than network transfer.
Data lake integration: Most organisations migrating from on-premises are also taking the opportunity to introduce a data lake architecture they did not have previously. The on-premises Hadoop to AWS migration pattern is closely related and worth reviewing if you have existing Hadoop infrastructure.
Phase 3: Pilot Migration (Weeks 8-14)
Before migrating any production system, execute a complete end-to-end pilot with a representative but non-critical system. The pilotβs purpose is not to prove the concept works β it is to discover the unknowns that will affect every subsequent migration.
A good pilot candidate: a mid-complexity operational database that feeds reporting but not customer-facing applications, with an identifiable set of downstream consumers who can participate in validation.
The pilot migration sequence:
- Schema conversion using AWS SCT. Document every manual intervention required.
- Full load using AWS DMS. Measure actual throughput vs. theoretical capacity.
- Ongoing replication using DMS Change Data Capture (CDC). Validate replication lag under production-representative load.
- Consumer validation: Run all downstream ETL pipelines and BI reports against the migrated target. Compare row counts and aggregated metrics against the source.
- Performance benchmarking: Run the top 20 most expensive queries from your current system against the migrated target. Investigate any that regress beyond 20%.
- Cutover dry run: Simulate the cutover sequence β stop replication, switch connection strings, restart consumers β with full timing measurements. This is your go-live playbook.
Document everything that goes wrong during the pilot and the resolution. These will be the same issues you encounter on every subsequent migration, and having documented resolutions accelerates the programme significantly.
Phase 4: Migration Waves (Months 3-12+)
Production migrations are executed in waves, ordered by risk and dependency. A typical wave structure:
Wave 1 β Non-production environments: Migrate dev, test, and staging environments first. This gives the development and QA teams experience with the new platform and surfaces application-layer changes needed before production cutover.
Wave 2 β Reporting and analytics systems: Migrate data warehouses and reporting databases. These have the highest tolerance for maintenance windows and the most flexibility in cutover timing. They also validate your DMS pipeline configuration and target performance under analytics workloads.
Wave 3 β Low-criticality operational databases: Databases that support operational processes but have tolerable outage windows (internal tooling, reference data, historical archives).
Wave 4 β Critical operational databases: Customer-facing transactional databases with the strictest SLAs. By Wave 4, your team has executed the migration playbook multiple times and the process is well understood.
For each wave, the execution sequence is:
- Create DMS replication instance and configure source and target endpoints
- Run DMS full load task (validate row counts before proceeding)
- Switch DMS to CDC mode and allow replication lag to reach near-zero
- Schedule cutover window (typically off-peak, with at least 4-hour buffer)
- Stop application writes to source, allow final CDC transactions to apply
- Validate target row counts and spot-check data consistency
- Switch application connection strings to target (via AWS Systems Manager Parameter Store for centralised configuration management)
- Monitor for 2 hours before formally closing the source
Managing the Risk: Rollback Strategy
Every migration must have a documented, tested rollback procedure. The rollback window is the time after cutover during which reverting to the source system is operationally feasible. Key rollback components:
Reverse replication: Configure DMS CDC in the reverse direction β from AWS target back to on-premises source β before cutover. This runs in parallel, at low cost, and means that if you need to roll back, data written to the AWS target during the cutover window can be replicated back to the source.
Dual-write period: For high-criticality systems, run a brief period (1-4 hours) where the application writes to both source and target simultaneously. This eliminates the data-consistency risk of the cutover instant.
Consumer rollback plan: Each downstream consumer should have documented steps to revert its connection to the on-premises source. These steps should be tested during the pilot dry run.
Rollback decision authority: Define in advance who can authorise a rollback, what the decision criteria are (acceptable error rates, replication lag thresholds), and how long after cutover a rollback remains operationally viable.
AWS Services to Know for Data Migration
Beyond DMS and SCT, several AWS services play specific roles in data migration programmes:
- AWS DataSync: High-performance file and object transfer between on-premises storage and S3. Handles checksums, bandwidth throttling, and retry logic automatically. Preferable to custom rsync scripts for large file migration.
- AWS Glue: Schema conversion and data transformation during the migration, particularly for denormalising or restructuring data that changes format in the new architecture.
- AWS Systems Manager Parameter Store: Centralised storage for connection strings, making the application-layer switch from old to new endpoints a parameter update rather than a deployment.
- Amazon CloudWatch: Pipeline monitoring during migration, including DMS replication lag metrics and custom data validation results.
What Migration Projects Actually Cost
The most common source of migration project overruns is underestimating the three hidden cost categories:
Schema conversion labour: DMS and SCT handle 60-80% of heterogeneous schema conversion automatically. The remaining 20-40% is manual engineering work that requires database expertise in both the source and target platform. For a large Oracle-to-Redshift migration, this can represent 6-12 engineer-weeks of work.
Application changes: Applications that embed source-specific SQL syntax (Oracle hints, SQL Server-specific functions, Teradata BTEQ syntax) require application code changes in addition to database migration. These changes require application developer time, testing, and deployment β often outside the data engineering teamβs control.
Validation and testing: Comprehensive data validation between source and target is essential but time-consuming. Automating row count comparisons, aggregated metric checks, and referential integrity validation is a significant engineering investment that should be budgeted explicitly.
Your Migration Programme Starts with the Inventory
The single most valuable thing you can do before committing to a migration timeline is complete the discovery and assessment phase. Organisations that skip this step consistently underestimate scope, timeline, and cost. Those that complete it systematically are in a position to plan a realistic programme with manageable risk.
Infra IT Consulting has executed data migration programmes for organisations across Canada and internationally, moving workloads from Oracle, SQL Server, Teradata, and Hadoop environments to AWS. Contact us to discuss your migration landscape and get a realistic assessment of scope and timeline.
Related posts
Using AWS Spot Instances for Cost-Effective Data Processing
Read more Cloud Migration & Cost OptimizationAWS Cost Optimisation for Data Teams: 10 Tactics That Work
Read more Cloud Migration & Cost OptimizationTeradata to Amazon Redshift Migration: What No One Tells You
Read moreBook a free 30-minute consultation to discuss your data engineering and analytics needs.
Talk to our team β