Data Migration to the Cloud — A Field-Tested Playbook (No Drama, Just Data)

White Paper

Data Migration to the Cloud

A field-tested playbook for moving application data without breaking the business (or your sleep). No drama, just data.

Author: Iain Toolin

Date: August 2025

Version: 1.0

Executive Summary

This white paper sets out a practical, governance-safe approach to migrating application data to the cloud. It focuses on outcomes over theatre: discover what exists, design before you build, rehearse until cutover is boring, and reconcile until the auditors are smiling. The tone is straight-talking with professional wryness — the kind that keeps meetings short and results long-lived.

Goal: Put the right data in the right cloud in a way that is legal, auditable, and cost-aware.
Method: Discover → Design → Prepare → Migrate → Cutover → Stabilise.
Guardrails: Clear ownership rules, privacy and security by default, automated quality controls, and rollback that actually rolls back.

Bottom line: This isn’t alchemy. It’s carpentry — measure, cut, sand, fit. If anyone suggests skipping reconciliation, politely remove their scissors from the change freeze.


1) Purpose & Scope

Audience: CIOs, programme leads, data architects, platform engineers, and compliance teams who like a plan that survives contact with reality.

In scope: Application data migrations to public cloud platforms; patterns from lift-and-shift to refactor; selective and time-sliced extraction for divestments; assurance and handover.

Out of scope: Full application modernisation, non-data infrastructure specifics, and anything that requires a séance with a vendor EULA.


2) Approach Overview

The delivery spine is simple: Discover → Design → Prepare → Migrate → Cutover → Stabilise. Measure twice, cut once, reconcile always.

Discover

Non-technical: Inventory apps and their data like a house move: what’s coming, what’s going to storage, what’s going to charity.

Technical: Catalogue domains, systems, tables, SLAs, RPO/RTO, PII/PHI, CRUD patterns, CDC, and lineage hot spots.

Design

Non-technical: Decide what to lift, re-platform, refactor, split, or retire. Agree how to prove success.

Technical: Define patterns, target stores, SCD, data contracts, keys, reconciliation, and security by design.

Prepare

Non-technical: Fix the ugly stuff early so it doesn’t become live ugly.

Technical: Profiling, DQ fixes, and reference alignment; then build landing zones, IAM, pipelines, and IaC.

Migrate

Non-technical: Rehearse until boring — then do it for real.

Technical: Dry runs, mock runs, CDC to shrink the window, and automated validation using counts, sums, checksums, and KPIs.
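Here is a minimal Python sketch of what "automated validation" can mean in practice. It compares a source extract with its migrated copy on three measures: row count, a control total, and an order-independent checksum. It assumes both sides can be read into lists of dictionaries. The function names (`profile`, `reconcile`) and the column names in the example are hypothetical and do not belong to any particular tool.

```python
import hashlib
from decimal import Decimal
from typing import Dict, Iterable, List

def row_digest(row: Dict, cols: List[str]) -> int:
    """SHA-256 of the chosen columns, as an integer, so digests can be summed."""
    payload = "|".join(str(row[c]) for c in cols)
    return int(hashlib.sha256(payload.encode("utf-8")).hexdigest(), 16)

def profile(rows: Iterable[Dict], amount_col: str, cols: List[str]) -> Dict:
    """Row count, control total, and an additive checksum that ignores row order."""
    count, total, checksum = 0, Decimal("0"), 0
    for row in rows:
        count += 1
        total += Decimal(str(row[amount_col]))  # Decimal avoids float drift in sums
        # Adding digests (mod 2^256) rather than XOR-ing them means a duplicated
        # row changes the checksum instead of cancelling itself out.
        checksum = (checksum + row_digest(row, cols)) % (1 << 256)
    return {"count": count, "sum": total, "checksum": checksum}

def reconcile(source, target, amount_col: str, cols: List[str]) -> List[str]:
    """Names of the measures that differ; an empty list means the load reconciles."""
    src = profile(source, amount_col, cols)
    tgt = profile(target, amount_col, cols)
    return [m for m in ("count", "sum", "checksum") if src[m] != tgt[m]]

source = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "4.50"}]
target = [{"id": 2, "amount": "4.50"}, {"id": 1, "amount": "10.50"}]
print(reconcile(source, target, "amount", ["id", "amount"]))  # []
```

In practice, the hard part is normalising types and formats on both sides before hashing. For example, "10.50" and 10.5 are the same value, but if they are not normalised the checksum flags a difference that is only cosmetic.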

Cutover

Non-technical: Calm hands, clear comms, cake afterwards.

Technical: Freezes, final delta, endpoint flips, cache warming, health checks, and optional dual-run.

Stabilise

Non-technical: Prove it works, retire the scaffolding, hand to BAU.

Technical: Post-cutover validation, tuning, alerts, legacy retirement, updated lineage, and BAU runbooks.


3) Governance & Guardrails

Ownership rules: codified, time-sliced where needed, versioned and auditable.
Security & privacy: KMS/CMKs, encryption at rest and in transit, masking or tokenisation for lower environments, least-privilege IAM.
Quality gates: null percentages, referential integrity, ranges, regex masks — automated, not hopeful.
Cost controls: budgets, tagging, alerts, auto-suspend and scaling, with FinOps review baked into BAU.
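A quality gate is simply a rule that returns pass or fail before a load is allowed to proceed. The sketch below expresses the checks named above (null percentage, range, pattern match, referential integrity) as plain Python functions over a list of rows. The thresholds, the column names and the `GateResult` type are illustrative assumptions. Most teams would express the same rules in their chosen DQ tool.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass
class GateResult:
    rule: str
    passed: bool
    detail: str

def null_pct_gate(rows: List[Dict], col: str, max_pct: float) -> GateResult:
    nulls = sum(1 for r in rows if r.get(col) in (None, ""))
    pct = 100.0 * nulls / len(rows) if rows else 0.0
    return GateResult(f"null%({col})", pct <= max_pct, f"{pct:.1f}% null")

def range_gate(rows: List[Dict], col: str, lo: float, hi: float) -> GateResult:
    bad = [r[col] for r in rows if r.get(col) is not None and not lo <= r[col] <= hi]
    return GateResult(f"range({col})", not bad, f"{len(bad)} out of range")

def pattern_gate(rows: List[Dict], col: str, pattern: str) -> GateResult:
    rx = re.compile(pattern)
    bad = [r[col] for r in rows if r.get(col) is not None and not rx.fullmatch(str(r[col]))]
    return GateResult(f"pattern({col})", not bad, f"{len(bad)} malformed")

def ref_gate(rows: List[Dict], fk: str, parent_keys: Iterable) -> GateResult:
    orphans = {r[fk] for r in rows if r.get(fk) is not None} - set(parent_keys)
    return GateResult(f"ref({fk})", not orphans, f"{len(orphans)} orphan keys")

def run_gates(results: List[GateResult]) -> Tuple[bool, List[GateResult]]:
    """The load proceeds only if every gate passes."""
    failed = [g for g in results if not g.passed]
    return not failed, failed

rows = [
    {"id": 1, "email": "a@example.com", "age": 34, "dept": "D1"},
    {"id": 2, "email": "not-an-email", "age": 140, "dept": "D9"},
]
ok, failed = run_gates([
    null_pct_gate(rows, "email", max_pct=5.0),
    range_gate(rows, "age", 0, 120),
    pattern_gate(rows, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    ref_gate(rows, "dept", parent_keys={"D1", "D2"}),
])
print(ok, [g.rule for g in failed])  # False ['range(age)', 'pattern(email)', 'ref(dept)']
```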

Reality check: If policies are “still being finalised”, the data model has to earn its crust: effective dates, provenance, and rules that stand up to audit.


4) Phased Delivery (What We Actually Do)

Phase A — Discover

Non-technical: Inventory apps and data; decide what travels, what is stored, and what is retired.

Technical: Catalogue domains, systems, tables, volumes, SLAs; map CRUD, CDC, and API limits; identify master, transactional, and analytical stores; sketch lineage.

Outputs: Data inventory, sensitivity tags, migration candidates with effort and benefit.

Phase B — Design

Non-technical: Choose the fate of each dataset; agree the definition of done.

Technical: Select patterns, define target storage, partitioning and retention, contracts and schemas, SCD and key handling, reconciliation, and security design.

Outputs: Migration design per app or dataset, acceptance criteria, runbooks, and reconciliation packs.

Phase C — Prepare

Non-technical: Remove surprises; keep the good ones for birthdays.

Technical: Profiling and DQ fixes, reference alignment, dedupe, landing zones, IAM, Private Link, secrets, pipelines, and IaC.

Outputs: Cleaned data slices, ready pipelines, non-prod environments, and a rehearsal plan.

Phase D — Migrate

Non-technical: Rehearse, rehearse, rehearse. Then do it for real.

Technical: Dry runs to mock runs to dress rehearsal; incremental syncs via CDC; automated validation using counts, sums, distincts, checksums, and KPIs.

Outputs: Signed rehearsal results, tuned timings, and tested rollback.

Phase E — Cutover

Non-technical: Communicate the window and contingency; nominate who brings cake.

Technical: Freeze where needed, run final delta, flip endpoints, warm caches, verify health, and use dual-run where sensible.

Outputs: Go/No-Go record, cutover log, first-day checks, and on-call rota.

Phase F — Stabilise

Non-technical: Prove it works, document it, hand it to BAU.

Technical: Post-cutover validation, tuning, cost guardrails, legacy retirement, updated lineage and catalogue, and BAU handover.

Outputs: Signed acceptance, completed decommission plan, and named BAU ownership.


5) Data Disposition Catalogue

Option	Non-technical description	Technical description	Example	Data pitfalls
Lift & Shift (Live)	Move it as-is, change the postcode.	Rehost database to a managed service; keep schema; minimal refactor.	VM-based app DB → managed Postgres.	Platform constraints, collation mismatches, timezone surprises.
Re-platform	Same furniture, better house.	Self-managed DB → cloud-native equivalent.	On-prem Oracle → Cloud SQL or Autonomous.	Feature gaps and driver incompatibilities.
Refactor	Teach the data new tricks.	Break monolith into domain stores, events, and lakehouse layers.	Orders → Bronze/Silver/Gold + CDC.	Key management and SCD complexity.
Selective Split	Only take what you’re entitled to.	Time-slice or entity-slice extraction using explicit rules.	Divestment by company code + effective dates.	Ownership ambiguity and late-arriving facts.
Archive & Retire	Box it, label it, keep it legal.	WORM storage plus an index; searchable offline.	Closed app with 7-year retention.	Poor indexing creates discovery pain.
Virtualise (Interim)	Read there, compute here.	External tables or views over remote storage.	Data sharing to analytics.	Latency and cost surprises.
Replace (Greenfield)	New system, migrate what matters.	Canonicalise, map, then load into target SaaS.	Legacy CRM → Dynamics 365.	Loss of historic semantics.


6) Migration Patterns

Full Snapshot + Cutover — Small or medium data, low change rate. Simple, but with a larger window.
Snapshot + CDC Catch-up — High change rate, tight window. Practice-friendly and precise.
API-Led Trickle — SaaS with decent APIs. Idempotent upserts; mind the throttling.
File-Based Batch — Big, predictable nightly windows. Checksum everything.
Event Rebuild — When you have a reliable event log. Elegant, if the events actually exist.
Dual-Run Bridge — For risky systems. Write to old and new, reconcile, then switch off old writes.
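The Snapshot + CDC Catch-up pattern hinges on one detail: you must know the log position at which the snapshot was taken, so that only later changes are replayed. Below is a minimal Python sketch. It assumes each change event carries a monotonically increasing log sequence number (`lsn`), an operation type and the new row image, which is roughly what log-based CDC tools emit. The `ChangeEvent` shape is hypothetical, not any tool's wire format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class ChangeEvent:
    lsn: int             # position in the source change log (monotonic)
    op: str              # "insert", "update" or "delete"
    key: int             # primary/business key of the affected row
    row: Optional[dict]  # new row image; None for deletes

def catch_up(snapshot: Dict[int, dict], snapshot_lsn: int,
             events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Replay changes committed after the snapshot, in log order."""
    target = dict(snapshot)
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= snapshot_lsn:
            continue  # already captured by the snapshot
        if ev.op == "delete":
            target.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            target[ev.key] = ev.row  # upsert: replaying the same batch is harmless
        else:
            raise ValueError(f"unknown op {ev.op!r}")
    return target

snapshot = {1: {"qty": 5}, 2: {"qty": 7}}
events = [
    ChangeEvent(99, "update", 1, {"qty": 4}),   # before the snapshot: skipped
    ChangeEvent(101, "update", 1, {"qty": 6}),
    ChangeEvent(102, "delete", 2, None),
    ChangeEvent(103, "insert", 3, {"qty": 1}),
]
print(catch_up(snapshot, 100, events))  # {1: {'qty': 6}, 3: {'qty': 1}}
```

The dry runs and mock runs are where you prove that the snapshot position is captured correctly. If it is recorded too late, changes are silently lost. If it is recorded too early, the upserts simply re-apply changes the snapshot already holds.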


7) Case Example — The “Janey Job-Hopper” Problem

Scenario (plain English): Janey sells a blue widget on 2 March while employed by Company B; the customer pays on 30 March, after Janey has transferred to Company C. In a divestment, Company B gets both the sale and the related payment, because the sale was initiated while she was employed by Company B.

Implementation (technical):

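Model Employment(employee_id, effective_from, effective_to, company_code) and Transaction(initiated_ts, completed_ts, employee_id).
Ownership rule, joining Transaction t to Employment emp on employee_id: CASE WHEN t.initiated_ts >= emp.effective_from AND (emp.effective_to IS NULL OR t.initiated_ts < emp.effective_to) THEN emp.company_code END. Treating effective_to as exclusive, with NULL for the current role, stops a transaction dated on the transfer day from matching both employments.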
Version the rule with a policy ID for audit and store provenance.
Reconcile totals by company before and after the split, with agreed variance thresholds.
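The ownership rule can also be expressed in application code. This is handy for unit-testing the policy against the Janey scenario before it goes anywhere near production data. Below is a minimal Python sketch. It assumes `effective_to` is exclusive and that `None` marks the current employment. The policy ID value, the year, the 15 March transfer date and the output shape are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

POLICY_ID = "OWN-RULE-v1"  # hypothetical; version this alongside the rule text

@dataclass
class Employment:
    employee_id: str
    company_code: str
    effective_from: datetime
    effective_to: Optional[datetime]  # exclusive; None = still employed

@dataclass
class Transaction:
    txn_id: str
    employee_id: str
    initiated_ts: datetime
    completed_ts: Optional[datetime]

def assign_owner(txn: Transaction, history: List[Employment]) -> Dict:
    """Whole transaction (sale and payment) goes to the employer at initiation."""
    matches = [
        e for e in history
        if e.employee_id == txn.employee_id
        and e.effective_from <= txn.initiated_ts
        and (e.effective_to is None or txn.initiated_ts < e.effective_to)
    ]
    if len(matches) != 1:
        # No match, or overlapping employments: route to a human, don't guess.
        raise ValueError(f"{txn.txn_id}: {len(matches)} employment matches")
    return {"txn_id": txn.txn_id, "company_code": matches[0].company_code,
            "policy_id": POLICY_ID}  # provenance for the audit trail

janey = [
    Employment("E42", "B", datetime(2025, 1, 1), datetime(2025, 3, 15)),
    Employment("E42", "C", datetime(2025, 3, 15), None),
]
sale = Transaction("T1", "E42", initiated_ts=datetime(2025, 3, 2),
                   completed_ts=datetime(2025, 3, 30))
print(assign_owner(sale, janey)["company_code"])  # B
```

The payment date (`completed_ts`) plays no part in the decision. That is the policy in one line, and it is why the rule needs its own ID when legal sign-off is given.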


8) Controls & Assurance

Control	What it proves	How
Reconciliation	No loss or duplication.	Row counts, sums, distinct business keys, and UI parity sampling.
Lineage	Explainability.	Data catalogue, column-level lineage, and commit hashes.
Quality gates	Fitness to load.	Null percentages, range checks, referential checks, and regex masks.
Security	Privacy by design.	KMS, CMKs, HSMs, least-privilege IAM, and masked lower environments.
Cost guardrails	Spend under control.	Budgets, alerts, auto-suspend, scaling, tagging, and FinOps review.
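For the "masked lower environments" part of the security control, one common approach is deterministic tokenisation. The same real value always maps to the same token, so joins and counts still work in test, while the real identifier never leaves production. Below is a minimal sketch using Python's standard `hmac` module. The key is a literal here purely for illustration; in practice it would come from the KMS. The column names and the 16-character token length are assumptions.

```python
import hashlib
import hmac
from typing import Dict

def tokenise(value: str, key: bytes, prefix: str = "tok_") -> str:
    """Keyed, deterministic token: same value + same key -> same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:16]  # 64 bits; lengthen if collision risk matters

def mask_email(email: str) -> str:
    """Display mask: keep the first character and the domain only."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"

def mask_row(row: Dict, key: bytes) -> Dict:
    masked = dict(row)  # never mutate the source row
    masked["customer_id"] = tokenise(str(row["customer_id"]), key)
    masked["email"] = mask_email(row["email"])
    return masked

key = b"demo-only-key"  # illustration only; fetch from the KMS in real use
print(mask_row({"customer_id": 1001, "email": "janey@example.com", "balance": 250}, key))
```

Because the token is keyed, someone who holds the test data but not the key cannot rebuild the mapping by hashing guesses. Rotating the key deliberately breaks linkage with earlier masked copies.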


9) Cutover Playbook (T-Timeline)

T-14 days: Final mock. Sign timings, rollback, and owners.
T-3 days: Freeze catalogue changes; pin versions.
T-1 day: CDC lag within threshold; warm caches.
T-0: Freeze writes where needed, run final delta, flip endpoints, run smoke tests, and publish the green board.
T+1 to T+7 days: Dual-run where used, reconcile daily, switch off old writes, and transition to BAU.
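The T-1 and T-0 checks lend themselves to an explicit Go/No-Go gate, so the decision recorded in the cutover log is reproducible rather than a show of hands. Below is a minimal Python sketch. The inputs, the 60-second lag threshold and the `go_no_go` function are illustrative assumptions; replace them with the figures signed off at T-14.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CutoverStatus:
    cdc_lag_seconds: float       # how far the target trails the source
    reconciliation_breaks: int   # measures outside agreed tolerance
    smoke_tests_failed: int
    rollback_rehearsed: bool

@dataclass
class Decision:
    go: bool
    reasons: List[str] = field(default_factory=list)

def go_no_go(s: CutoverStatus, max_lag_seconds: float = 60.0) -> Decision:
    """Every failed criterion is listed, so the log explains a No-Go."""
    reasons = []
    if s.cdc_lag_seconds > max_lag_seconds:
        reasons.append(f"CDC lag {s.cdc_lag_seconds:.0f}s > {max_lag_seconds:.0f}s")
    if s.reconciliation_breaks:
        reasons.append(f"{s.reconciliation_breaks} reconciliation break(s)")
    if s.smoke_tests_failed:
        reasons.append(f"{s.smoke_tests_failed} smoke test(s) failing")
    if not s.rollback_rehearsed:
        reasons.append("rollback not rehearsed")
    return Decision(go=not reasons, reasons=reasons)

print(go_no_go(CutoverStatus(300, 0, 0, True)))
# Decision(go=False, reasons=['CDC lag 300s > 60s'])
```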


10) Roles & Responsibilities (Sane RACI)

Data Lead: scope, standards, acceptance, arbitration.
App Owner: source truth, test scenarios, sign-off.
Security / Privacy: DPIA, keys, masking, access.
Platform: accounts, networks, policies, observability.
Engineers: pipelines, tests, IaC, runbooks.
Ops / Support: steady-state, alerts, and who gets called when things go bump.


11) Worked Mini-Examples

1) Payroll to Cloud DW (Snapshot + CDC)

Non-technical: First copy everything, then keep it in sync until payday.

Technical: Full extract to staged Parquet; CDC via Debezium streaming to Kafka, or change-tracking (CT) based; SCD2 for employees; row-level security by organisation.

Checks: Employee counts per org, net pay totals, and variance below agreed tolerance.
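SCD2 for employees means that a change of organisation, or of any other tracked attribute, closes the current row and opens a new one instead of overwriting history. That matters here because net pay has to be reported against the organisation that applied at the time. Below is a minimal in-memory Python sketch. The tracked attributes (`org`, `grade`) and the exclusive `valid_to` convention are assumptions. In a warehouse, the same logic is often written as a MERGE statement.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class EmployeeVersion:
    employee_id: str
    org: str
    grade: str
    valid_from: date
    valid_to: Optional[date]  # exclusive end; None for the current version
    is_current: bool

def apply_scd2(dim: List[EmployeeVersion], employee_id: str, org: str,
               grade: str, as_of: date) -> List[EmployeeVersion]:
    """Close the current version and add a new one only if a tracked attribute changed."""
    out = list(dim)
    for i, v in enumerate(out):
        if v.employee_id == employee_id and v.is_current:
            if (v.org, v.grade) == (org, grade):
                return out  # nothing changed: no new version
            out[i] = replace(v, valid_to=as_of, is_current=False)
            break
    out.append(EmployeeVersion(employee_id, org, grade, as_of, None, True))
    return out

dim: List[EmployeeVersion] = []
dim = apply_scd2(dim, "E1", "ORG-A", "G5", date(2025, 1, 1))
dim = apply_scd2(dim, "E1", "ORG-A", "G5", date(2025, 2, 1))   # no-op
dim = apply_scd2(dim, "E1", "ORG-B", "G5", date(2025, 6, 1))   # transfer
for v in dim:
    print(v.org, v.valid_from, v.valid_to, v.is_current)
```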

2) Legacy CRM → SaaS (API-Led Replace)

Non-technical: Move customers with their history; skip the 2008 zombie leads.

Technical: Canonical IDs, throttled APIs, idempotency keys for upserts, and WORM archive for the zombies.

Checks: Customers by segment, opportunities by stage, and random UI spot checks.
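Throttled APIs and idempotency keys work as a pair. The client backs off when the SaaS pushes back, and because every retry carries the same key, the target can recognise a repeat instead of creating a duplicate. Below is a minimal Python sketch. `send` stands in for whatever client call the target SaaS provides, and `ThrottledError` is a hypothetical exception that call would raise on a throttling response. Whether the target honours an idempotency key at all, and under what field or header name, depends on the vendor.

```python
import random
import time
import uuid
from typing import Callable, Dict

class ThrottledError(Exception):
    """Hypothetical: raised by the client wrapper on a throttling response."""

def idempotency_key(canonical_id: str, version: int = 1) -> str:
    # Derived from the record, not random, so every retry reuses the same key.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"crm-migration/{canonical_id}/{version}"))

def upsert_with_backoff(send: Callable[[Dict, str], Dict], record: Dict,
                        max_attempts: int = 5, base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> Dict:
    key = idempotency_key(record["canonical_id"], record.get("version", 1))
    for attempt in range(max_attempts):
        try:
            return send(record, key)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up; let the caller park the record for later
            # Exponential back-off with full jitter: wait 0..base*2^attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")

calls = []
def fake_send(record, key):
    calls.append(key)
    if len(calls) < 3:
        raise ThrottledError("throttled")
    return {"status": "upserted", "key": key}

result = upsert_with_backoff(fake_send, {"canonical_id": "CUST-0007"}, sleep=lambda s: None)
print(result["status"])  # upserted
```

Jitter spreads retries from parallel workers so they do not all hit the API again at the same moment. Injecting `sleep` keeps the function testable without real waiting.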

3) Manufacturing Orders to Lakehouse (Refactor)

Non-technical: Turn the shed of CSVs into a tidy pantry: raw shelf, cleaned shelf, serving shelf.

Technical: Bronze to Silver to Gold; SCD2 on product and plant; partition by order_date; Z-order on customer_id.

Checks: Order totals by month and plant, plus join completeness across dimensions.
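Both checks are simple aggregates, which is exactly why they catch so much. Below is a minimal Python sketch. It assumes Gold-layer order rows are available as dictionaries and dimension keys as sets; the column names are hypothetical. Join completeness here means the percentage of fact rows whose foreign key finds a match in each dimension.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Set, Tuple

def join_completeness(facts: List[Dict], dims: Dict[str, Set]) -> Dict[str, float]:
    """Percent of fact rows whose foreign key resolves in each dimension."""
    if not facts:
        return {fk: 100.0 for fk in dims}
    return {fk: 100.0 * sum(1 for f in facts if f.get(fk) in keys) / len(facts)
            for fk, keys in dims.items()}

def totals_by_month_and_plant(facts: List[Dict]) -> Dict[Tuple[str, str], Decimal]:
    """Order totals keyed by (YYYY-MM, plant), for comparison with the source."""
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for f in facts:
        totals[(f["order_date"][:7], f["plant_id"])] += Decimal(str(f["amount"]))
    return dict(totals)

orders = [
    {"order_date": "2025-01-15", "plant_id": "P1", "product_id": "A", "amount": "100.00"},
    {"order_date": "2025-01-20", "plant_id": "P1", "product_id": "B", "amount": "50.00"},
    {"order_date": "2025-02-03", "plant_id": "P2", "product_id": "A", "amount": "75.00"},
    {"order_date": "2025-02-09", "plant_id": "P2", "product_id": "Z", "amount": "25.00"},
]
print(join_completeness(orders, {"product_id": {"A", "B"}, "plant_id": {"P1", "P2"}}))
# {'product_id': 75.0, 'plant_id': 100.0}
print(totals_by_month_and_plant(orders))
```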


12) Definition of Done

All acceptance tests pass: counts, sums, and sensible samples.
Lineage and catalogue updated; DPIA complete; least-privilege access verified.
Runbooks, alerts, and SLOs in place; budget alarms set.
Legacy feeds turned off; archive searchable; owners named.


13) Risks & Antidotes

Risk	Why it bites	Antidote
Ownership ambiguity	Divestments, transfers, late payments.	Time-sliced rules, policy IDs, legal sign-off, audit columns.
API throttling	SaaS protects itself.	Back-off, chunking, idempotency.
Schema drift	Source keeps changing.	Contract tests, schema registry, versioned pipelines.
Privacy leakage	Lower environments get real data.	Masking, tokenisation, or synthetic datasets.
Cost sprawl	“Just one more cluster”.	Budgets, auto-suspend, tagging, FinOps review.
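For the schema-drift antidote, a contract test compares what a feed actually delivers with what the data contract promises, and fails the pipeline before anything lands in the target. Below is a minimal Python sketch. The contract contents and the type names are hypothetical. A schema registry would normally hold the contract rather than a literal dictionary.

```python
from typing import Dict, List

# Hypothetical contract for an orders feed: column name -> logical type.
ORDERS_CONTRACT_V2: Dict[str, str] = {
    "order_id": "int",
    "customer_id": "int",
    "order_date": "date",
    "amount": "decimal",
}

def check_contract(observed: Dict[str, str], contract: Dict[str, str]) -> List[str]:
    """Return breaches; an empty list means the feed honours the contract."""
    breaches = []
    for col, expected in contract.items():
        if col not in observed:
            breaches.append(f"missing column: {col}")
        elif observed[col] != expected:
            breaches.append(f"type change: {col} {expected} -> {observed[col]}")
    for col in sorted(observed.keys() - contract.keys()):
        breaches.append(f"unexpected column: {col}")
    return breaches

observed = {"order_id": "int", "customer_id": "int", "order_date": "date",
            "amount": "float", "channel": "string"}
for breach in check_contract(observed, ORDERS_CONTRACT_V2):
    print(breach)
```

This version treats any new column as a breach. Some teams relax that rule to allow additive, nullable columns and reserve hard failures for removals and type changes, recording the choice as a new contract version.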


14) Reference Architecture (Mental Model)

This diagram shows the delivery spine from source applications through landing, processing, curation, target loading, and archival. It reflects how data is defined, moved, and proven in practice.

Practitioner Narrative: Producers and applications feed a raw landing zone via snapshot, CDC, or API. Data is validated, cleaned, and conformed through processing, then curated into Gold structures or loaded into target systems. Time-sliced archival supports retention and controlled separation.


15) Appendix — Glossary

BAU: Business as usual — steady-state operations.
CDC: Change Data Capture — deltas since the last snapshot.
DPIA: Data Protection Impact Assessment.
Lakehouse: Data lake plus warehouse traits, often expressed as Bronze, Silver, and Gold.
RACI: Responsible, Accountable, Consulted, Informed.
SCD: Slowly Changing Dimension — Type 2 tracks history.
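SLA / RPO / RTO: Service Level Agreement, Recovery Point Objective (how much data loss is tolerable), and Recovery Time Objective (how long a restore may take); together they define availability and restore posture.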
WORM: Write Once, Read Many — immutable storage.


Data Migration to the Cloud

Executive Summary


1) Purpose & Scope

2) Approach Overview

Discover

Design

Prepare

Migrate

Cutover

Stabilise

3) Governance & Guardrails

4) Phased Delivery (What We Actually Do)

Phase A — Discover

Phase B — Design

Phase C — Prepare

Phase D — Migrate

Phase E — Cutover

Phase F — Stabilise

5) Data Disposition Catalogue

6) Migration Patterns

7) Case Example — The “Janey Job-Hopper” Problem

8) Controls & Assurance

9) Cutover Playbook (T-Timeline)

10) Roles & Responsibilities (Sane RACI)

11) Worked Mini-Examples

1) Payroll to Cloud DW (Snapshot + CDC)

2) Legacy CRM → SaaS (API-Led Replace)

3) Manufacturing Orders to Lakehouse (Refactor)

12) Definition of Done

13) Risks & Antidotes

14) Reference Architecture (Mental Model)

15) Appendix — Glossary
