Main Cloud Databases — Quick Reference

A concise map of leading cloud databases with plain-English descriptions, technical notes, common use cases, and data-type support.

Str = structured, Sst = semi-structured, Uns = unstructured. ✅ = strong/native support; ⚠️ = possible/limited or engine-specific; ❌ = not supported (use object storage/another service)

Relational & Analytics (SQL)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS RDS (MySQL / PostgreSQL / SQL Server / Oracle / MariaDB)	Managed versions of the classic SQL databases.	PaaS relational engines with backups, patching, Multi-AZ HA.	OLTP apps; software that expects a SQL DB.	✅	⚠️ (JSON in Postgres/MySQL)	❌
Amazon Aurora (MySQL / PostgreSQL-compatible)	Cloud-optimised SQL with higher performance/HA.	Distributed storage (6-way replication), read replicas, serverless autoscaling.	High-throughput OLTP; microservices backends.	✅	⚠️ (JSON columns)	❌
Azure SQL Database	Managed SQL Server in Azure.	PaaS SQL with elastic pools, serverless compute, built-in HA.	Line-of-business apps; reporting stores.	✅	⚠️ (JSON)	❌
Azure Database for PostgreSQL / MySQL	Managed PostgreSQL/MySQL.	HA “flexible server”; PostgreSQL extensions.	Modern app stacks needing managed OSS SQL.	✅	⚠️ (PostgreSQL JSON/JSONB; MySQL JSON)	❌
GCP Cloud SQL (PostgreSQL / MySQL / SQL Server)	Google’s managed SQL trio.	Managed instances, replicas, automated ops.	Web/mobile backends; small–mid OLTP.	✅	⚠️	❌
Google Cloud Spanner	“SQL that scales globally.”	Horizontally scalable, strongly consistent distributed SQL; ANSI SQL, transactions, JSON type.	Global OLTP (fintech, gaming, SaaS).	✅	⚠️ (JSON)	❌
Amazon Redshift	AWS data warehouse.	MPP columnar SQL engine; SUPER type for semi-structured.	Enterprise BI; ELT at scale.	✅	✅	❌
Azure Synapse Dedicated SQL Pool	Azure data warehouse.	MPP columnar SQL; PolyBase/COPY INTO for ingestion.	Enterprise BI on Azure.	✅	⚠️ (OPENJSON/PolyBase)	❌
Google BigQuery	Serverless analytics warehouse.	Columnar, ANSI SQL, massive parallelism; native JSON/ARRAY; external tables.	Ad-hoc analytics, ELT, ML-on-SQL.	✅	✅	⚠️ (via external tables/GCS)
Snowflake (multi-cloud)	Cloud data platform for analytics.	Elastic compute/storage; VARIANT for semi-structured; stages/external access.	Unified warehouse + data sharing.	✅	✅ (JSON/Parquet/Avro/XML)	⚠️ (files via stages)

NoSQL (Key-value, Document, Wide-column)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon DynamoDB	Serverless key-value/document store with consistently fast lookups at any scale.	Partitioned KV/JSON docs, single-digit-ms latency, auto scaling, streams.	High-scale apps, session/carts, IoT.	⚠️	✅	❌ (small binaries only; 400 KB item limit)
Azure Cosmos DB	Global multi-model NoSQL.	APIs: Core (SQL/doc), Mongo, Cassandra, Gremlin, Table; multi-region, tunable consistency.	Low-latency global apps, catalogs.	⚠️	✅	❌
Google Firestore	Serverless document database.	Hierarchical documents in collections; ACID transactions (including multi-document); real-time listeners.	Mobile/web apps, profiles, settings.	⚠️	✅	❌
Google Bigtable	Massive time-series/wide-column store.	Sparse, distributed HBase-compatible database.	IoT/telemetry, ad tech, TS data.	⚠️	✅	❌
Amazon Keyspaces (for Apache Cassandra)	Managed Cassandra.	Serverless Cassandra-compatible wide-column store.	Time-series, high-write workloads.	⚠️	✅	❌
MongoDB Atlas (multi-cloud)	Managed MongoDB.	Document model, flexible schema; single-document atomicity plus multi-document ACID transactions; GridFS for files.	Content/user data, catalogs.	⚠️	✅	⚠️ (via GridFS)
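The key-value and wide-column rows above (DynamoDB, Cosmos DB, Keyspaces) spread data across partitions by hashing a partition key, which is why one very busy key can become a hot partition. A minimal sketch of the idea and of the common write-sharding workaround follows. The MD5-based `partition_for` is an illustrative assumption rather than the hash any of these services actually uses, and `sharded_key` / `all_shard_keys` are hypothetical helper names.

```python
import hashlib
import random


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number by hashing it.

    Illustrative only: managed services use their own internal hash functions.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def sharded_key(base_key: str, shard_count: int, rng: random.Random) -> str:
    """Write sharding: add a random suffix so one hot logical key spreads over several keys."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int) -> list:
    """Readers must query every suffix and merge the results."""
    return [f"{base_key}#{i}" for i in range(shard_count)]


if __name__ == "__main__":
    rng = random.Random(7)
    # 1,000 writes for the same day would all hit one partition without sharding.
    writes = [sharded_key("2024-06-01", 8, rng) for _ in range(1000)]
    used = {partition_for(k, 16) for k in set(writes)}
    print(f"{len(set(writes))} physical keys spread over {len(used)} partitions")
```

The trade-off lands on reads: fetching one logical key now means querying all `shard_count` suffixes and merging, so the usual advice is the smallest shard count that removes the hotspot.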

Graph, Search & Log/Telemetry (Specialised)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon Neptune	Managed graph database.	Property graph (Gremlin) & RDF (SPARQL), ACID.	Knowledge graphs, recommendations.	⚠️	✅	❌
Cosmos DB (Gremlin API)	Graph on Cosmos.	Gremlin traversal on a distributed store.	Social networks, network topology.	⚠️	✅	❌
Amazon OpenSearch Service	Managed search & analytics.	Lucene-based inverted index; JSON docs; full-text search + aggregations.	Log analytics, app/site search.	⚠️	✅	✅ (indexes text; binaries in S3)
Azure Data Explorer (Kusto)	Fast log/time-series analytics.	Columnar engine with KQL; semi-structured ingestion.	Telemetry, security analytics.	⚠️	✅	❌

Tip: store large binaries (images, PDFs, media) in cloud object storage (e.g., S3, Azure Blob, Google Cloud Storage) and reference them from your database.
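A minimal sketch of that pattern, assuming a local temporary directory as a stand-in for the bucket and SQLite as a stand-in for the database. `put_object`, `get_object`, `save_document` and the `documents` table are hypothetical names; with a real cloud you would swap in the provider's SDK upload call and store the returned key or URI the same way.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for an object storage bucket (S3 / Azure Blob / GCS): a local directory.
BUCKET = Path(tempfile.mkdtemp(prefix="bucket-"))


def put_object(data: bytes) -> str:
    """Write bytes under a content-addressed key and return the key (hypothetical helper)."""
    key = "docs/" + hashlib.sha256(data).hexdigest()
    path = BUCKET / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return key


def get_object(key: str) -> bytes:
    return (BUCKET / key).read_bytes()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, size_bytes INTEGER, object_key TEXT)"
)


def save_document(name: str, data: bytes) -> int:
    """Store the binary in the 'bucket'; keep only metadata and the object key in the database."""
    key = put_object(data)
    cur = db.execute(
        "INSERT INTO documents (name, size_bytes, object_key) VALUES (?, ?, ?)",
        (name, len(data), key),
    )
    db.commit()
    return cur.lastrowid


def load_document(doc_id: int) -> bytes:
    """Look up the object key in the database, then fetch the bytes from the 'bucket'."""
    row = db.execute("SELECT object_key FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return get_object(row[0])
```

Content-addressed keys (the SHA-256 of the bytes) are one option, not a requirement; they make uploads idempotent, since storing the same file twice writes the same object.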

Main Cloud Data Pipelines — Quick Reference

A practical map of batch/ELT, streaming/CDC, and orchestration options. Each row includes a plain-English summary, a technical note, a common use case, and data-type support.

✅ = native/strong support; ⚠️ = possible/with caveats; ❌ = not a fit

Batch / ELT Pipelines

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS Glue	Managed ETL to load and transform data.	Serverless Spark jobs, crawlers, Data Catalog, notebooks; job bookmarks.	Batch ingest + transforms to S3/Redshift/Lakehouse.	✅	✅ (JSON/Parquet/Avro)	⚠️ (UDFs/pass-through)
Azure Data Factory (incl. Synapse Pipelines)	Azure’s GUI pipelines for copy and transform.	100+ connectors, Mapping Data Flows (Spark), triggers, CI/CD.	Lift-and-shift ETL/ELT to ADLS/Synapse/Snowflake.	✅	✅	⚠️ (copy/metadata)
Google Cloud Dataflow	Google’s managed data processing service.	Fully managed Apache Beam runner (batch & streaming) with autoscaling workers.	ELT/ETL to BigQuery/Cloud Storage.	✅	✅	✅ (custom DoFns)
Google Cloud Dataproc	Managed Spark/Hadoop clusters.	Ephemeral or long-running Spark, Hive, Hadoop; JARs and notebooks.	Modernise legacy ETL; Spark jobs at scale.	✅	✅	⚠️
Databricks Delta Live Tables (DLT)	Declarative pipelines on a lakehouse.	Managed Spark/Delta; expectations (DQ), CDC, lineage; Bronze→Silver→Gold.	Lakehouse ELT with quality rules.	✅	✅	⚠️ (file extract/ML)
Snowflake Snowpipe / Tasks	Continuous or batch loads into Snowflake.	Event-driven ingest from object storage; Streams/Tasks for ELT orchestration.	Near-real-time file ingest and transforms in-warehouse.	✅	✅ (VARIANT)	⚠️ (files via stages)
Fivetran	Point-and-click SaaS ELT.	Managed connectors for DBs/SaaS; auto schema evolution to DW/Lake.	Rapid onboarding to Snowflake/BigQuery/Redshift.	✅	✅	❌
Matillion	Visual ELT for cloud warehouses.	Push-down SQL to Snowflake/Redshift/BigQuery; orchestration and components.	Team-owned ELT with versioning.	✅	✅	❌
Informatica IICS	Enterprise integration in the cloud.	Mappings, CDC, data quality, MDM tie-ins, governance.	Regulated/complex estates and hybrid integration.	✅	✅	⚠️ (adapters/custom)
Airbyte (managed or OSS)	Open-source ELT connectors.	Connector SDK, CDC support, sync to lakes/warehouses.	Cost-effective ELT and custom sources.	✅	✅	❌
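Most of the batch tools above load incrementally rather than re-copying whole sources (Glue calls this job bookmarks; Fivetran and Airbyte keep sync state per connector). The generic idea is a high-watermark: remember the largest `updated_at` seen, and on the next run pull only newer rows. A minimal sketch with in-memory rows; the row shape and `extract_incremental` are hypothetical, and each product's real state handling differs.

```python
from datetime import datetime
from typing import Optional


def extract_incremental(rows: list, watermark: Optional[datetime]):
    """Return rows changed after `watermark`, plus the new watermark to persist.

    Uses a strict '>' comparison, so a row sharing the old watermark's timestamp but
    committed after the previous run would be missed; real tools typically guard
    against this (for example with overlap windows or log-based CDC).
    """
    new_rows = [r for r in rows if watermark is None or r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
        {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 0)},
    ]
    batch, mark = extract_incremental(source, None)  # first run: full load
    source.append({"id": 3, "updated_at": datetime(2024, 1, 2, 8, 0)})
    batch, mark = extract_incremental(source, mark)  # next run: only the new row
    print([r["id"] for r in batch], mark)
```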

Streaming / CDC (Near-real-time)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon Kinesis (Data Streams & Data Firehose)	Real-time pipes on AWS.	Data Streams for ingestion; Firehose buffers and delivers to S3/Redshift/OpenSearch.	Clickstreams, IoT telemetry, app events.	✅	✅	⚠️ (binary pass-through)
Amazon MSK (Managed Streaming for Apache Kafka)	Kafka as a managed service.	Managed brokers, IAM/VPC, Kafka Connect integrations.	High-throughput event backbone.	✅	✅	⚠️
AWS DMS (Database Migration Service)	Continuous DB replication.	CDC from relational sources to S3/Kinesis/Redshift/other DBs.	Legacy→cloud replication and cutovers.	✅	⚠️	❌
Azure Event Hubs	Azure’s big event pipe.	Partitioned event ingestion with low latency; Kafka-compatible endpoint.	Telemetry, logs, stream ingestion.	✅	✅	⚠️
Azure Stream Analytics	SQL-like stream processing.	Windowed joins and aggregations; outputs to ADLS/SQL/Power BI.	Real-time dashboards and anomaly detection.	✅	✅	❌
Google Pub/Sub	Google’s global event bus.	Exactly-once delivery (pull subscriptions), push/pull, ordering keys; integrates with Dataflow.	Event ingestion for Dataflow/BigQuery.	✅	✅	⚠️
Google Datastream	Serverless CDC on GCP.	Change data capture from DBs to BigQuery/Cloud Storage.	Low-ops CDC pipelines and cutovers.	✅	⚠️	❌
Confluent Cloud (Kafka + Connect + ksqlDB)	Fully managed Kafka across clouds.	Kafka core with managed Connect, Schema Registry, ksqlDB.	Cross-cloud streaming backbone.	✅	✅	⚠️
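CDC services such as DMS, Datastream and Kafka Connect source connectors emit an ordered feed of inserts, updates and deletes; whatever lands that feed has to replay it against a target keyed by primary key. A minimal in-memory sketch of that apply step follows. The event shape (`seq`, `op`, `key`, `row`) is a hypothetical simplification; each service has its own record format.

```python
def apply_changes(target: dict, events: list) -> dict:
    """Apply change events to `target` (primary key -> row) in source commit order.

    Each event is a hypothetical dict: {"seq": int, "op": "insert"|"update"|"delete",
    "key": ..., "row": {...}}. `seq` stands in for a log position such as an LSN.
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: re-applying the same ordered feed yields the same final state.
            target[key] = dict(event["row"])
        elif op == "delete":
            target.pop(key, None)  # deleting an absent key is a no-op
        else:
            raise ValueError(f"unknown op: {op!r}")
    return target


if __name__ == "__main__":
    feed = [
        {"seq": 3, "op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
        {"seq": 1, "op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
        {"seq": 2, "op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
        {"seq": 4, "op": "delete", "key": 2, "row": None},
    ]
    print(apply_changes({}, feed))
```

Sorting by log position matters: applied in arrival order, the update in this sample would be overwritten by the older insert.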

Orchestration / Workflow (Run the pipelines)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS MWAA (Managed Workflows for Apache Airflow)	Airflow without the ops.	Managed schedulers/workers; DAGs as code; AWS integrations.	Complex dependency DAGs on AWS.	✅	✅	⚠️ (move/process files)
Google Cloud Composer (managed Airflow)	GCP’s managed Airflow.	GKE-based Airflow with GCP hooks and integrations.	Orchestrate Dataflow/BigQuery/Dataproc.	✅	✅	⚠️
Azure Data Factory (as orchestrator)	Triggers and pipelines to coordinate jobs.	Time- or event-based triggers, dependencies, retries, self-hosted runtimes.	End-to-end Azure data workflows.	✅	✅	⚠️
AWS Step Functions	Serverless workflow engine.	State machines for ETL, retries, parallelism; integrates with Glue/Lambda.	Glue/Spark jobs + Lambdas orchestration.	✅	✅	⚠️
Databricks Jobs	Schedule and run lakehouse jobs.	Task graphs, cluster policies, DLT integration, Git ops.	Operationalise notebooks/SQL/ML on Databricks.	✅	✅	⚠️
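Retries with exponential backoff are a core orchestrator feature (Step Functions exposes them as `IntervalSeconds`, `MaxAttempts` and `BackoffRate` on a state's `Retry` rule; Airflow has per-task `retries`). The sketch below reproduces that delay schedule in plain Python. `retry_delays` and `run_with_retries` are hypothetical helpers, and the jitter and maximum-delay caps that real engines may also apply are left out.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def retry_delays(interval_seconds: float = 1.0, max_attempts: int = 3,
                 backoff_rate: float = 2.0) -> list:
    """Wait before each retry: interval, interval*rate, interval*rate**2, ...

    The defaults (1 s, 3 retries, rate 2.0) mirror Step Functions' documented Retry defaults.
    """
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def run_with_retries(task: Callable[[], T], delays: Sequence[float],
                     retry_on: tuple = (Exception,),
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `task`; on a matching error, wait and retry, up to len(delays) retries."""
    for delay in delays:
        try:
            return task()
        except retry_on:
            sleep(delay)
    return task()  # final attempt: its exception (if any) propagates to the caller
```

Injecting `sleep` makes the schedule testable without real waiting. Total calls are `max_attempts + 1` (one initial try plus the retries), which matches how Step Functions counts `MaxAttempts`.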

Tip: Large binaries (images, PDFs, media) belong in cloud object storage (S3, Azure Blob, Google Cloud Storage); your pipelines can reference or transform them as needed.

Main Cloud Compute Options — Quick Reference

A practical map of compute families (VMs, containers, serverless, batch/HPC, big data/stream, ML). Each row includes a plain-English summary, a technical note, a common use case, and data-type handling.

✅ = strong/native fit; ⚠️ = possible/with caveats; ❌ = not a fit

Virtual Machines (VMs)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS EC2	Raw virtual servers you configure as you like.	Wide instance families (CPU/GPU/ARM), Auto Scaling, Spot, placement groups, custom AMIs.	Legacy apps, custom stacks, high control/observability.	✅	✅	✅
Azure Virtual Machines	Microsoft’s managed virtual servers.	VM Scale Sets, Azure Hybrid Benefit licensing, proximity placement groups, Windows/Linux images.	Windows-heavy estates, hybrid lift-and-shift.	✅	✅	✅
Google Compute Engine	Google’s on-demand VMs.	Custom machine types, preemptible VMs, live migration, sole-tenant nodes.	Custom runtimes, cost-tuned fleets, HPC baselines.	✅	✅	✅

Containers & Kubernetes

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Managed Kubernetes (EKS / AKS / GKE)	Kubernetes clusters without the control-plane pain.	Managed API servers, node pools, autoscaling, add-ons (Ingress, CSI, CNI), GPU pools.	Microservices, data/ML platforms, multi-tenant apps.	✅	✅	✅
Serverless containers (Cloud Run / Azure Container Apps)	Run containers that scale up from zero on demand, without managing servers.	Request-driven autoscaling (including scale to zero), HTTP/async triggers, revisions, simple networking.	APIs, event workers, lightweight ETL/ML inference.	✅	✅	⚠️ (short-lived; external storage)
Elastic Container Service (ECS) / Fargate	Orchestrated containers on AWS; Fargate removes servers.	Task/Service model, service discovery, IAM, capacity providers; Fargate = serverless execution.	Batch jobs, APIs, back-office workers.	✅	✅	✅

Serverless Functions (FaaS)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS Lambda	Run code on events; no servers to manage.	Event triggers (API, S3, Kafka), ephemeral runtime, concurrency scaling, extensions.	Event processing, light ETL, API backends.	✅	✅	⚠️ (timeout/memory limits)
Azure Functions	Event-driven functions on Azure.	Bindings (HTTP/Queue/Blob/Cosmos), Durable Functions, consumption/premium plans.	Workflows, integrations, reactive tasks.	✅	✅	⚠️
Google Cloud Functions	Functions as a service on GCP.	Gen 2 on Cloud Run; triggers via Pub/Sub/Storage/HTTP, autoscale.	Event glue, small transforms, webhooks.	✅	✅	⚠️
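To make the event-trigger model concrete, here is a minimal sketch of an AWS Lambda handler in Python for S3 object-created notifications. The `handler(event, context)` entry point and the `Records[].s3.bucket.name` / `Records[].s3.object.key` fields follow the S3 notification format; the per-object processing is a placeholder and the sample event is synthetic. Azure Functions and Cloud Functions use their own trigger bindings and payload shapes.

```python
import json
from urllib.parse import unquote_plus


def handler(event, context):
    """Lambda entry point for S3 ObjectCreated notifications (one event may hold several records)."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        # Placeholder: a real function would fetch and transform the object here.
        processed.append({"bucket": bucket, "key": key, "size": size})
    print(json.dumps({"processed": len(processed)}))  # ends up in CloudWatch Logs
    return processed


if __name__ == "__main__":
    sample = {"Records": [{"s3": {"bucket": {"name": "landing-bucket"},
                                  "object": {"key": "raw/orders+2024-06-01.csv", "size": 1024}}}]}
    print(handler(sample, None))
```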

PaaS App Platforms

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Azure App Service	Deploy web apps and APIs without servers.	Managed runtime (Windows/Linux), slots, autoscale, VNet integration.	Corporate web apps, APIs, portals.	✅	✅	⚠️ (external storage/CDN)
Google App Engine	Google’s original PaaS for apps.	Standard/Flexible environments, autoscale, built-in logging, services/versions.	Multi-service web apps, rapid prototypes.	✅	✅	⚠️
AWS Elastic Beanstalk	Upload app → platform handles the rest.	Managed provisioning of EC2/ALB/ASG, health checks, rolling updates.	12-factor apps, quick lifts to AWS PaaS.	✅	✅	⚠️

Batch & HPC

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS Batch	Queue up container jobs at scale, pay per compute.	Managed job queues/compute envs, Spot integration, array jobs, GPU support.	Rendering, science/engineering, nightly crunches.	✅	✅	✅
Azure Batch	Large-scale scheduled compute on Azure.	Pool/Job/Task model, auto-scale pools, low-priority VMs, container support.	Simulation, ETL batches, media processing.	✅	✅	✅
Google Cloud Batch	Fully managed batch job service.	Autoscaled fleets, preemptible VMs, GPU/TPU options, regional queues.	Video/ML preprocessing, parameter sweeps.	✅	✅	✅
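Array jobs fan one job definition out over many child jobs, and each child has to work out which slice of the input is its own. AWS Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable in each child for this; Azure Batch and Google Cloud Batch expose task indexes differently. A minimal sketch of round-robin sharding; `my_shard` and the input paths are hypothetical.

```python
import os


def my_shard(items: list, array_size: int) -> list:
    """Return the items this array child should process (round-robin by index).

    AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX to 0..array_size-1 in each child job;
    defaulting to 0 lets the same script run locally as a single worker.
    """
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    if not 0 <= index < array_size:
        raise ValueError(f"array index {index} outside 0..{array_size - 1}")
    return items[index::array_size]


if __name__ == "__main__":
    inputs = [f"s3://example-bucket/frames/{n:04d}.exr" for n in range(10)]  # hypothetical paths
    for path in my_shard(inputs, array_size=4):
        print("render", path)
```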

Big Data & Stream Processing

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon EMR	Managed Spark/Hadoop for big data.	EMR on EC2/EKS, autoscaling, Spot, HDFS/S3 storage, many runtimes (Spark/Hive/Presto).	ETL at scale, lake processing, ML feature builds.	✅	✅	✅
Google Dataproc	Spark/Hadoop on GCP with fast startup.	Ephemeral clusters, autoscaling, component gateway; GCS/BQ integrations.	Budget-friendly Spark jobs, modernised ETL.	✅	✅	✅
Azure Synapse Spark	Apache Spark inside Synapse.	Serverless/pooled Spark, notebooks, Delta/Parquet, integrated pipelines.	Lakehouse transforms, notebooks, data exploration.	✅	✅	✅
Databricks (multi-cloud)	Lakehouse compute for data + AI.	Managed Spark/Photon, Delta Lake, DLT, MLflow; jobs & clusters.	ELT/ML/AI platforms, collaborative notebooks.	✅	✅	✅
Stream processing (Managed Service for Apache Flink / Dataflow / Stream Analytics)	Real-time compute over event streams.	Apache Flink (AWS, formerly Kinesis Data Analytics), Apache Beam (Dataflow), SQL windows (ASA); stateful operators.	Real-time ETL, anomaly detection, dashboards.	✅	✅	⚠️ (typically references objects)

ML / AI Managed Compute

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS SageMaker	Managed ML training and hosting.	Studio, training jobs, endpoints, pipelines; GPU/CPU; auto scaling; data connectors.	Model training, batch/real-time inference.	✅	✅	✅ (images/audio/text via libs)
Azure Machine Learning	Azure’s end-to-end ML platform.	Designer/SDK, pipelines, managed endpoints, AutoML, AKS/ACI integration.	Enterprise ML ops, governed deployment.	✅	✅	✅
Google Vertex AI	Unified ML/AI workbench on GCP.	Workbench, AutoML/Custom, pipelines, endpoints, TPU/GPU support.	Vision/NLP/tabular ML, scalable inference.	✅	✅	✅

Note: “Support” here means the compute is a good fit to process that data type. Large binaries usually live in object storage (S3/Blob/GCS) and are processed from there.

Main Cloud Storage Options — Quick Reference

Core storage services across clouds. Each row includes a plain-English summary, a technical note, a common use case, and whether it suits Str (Structured), Sst (Semi-structured), and Uns (Unstructured) data.

✅ = strong/native fit; ⚠️ = possible/with caveats; ❌ = not a fit

Object Storage (Data Lakes)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon S3	Durable, low-cost storage for any files.	Object store; designed for 11 nines (99.999999999%) durability; lifecycle to Standard-IA/Glacier; events; S3 Select.	Data lakes, backups, analytics staging, media.	✅ (CSV/Parquet files)	✅ (JSON/Avro/Parquet)	✅ (images/video/PDFs)
Azure Blob Storage (incl. ADLS Gen2)	Azure’s universal bucket for files.	Hot/Cool/Archive tiers; ADLS Gen2 adds hierarchical namespace & ACLs.	Lakes on ADLS, archival, ML/analytics landing zones.	✅	✅	✅
Google Cloud Storage (GCS)	Google’s object storage for any data.	Multi/dual-region; lifecycle; notifications; Autoclass tiering.	Data lakes, media libraries, BQ external tables.	✅	✅	✅
Microsoft OneLake (Fabric)	Unified data lake for Fabric workspaces.	ADLS Gen2 under the hood; shortcuts; item-level governance.	Enterprise lake tightly integrated with Fabric/Power BI.	✅	✅	✅
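Lifecycle tiering (S3 Standard to Standard-IA to the Glacier classes) is configured as rules on the bucket. The sketch below builds one such rule as the JSON document accepted by the S3 `PutBucketLifecycleConfiguration` API (boto3: `put_bucket_lifecycle_configuration`). The prefix, day thresholds and 7-year expiry are hypothetical choices, not recommendations; check them against minimum-storage-duration charges before use.

```python
import json


def tiering_rule(prefix: str, expire_after_days: int = 2555) -> dict:
    """One S3 lifecycle rule: move objects under `prefix` to colder classes as they age."""
    return {
        "ID": f"tier-{prefix.strip('/')}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
        "Expiration": {"Days": expire_after_days},  # ~7 years by default (hypothetical)
    }


lifecycle = {"Rules": [tiering_rule("raw/")]}

if __name__ == "__main__":
    # With boto3 this dict would be passed as:
    # s3.put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=lifecycle)
    print(json.dumps(lifecycle, indent=2))
```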

File Storage (NFS / SMB)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon EFS	Shared Linux file system.	NFSv4.1, elastic scale, multi-AZ, burst/perf modes.	Lifted apps needing POSIX; containers; user home dirs.	✅ (files)	✅	✅
Amazon FSx (for Windows File Server / Lustre / NetApp ONTAP)	Managed Windows or high-speed file systems.	SMB (Windows), POSIX high-throughput (Lustre), ONTAP features (snap/clone).	Windows shares, HPC scratch, enterprise NAS offload.	✅	✅	✅
Azure Files + Azure NetApp Files	Managed SMB/NFS shares.	AD integration, performance tiers; NetApp = ultra-low latency.	Shared drives, VDI profiles, SAP app shares.	✅	✅	✅
Google Filestore	Managed NFS for GCP.	POSIX-compliant NFS with zonal/regional tiers.	GKE shared volumes, content repos, media workflows.	✅	✅	✅

Block Storage (Attach to VMs/Containers)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS EBS	Disks for EC2.	SSD/HDD tiers; provisioned IOPS; snapshots; encryption; Multi-Attach (io2).	Databases, low-latency apps, boot volumes.	✅ (DB files)	✅	✅
Azure Managed Disks	Disks for Azure VMs.	Premium/Ultra SSD; ZRS options; snapshots; shared disks.	SAP/app servers; high-IOPS databases.	✅	✅	✅
Google Persistent Disk + Local SSD	Disks for Compute Engine/GKE.	Balanced/SSD/HDD; regional PD; snapshots; Local SSD for ultra-low latency.	Relational DBs, stateful services, caches.	✅	✅	✅

Lakehouse Table Formats (on Object Storage)

Service/Format	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Delta Lake (Databricks / open source)	Tables on your lake with ACID.	Transaction log + Parquet; time travel; upserts/merges; schema evolution.	Reliable lakehouse ELT; CDC merges; analytics.	✅	✅ (JSON in cols)	⚠️ (store files alongside)
Apache Iceberg	Open table format for big lakes.	Snapshot isolation; hidden partitioning; multi-engine (Spark/Trino/…).	Engine-agnostic lakehouse tables & governance.	✅	✅	⚠️
Apache Hudi	Incremental tables on a lake.	Copy-on-Write/Merge-on-Read tables; record-level updates; indexes; Spark/Presto/Trino.	Near-real-time upserts; incremental pipelines.	✅	✅	⚠️

Archival & Cold Storage

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
Amazon S3 Glacier (Instant Retrieval / Flexible Retrieval / Deep Archive)	Very cheap storage for rarely accessed data.	Archive storage classes with retrieval from milliseconds (Instant Retrieval) to minutes–hours (Flexible Retrieval, Deep Archive); legacy Glacier vaults; lifecycle policies.	Long-term retention, compliance copies, DR.	✅ (files)	✅	✅
Azure Archive Storage	Cold tier for blobs.	Low-cost tier with rehydration; immutable options (legal hold/WORM).	Backups, legal/regulated archives, rarely accessed data.	✅	✅	✅
Google Coldline / Archive	Low-cost long-term storage.	GCS classes with retrieval SLA/cost trade-offs; lifecycle rules.	Compliance, backups, cold datasets.	✅	✅	✅

Hybrid & Edge Storage (Gateways)

Service	Non-technical description	Technical description	Typical use case	Str	Sst	Uns
AWS Storage Gateway	On-prem file/tape/block that writes to S3.	File/Volume/Tape gateways; caching; secure sync to S3/Glacier.	Hybrid backup, archive to cloud, DR staging.	✅	✅	✅
Azure Data Box / StorSimple (StorSimple now retired)	Edge appliances for bulk/hybrid storage.	Offline/online transfer; tiering to Blob; import/export devices.	Datacentre migrations; large dataset seeding.	✅	✅	✅
Google Transfer Appliance / Storage Transfer	Move very large datasets into GCP.	Rugged appliances; scheduled transfers/sync from S3/HTTP/POSIX.	One-off or recurring lake ingestion at scale.	✅	✅	✅

Notes: Object and file stores hold any data as files (great for Sst/Uns). Block storage shines for low-latency databases and app disks. Lakehouse table formats (Delta/Iceberg/Hudi) sit on object storage to bring ACID/table semantics to files.
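To show how a transaction log turns a folder of files into a table, here is a toy sketch in the spirit of Delta Lake's log: each commit is a numbered JSON file listing data files added and removed, readers replay commits to get a snapshot, and replaying only up to an older version gives time travel. It is not the actual Delta, Iceberg or Hudi format; `ToyTableLog` is hypothetical, and exclusive file creation (`open(..., "x")`) stands in for the atomic commit primitive real formats rely on (put-if-absent or a catalog compare-and-swap).

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Optional


class ToyTableLog:
    """Toy transaction log: numbered JSON commits that add/remove data files."""

    def __init__(self, root: Path):
        self.log_dir = root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def versions(self) -> list:
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def commit(self, read_version: int, add: Iterable[str] = (),
               remove: Iterable[str] = ()) -> int:
        """Publish version read_version + 1; raises FileExistsError if another
        writer already committed that version (optimistic concurrency)."""
        version = read_version + 1
        path = self.log_dir / f"{version:020d}.json"
        with open(path, "x") as f:  # "x" = create only if the file does not exist
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def snapshot(self, as_of: Optional[int] = None) -> set:
        """Replay commits (up to `as_of` for time travel) to get the live data files."""
        live = set()
        for v in self.versions():
            if as_of is not None and v > as_of:
                break
            entry = json.loads((self.log_dir / f"{v:020d}.json").read_text())
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return live


if __name__ == "__main__":
    log = ToyTableLog(Path(tempfile.mkdtemp()))
    v0 = log.commit(-1, add=["part-0.parquet"])
    v1 = log.commit(v0, add=["part-1.parquet"])
    # An "update" rewrites a file: remove the old one and add its replacement in one commit.
    v2 = log.commit(v1, add=["part-1b.parquet"], remove=["part-1.parquet"])
    print(sorted(log.snapshot()), sorted(log.snapshot(as_of=v1)))
```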

Databricks Components Overview

Component	Non-Technical Description 📘	Technical Description ⚙️	Use Case/Scenario 🎯
Workspace	Your collaborative project space	Hosts notebooks, jobs, repos, and ML experiments	Organising analytics or data science projects
Clusters	The computing engine that runs your work, scaled to fit the job	Spark-based distributed compute environment (autoscaling or manual)	Run jobs, notebooks, ML models
Jobs	Automated tasks or scheduled workflows	DAG-based execution of notebooks, scripts, or JARs	Nightly ETL jobs, ML training pipelines
Notebooks	Interactive workspace for code and output	Supports Python, SQL, Scala, R, Markdown	Exploratory data analysis, prototyping
SQL Editor	GUI to query data tables	Databricks SQL query editor running on SQL warehouses	Business users querying curated tables
Delta Lake	Like a spreadsheet that remembers everything	ACID-compliant storage layer over Parquet	Reliable data lake tables with versioning
Unity Catalog	Your data’s filing cabinet and bouncer	Central metadata & access control layer with RBAC	Secure multi-tenant access across clouds
Lakehouse Platform	The Databricks "big idea" — warehouse + lake	Combines data lake scalability with DB-like performance	Unified platform for batch, stream, ML, and BI
MLflow	Your model's history, packaging, and delivery	Open-source lifecycle management for ML models	Experiment tracking, model registry, deployment
Repos	Built-in Git versioning	Git-backed source control for notebooks & jobs	Code collaboration and CI/CD
Data Explorer	Browse your tables like folders	Visual UI to inspect catalog, schemas, tables	Data discovery and governance check
Dashboards	Shareable reports and visuals	BI dashboard powered by SQL or notebooks	Stakeholder insights and KPIs
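The Jobs row describes DAG-based execution: a task runs once everything upstream has finished, and independent tasks can run side by side. A minimal sketch of that scheduling rule using Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the task names are hypothetical, and this is not the Databricks Jobs API, which declares dependencies in the job definition instead.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
JOB = {
    "ingest_raw": set(),
    "clean_orders": {"ingest_raw"},
    "clean_customers": {"ingest_raw"},
    "build_gold_sales": {"clean_orders", "clean_customers"},
    "refresh_dashboard": {"build_gold_sales"},
}


def run_job(dag: dict, run_task) -> list:
    """Run tasks in dependency order and return the batches that became ready together.

    Tasks within one batch do not depend on each other, so an orchestrator could run
    them in parallel; this sketch runs them one after another.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        for task in ready:
            run_task(task)
            sorter.done(task)
    return batches


if __name__ == "__main__":
    for batch in run_job(JOB, run_task=lambda name: print("running", name)):
        print("batch:", batch)
```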

Snowflake Components – Cheat Sheet

Snowflake Components – Field Guide

Non-technical and technical descriptions, real-world scenarios, and common gotchas you actually meet on projects.

Category	Component	Non-technical description	Technical description	Scenario	Common gotchas
Compute	Virtual Warehouses	On/off “engines” that run your queries and loads. Pick a size, pay while it’s on, pause when idle.	MPP compute clusters with independent caches; scale up (bigger) or out (multi-cluster). Auto-suspend/resume; credits billed per second, with a 60-second minimum each time a warehouse starts or resumes.	Month-end: scale out to clear queues; auto-suspend overnight.	Leaving warehouses running; oversizing by habit; long auto-suspend (wasted minutes); each warehouse has its own warm cache.
Pipelines	Snowpipe	Auto-loads new files as they land. Think “continual drip-feed” rather than big batches.	Event/REST-triggered micro-batch COPY from a stage; near-real-time ingestion; charges per file/processing.	Landing CSVs from cloud storage every few minutes into a raw table.	Millions of tiny files; incorrect file format options; duplicate handling; object store permissions; expecting true streaming latency.
Pipelines	Snowpipe Streaming (low-latency)	Push rows directly into tables without creating files first.	Record-based ingest via SDK/connectors; bypasses staged files; designed for sub-minute latency.	Clickstream or IoT events needing fast availability for dashboards.	Event ordering/duplicates; commit semantics; schema evolution; monitoring cost of continuous ingest.
Change data	Streams	A change tracker: “what changed since last time?”	CDC over base tables/views (insert/update/delete) with consumption offsets; pairs well with Tasks/Dynamic Tables.	Incremental processing from raw → curated without re-scanning full tables.	Offset advances only when the stream is consumed by committed DML (a plain SELECT does not); streams go stale if not consumed within the source table's retention window; DDL changes breaking downstream expectations.
Orchestration	Tasks	Built-in scheduler for jobs. Like cron, but in Snowflake.	Time- or dependency-triggered DAGs; run SQL or stored procedures on a chosen warehouse or serverless compute; track history.	Nightly dimension refresh after raw ingest completes.	Tasks left suspended; wrong warehouse size; timezone surprises; missing privileges.
Transformation	Dynamic Tables	“Always-fresh” derived tables maintained for you.	Declarative objects with a defining query + freshness target; incremental maintenance and dependency tracking.	Keep a curated customer table in sync from multiple sources without hand-built DAGs.	Assuming they’re free—serverless work still bills; deep chains can hide cost/latency; not a cure-all for messy logic.
Storage	Databases / Schemas / Tables / Views / MVs	Your folders and sheets: organise data, expose it as tables or views; MVs pre-compute results.	Tables (permanent/transient/temp), optional clustering; Views (secure/regular); Materialized Views with refresh costs/limits.	Speed up a gnarly join with a targeted MV; secure a view for consumers.	Using transient in prod (recovery limits); MVs on non-deterministic queries; over-clustering tiny tables.
Lakehouse	External & Iceberg Tables	Query data where it lives in your cloud data lake, no copy required.	External tables over files in S3/GCS/Azure; Iceberg tables integrate open-table formats/catalogs; partition pruning depends on layout/metadata.	Blend Parquet in the lake with internal Snowflake tables for analytics.	Stale metadata; path/partition mismatches; small-file performance; storage IAM misconfig.
Ingest I/O	Stages (internal/external)	Landing zones for files you load from or unload to.	Named internal or external stages with credentials/encryption; directory tables for discovery.	Partners drop files to an external stage you manage.	Credential leakage in URLs; wrong region; case-sensitive paths; forgetting stage privileges.
Ingest I/O	File Formats	Reusable “how to read this file” settings.	Parsing options for CSV/JSON/Avro/Parquet/ORC/XML; referenced by COPY/Snowpipe/External Tables.	Standardise CSV quirks (nulls, quotes) across ingest jobs.	NULL vs empty strings; date/locale mismatches; hidden BOM/encoding issues.
Governance	Masking Policies	Hide sensitive values dynamically based on who’s asking.	Policy-based column masking evaluated at query time; role/context aware; auditable.	Show last-4 of cards to support, full value to finance.	Applying to complex types; downstream tools assuming unmasked types; forgetting an unmasked branch in the policy for admin or break-glass roles.
Governance	Row Access Policies	Only the rows you’re allowed to see, nothing more.	Predicate functions enforce row-level security on tables/views.	Country managers see only their region’s data by role.	Over-complex predicates; surprises when combined with filters; diagnosing “missing rows”.
Cost control	Resource Monitors	Spend tripwires to stop runaway compute.	Credit thresholds with actions (notify/suspend) set at account level or assigned to specific warehouses.	Cap a dev warehouse to prevent accidental 24/7 spend.	Only covers warehouses; serverless usage needs separate observation/alerts; period/timezone misunderstandings.
Metadata	Information Schema & Account Usage	System tables you can query for lineage-ish insight, usage, and health.	Database-scoped Information Schema plus account-wide ACCOUNT_USAGE views, which lag by up to a few hours; ideal for monitoring/reporting.	Build a usage dashboard showing query cost by team.	Data freshness lag; privilege gaps; mixing object names from different namespaces.
Dev & Apps	UDF / UDTF / UDAF	Custom functions when SQL alone won’t cut it.	Extend with SQL/JS/Java/Python; UDTF returns tables; sandboxed execution.	Custom normaliser for messy product codes as a UDF.	Performance of row-by-row logic; library limits; cold-start penalties for some runtimes.
Dev & Apps	Stored Procedures	Procedural scripts with variables and control flow.	Run in JS/SQL/Java/Python; can call SQL, manage transactions, and orchestrate tasks.	Automate schema rollout and grant routines for new projects.	Long-running work timing out; debugging ergonomics; executing on the wrong warehouse.
Dev & Apps	Snowpark APIs	Write code (DataFrames) that runs close to the data.	Python/Scala/Java APIs with pushdown; UDF/UDTF authoring; package management via curated channels.	Data-prep pipelines in Python without leaving Snowflake.	Accidental client-side collects; serialization limits; dependency/version pinning.
Dev & Apps	Snowpark Container Services	Run your containers next to your data—ML services, custom apps.	Managed container runtime integrated with Snowflake auth/networking; supports services and batch jobs.	Serve an in-house ML model via a low-latency API within your Snowflake account.	Oversized images; egress/network rules; security approvals; cost of always-on services.
Collaboration	Secure Data Sharing	Share live data without copying or FTP drama.	Provider/Consumer accounts with shared objects; no data duplication; governed access.	Give a supplier read-only access to sales without exporting files.	Schema changes breaking consumers; accidentally sharing sensitive columns; region/cloud compatibility.
Collaboration	Listings & Marketplace	An “app store” for data and apps—public or private listings.	Package data/apps with terms and versioning; distribute across orgs/regions/clouds.	Monetise an industry dataset to partners via private listings.	Legal/contracting lag; unclear update cadence; consumer entitlements drifting from expectations.
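Stream offsets are easier to reason about in a model. The toy class below is not Snowflake's implementation; it only mimics the documented behaviour that reading a stream (a plain SELECT) leaves the offset where it is, while consuming it in a DML statement that commits moves the offset past the rows it read. `ToyStream`, `changes` and `consume` are hypothetical names.

```python
class ToyStream:
    """Toy change stream over an append-only change log (illustration only)."""

    def __init__(self, change_log: list):
        self.change_log = change_log  # shared, append-only list of change records
        self.offset = 0               # index of the first change not yet consumed

    def changes(self) -> list:
        """Like SELECT * FROM my_stream: returns pending changes, offset unchanged."""
        return self.change_log[self.offset:]

    def consume(self, apply) -> int:
        """Like INSERT INTO target SELECT ... FROM my_stream inside a transaction.

        The offset only moves if `apply` succeeds; if it raises, the offset stays put,
        as if the transaction rolled back. Changes appended while `apply` runs stay
        pending for the next consumer run.
        """
        pending = self.changes()
        apply(pending)
        self.offset += len(pending)
        return len(pending)


if __name__ == "__main__":
    log = [{"op": "INSERT", "id": 1}]
    stream = ToyStream(log)
    stream.changes()                   # peek: offset still 0
    stream.consume(lambda rows: None)  # committed consumption: offset moves to 1
    print(stream.offset, stream.changes())
```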

Tip: pair this with a one-page “operating rules” note—suspend defaults, file-size targets, freshness SLAs, and a short naming convention.
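Suspend defaults are easier to agree on with numbers attached. The sketch below encodes the per-second billing rule with its 60-second minimum per start or resume, and assumes the standard warehouse rate of 1 credit/hour at X-Small, doubling per size step. Treat the rate table as an assumption to check against current Snowflake pricing (other warehouse types differ), and the idle-tail model as a simplification that ignores overlapping query bursts. `run_credits` and `idle_tail_credits` are hypothetical helpers.

```python
# Credits per hour for standard virtual warehouses (doubles with each size step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128}


def run_credits(size: str, run_seconds: float, clusters: int = 1) -> float:
    """Credits for one continuous run: billed per second, minimum 60 s per start/resume."""
    billed_seconds = max(run_seconds, 60)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


def idle_tail_credits(size: str, auto_suspend_seconds: int, bursts_per_day: int) -> float:
    """Credits per day spent idling before auto-suspend, after each burst of queries."""
    return CREDITS_PER_HOUR[size] * auto_suspend_seconds * bursts_per_day / 3600


if __name__ == "__main__":
    print(run_credits("M", 90))   # 90 s on a Medium
    print(run_credits("M", 20))   # billed as 60 s
    for suspend in (600, 60):     # 10-minute vs 1-minute auto-suspend, 24 bursts a day
        print(suspend, idle_tail_credits("M", suspend, 24))
```

On these assumptions, a Medium warehouse woken 24 times a day burns 16 credits/day idling with a 10-minute auto-suspend, versus 1.6 with a 1-minute one.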

All Dropdowns Closed on Load

Dropdowns (Details) — Always Closed on Page Load

This page ensures <details> elements (dropdowns) are closed whenever the page is opened or refreshed, and also when restored from the back/forward cache.

Example Section A (starts open in markup)

Even though this element has the open attribute in the HTML, the script will close it on load.

Example Section B

Regular closed state; the script also keeps it closed on initial load.

Example Section C

Another one that starts open in the markup and will be closed by the script.

Top 20 Cloud Databases — Comparison

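Purpose, plain-English & technical descriptions, use cases, pros/cons, and a scoring heat‑map (1–5: higher is always better, so for Latency 5 = lowest latency and for Ops Effort 5 = least operational effort).

Scored columns: Scale, Latency, Cost Predictability, Ecosystem, Ops Effort. Heat cells fill left→right (1 = 20%, 5 = 100%).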
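The fill rule is a straight linear scale of 20% per point. A small sketch that renders one row's scores as text heat bars; the block characters and 10-character width are arbitrary choices, `heat_fill` / `heat_bar` / `render_scores` are hypothetical names, and the sample scores are Amazon DynamoDB's from the table.

```python
SCORE_COLUMNS = ["Scale", "Latency", "Cost Predictability", "Ecosystem", "Ops Effort"]


def heat_fill(score: int) -> int:
    """Percentage of the cell to fill: 1 -> 20%, 5 -> 100% (20% per point)."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be 1-5, got {score}")
    return score * 20


def heat_bar(score: int, width: int = 10) -> str:
    """Text bar filled left to right in proportion to heat_fill(score)."""
    filled = heat_fill(score) * width // 100
    return "█" * filled + "░" * (width - filled)


def render_scores(scores: list) -> str:
    return "\n".join(f"{col:<20}{heat_bar(s)} {s}" for col, s in zip(SCORE_COLUMNS, scores))


if __name__ == "__main__":
    print(render_scores([5, 4, 3, 4, 5]))  # Amazon DynamoDB row
```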

Database	Purpose	Non‑Technical Description	Technical Description	Primary Use Cases	Pros	Cons	Scale	Latency	Cost Predictability	Ecosystem	Ops Effort
Amazon DynamoDB	Serverless key‑value/document store	Massively scalable app data store for simple key lookups and flexible documents.	Fully managed NoSQL; consistent hashing partitions; adaptive capacity; streams; global tables.	High‑traffic web/mobile backends, IoT, gaming sessions, shopping carts.	Serverless scale; global tables; strong SLAs.	Query patterns limited; hot partition pitfalls; complex cost tuning.	5	4	3	4	5
Google Cloud Spanner	Globally distributed relational DB	SQL database that scales across regions while keeping transactions consistent.	Distributed SQL with MVCC and TrueTime for external consistency; ANSI SQL; strong schemas.	Financial systems, inventory, multi‑region SaaS needing strong consistency.	Horizontal scale + SQL + transactions; multi‑region.	Higher cost; needs careful schema; tuning differs from single‑node RDBMS.	5	4	3	4	4
Azure Cosmos DB	Multi‑model NoSQL (key‑value/doc/graph/column)	Low‑latency database with global distribution and flexible data models.	APIs for Core (SQL), MongoDB, Cassandra, Gremlin, Table; RU‑based throughput; multi‑region.	Global apps, personalization, IoT, event stores.	Global distribution; multi‑API; low latency at p99.	RU sizing can be tricky; cross‑partition queries can be costly.	5	4	3	4	4
Amazon Aurora (MySQL/Postgres)	High‑performance relational (managed)	MySQL/Postgres‑compatible engine with better performance and failover.	Decoupled compute/storage; 6‑way replication; read replicas; serverless v2 options.	OLTP apps, SaaS backends, migrations from on‑prem RDBMS.	Drop‑in compatibility; strong HA; autoscaling options.	Region‑bound scaling; heavy writes may need sharding.	4	4	4	5	4
Amazon Redshift	Cloud data warehouse	Columnar SQL warehouse for analytics at scale.	MPP, columnar storage, RA3 managed storage, Spectrum external tables; materialized views.	Enterprise BI, ELT analytics, semi‑structured via SUPER.	Mature ecosystem; performance features; concurrency scaling.	Cluster sizing decisions; Spectrum governance; workload management (WLM) tuning.	4	3	4	5	4
Google BigQuery	Serverless data warehouse	Analytics engine where you just run SQL and pay for the data each query scans (or reserve capacity); no clusters to manage.	Dremel‑based columnar engine; separation of storage/compute; BI Engine caches.	Ad‑hoc analytics, ELT at scale, ML‑in‑warehouse, log analytics.	Near‑zero ops; great price/perf for bursty workloads.	Cost predictability lower with ad‑hoc users; quotas/limits.	5	2	3	5	5
Snowflake	Cloud data platform/warehouse	Elastic SQL warehouse with easy scaling and cross‑cloud support.	Decoupled compute/storage; virtual warehouses; time travel; data sharing; native apps.	Enterprise analytics, data sharing, multi‑tenant analytics products.	Strong ecosystem; easy scaling; governance features.	Credit creep if unmanaged; proprietary features increase lock‑in.	5	3	3	5	5
Databricks SQL (Delta Lakehouse)	Lakehouse SQL engine	SQL over Delta Lake on object storage: warehouse performance with lake flexibility.	Photon engine; Delta tables with ACID; Unity Catalog; serverless SQL warehouse.	BI on data lake, ELT at scale, medallion architectures.	Open formats; strong for streaming and adjacent ML workloads.	Tuning required for small, chatty queries; costs rise with concurrency (warehouse size and count).	5	3	3	5	4
Azure SQL Database	Managed relational (SQL Server)	SQL Server as a service: familiar T‑SQL with built‑in HA and backups.	Single database/elastic pools; Hyperscale; automatic tuning; Microsoft Entra ID (formerly Azure AD) integration.	Line‑of‑business apps, SaaS multi‑tenant, reporting stores.	Rich SQL features; easy Azure integration; predictable tiers.	Vertical scaling limits vs distributed SQL; DTU vs vCore purchasing models can confuse newcomers.	4	4	4	5	5
Google AlloyDB for PostgreSQL	High‑performance Postgres	Postgres‑compatible with faster analytics and OLTP, fully managed.	Disaggregated storage; columnar engine for analytics; automatic failover.	OLTP plus HTAP‑ish patterns, modern app backends.	Great Postgres perf; minimal ops; analytics boosts.	GCP‑only; migration from other engines needed.	4	4	4	4	5
MongoDB Atlas	Managed document database	Flexible JSON document store with global clusters and rich developer tooling.	Replica sets, sharding, multi‑cloud; Atlas Search; triggers; Realm/App Services.	Content, catalogs, user profiles, event data.	Developer‑friendly; flexible schema; strong tools.	Multi‑document transactions supported but costlier than in an RDBMS; $lookup joins less capable than SQL joins; costs scale with usage.	4	4	3	5	5
DataStax Astra DB (Cassandra)	Managed wide‑column (Cassandra)	Cassandra as a service for write‑heavy, always‑on workloads.	Masterless ring; tunable consistency; Stargate APIs (CQL/REST/GraphQL).	IoT telemetry, messaging, time‑series with high ingest.	Linearly scalable writes; global availability; APIs.	Query flexibility limited; model by access pattern.	5	4	3	4	5
Google Cloud Bigtable	Managed wide‑column (HBase‑like)	Single‑digit ms key‑value at petabyte scale—great for time‑series and personalization.	Sparse, distributed row store; SSD/HDD nodes; GC policies.	Time‑series, ad tech, personalization features, IoT.	Huge scale; predictable low latency if modeled right.	Single index (row key) mindset; no joins/aggregations.	5	4	3	4	4
Redis Enterprise Cloud	In‑memory data store	Super‑fast cache/DB with sub‑millisecond reads/writes, plus JSON and search options.	Redis with clustering, persistence, modules (JSON, Search, Bloom, TimeSeries).	Caching, session stores, leaderboards, real‑time features.	Ultra‑low latency; versatile modules; enterprise HA.	Memory cost; persistence/consistency trade‑offs.	4	5	3	5	5
Elastic Cloud (Elasticsearch)	Search & analytics engine	Full‑text search and log analytics with dashboards (Kibana).	Inverted indexes, distributed shards/replicas; aggregations; ILM; vector search.	Search‑heavy apps, observability, security analytics.	Great search features; broad ecosystem; Kibana visualisation.	Query costs for wide scans; ops complexity for hot‑warm tiers.	4	3	3	5	4
InfluxDB Cloud	Time‑series database	Purpose‑built for metrics and events over time with a fluent query language.	TSM/TSI storage, downsampling/retention; Flux/SQL interfaces; serverless options.	IoT metrics, monitoring, SRE/DevOps telemetry.	Time‑series ergonomics; tasks for downsampling.	Flux is niche (SQL emerging); cardinality pitfalls.	4	4	3	4	5
Timescale Cloud (TimescaleDB)	Time‑series on PostgreSQL	Postgres with time‑series extensions—nice when you want SQL + time‑series.	Hypertables, compression, continuous aggregates; full SQL & Postgres ecosystem.	Industrial IoT, finance ticks, app metrics needing joins/SQL.	Full SQL power; easy analytics joins; good compression.	Not for ultra‑high ingest vs Bigtable/Cassandra tiers.	4	4	4	5	5
CockroachDB (Managed/Dedicated)	Distributed SQL (Postgres wire)	Resilient SQL that scales horizontally with strong consistency.	Raft consensus, range‑based data distribution; Postgres wire compatibility.	Resilient SaaS backends, geo‑partitioned data, OLTP scale‑out.	Survivable regions; SQL + transactions; online scale.	Hot ranges if keys skew; some Postgres features differ.	5	4	4	4	4
PlanetScale	Serverless MySQL (Vitess)	MySQL that can shard/scale underneath without changing app code.	Vitess control plane; branching; online schema changes; connection pooling.	Prod MySQL for SaaS; branching for safe changes; scale‑out reads.	Zero‑downtime schema changes; developer workflow wins.	MySQL‑only; some features limited by Vitess layer.	4	4	4	4	5
Neo4j AuraDB	Managed graph database	Stores data as nodes and relationships—great for connected queries.	Property graph model; Cypher query language; native graph engines.	Fraud rings, recommendations, knowledge graphs, network analysis.	Expressive relationship queries; fast traversals.	Not ideal for big aggregations; different modeling mindset.	3	3	4	4	5
Amazon RDS (PostgreSQL/MySQL)	Managed relational (classic)	Familiar relational databases, with patching and backups handled for you.	Managed instances, Multi‑AZ, read replicas, storage autoscaling.	Traditional apps, quick lifts from on‑prem, dependable OLTP.	Mature, predictable; wide community knowledge.	Instance‑bound scaling; manual sharding for big growth.	4	4	4	5	4
Azure Database for PostgreSQL	Managed Postgres (Flexible Server)	PostgreSQL in Azure with managed HA and scaling.	Flexible Server, storage autogrow, zone‑redundant HA, broad Postgres extension support.	Modern app backends needing Postgres features and Azure integration.	Strong Microsoft Entra ID (Azure AD)/Key Vault integration; familiar tooling.	Vertical scaling limits vs distributed SQL; regional.	4	4	4	4	5
Google Cloud SQL	Managed MySQL/Postgres/SQL Server	Managed relational instances with backups and replicas handled for you.	HA configurations, read replicas, Private Service Connect.	Standard OLTP apps on GCP, quick migrations.	Straightforward; integrates with GCP IAM & VPC.	Instance‑bound; not for huge scale‑out by itself.	3	4	4	4	5
Azure Synapse Dedicated SQL Pool	MPP warehouse (Azure)	Azure’s classic MPP warehouse for large-scale BI with predictable capacity.	Distributed compute, PolyBase, materialized views; workload isolation.	Enterprise BI, predictable SLAs, integrated Azure stack.	Predictable capacity; strong Azure integration.	Cluster ops vs serverless engines; less elastic than lakehouse.	4	3	4	4	4
Amazon OpenSearch Service	Managed search/observability	Search and log analytics, largely compatible with Elasticsearch 7.10 APIs (the fork point).	Shard/replica management, UltraWarm/Cold storage, OpenSearch Dashboards.	Search features, observability stacks on AWS.	Good AWS integration; familiar APIs.	Ops tuning for tiers/ISM (Index State Management); query costs for wide scans.	4	3	3	4	4
Azure Cosmos DB for MongoDB	Mongo API on Cosmos	Mongo‑compatible API on Cosmos for global distribution with low latency.	RU/s throughput model; multi‑master write regions; Mongo wire protocol compatibility.	Global user data, catalogs, content stores needing Mongo semantics.	Global replication; automatic indexing; serverless option.	RU planning; cross‑partition costs; feature parity varies by version.	5	4	3	4	4
Azure Database for MySQL	Managed MySQL (Flexible Server)	MySQL managed in Azure with HA and scaling, minus the babysitting.	Flexible Server, zone‑redundant HA, server parameter tuning, Azure Monitor.	Web apps, CMS, e‑commerce stacks on Azure.	Familiar engine; Azure security integrations.	Vertical scaling limits; regional.	3	4	4	4	5
Firestore (Google Cloud)	Serverless document DB	Serverless JSON store with offline sync for mobile/web apps.	Hierarchical documents/collections, real‑time listeners, strong security rules.	Mobile/web backends, real‑time presence/chat, small‑team apps.	Near‑zero ops; great SDKs; realtime updates.	Query constraints; cost spikes with chatty patterns.	4	4	3	4	5
SingleStoreDB Cloud	Distributed SQL + vectors	Fast SQL store for mixed OLTP/OLAP with vector search features.	Shared‑nothing distributed engine, columnstore + rowstore, pipelines.	Real‑time analytics, operational reporting, AI features with vectors.	Strong mixed‑workload performance; HTAP‑style design.	Vendor‑specific features; sizing still matters.	4	4	3	4	4
TiDB Cloud	Distributed MySQL‑compatible	MySQL‑compatible database that scales horizontally with strong consistency.	TiKV (key‑value store) + TiDB SQL layer; Raft consensus; HTAP with TiFlash.	Scale‑out OLTP with MySQL compatibility; HTAP patterns.	Horizontal scale + SQL; HTAP via TiFlash.	Operational tuning for balance; ecosystem smaller than MySQL/Aurora.	5	4	3	3	4

Scoring (1–5, higher is better): Scale = max horizontal/elastic capacity; Latency = low‑latency suitability; Cost Predictability = ease of forecasting monthly costs; Ecosystem = connectors & tooling; Ops Effort = ease of running it (5 = least operational effort).
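The five scores can be folded into a single ranking once you decide which criteria matter most for a workload. The Python sketch below uses a plain weighted average, which is one simple option rather than a method this reference prescribes; the weights are hypothetical (a latency‑sensitive app run by a small ops team), and the scores are copied from five rows of the table.

```python
from typing import Dict, List, Tuple

# Column order follows the table: Scale, Latency, Cost Predictability, Ecosystem, Ops Effort.
CRITERIA = ("scale", "latency", "cost_predictability", "ecosystem", "ops_effort")

# Scores copied from five rows of the table (1-5, higher is better), in table order.
SCORES: Dict[str, Tuple[int, int, int, int, int]] = {
    "Amazon DynamoDB": (5, 4, 3, 4, 5),
    "Google Cloud Spanner": (5, 4, 3, 4, 4),
    "Google BigQuery": (5, 2, 3, 5, 5),
    "Azure SQL Database": (4, 4, 4, 5, 5),
    "Redis Enterprise Cloud": (4, 5, 3, 5, 5),
}

def rank(scores: Dict[str, Tuple[int, ...]], weights: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (service, weighted average score) pairs, best first."""
    total_weight = sum(weights[c] for c in CRITERIA)
    ranked = []
    for service, values in scores.items():
        weighted = sum(weights[c] * v for c, v in zip(CRITERIA, values))
        ranked.append((service, round(weighted / total_weight, 2)))
    # sorted() is stable (also with reverse=True), so ties keep table order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights: latency matters most, then scale and low ops effort.
weights = {"scale": 2, "latency": 4, "cost_predictability": 1, "ecosystem": 1, "ops_effort": 2}

for service, score in rank(SCORES, weights):
    print(f"{score:.2f}  {service}")
# Redis Enterprise Cloud comes first (4.60); Google BigQuery comes last (3.60).
```

With scores this coarse, ties are common (DynamoDB and Azure SQL Database both land on 4.30 here), so treat the output as a shortlist to test against real access patterns, not a final decision.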

Quick picks by use case:

Global OLTP with strong consistency: Cloud Spanner, CockroachDB
Write‑heavy time‑series/telemetry: DataStax Astra (Cassandra), Bigtable, InfluxDB
Elastic analytics (serverless): BigQuery, Snowflake, Databricks SQL
Relational with low ops in Azure: Azure SQL DB, Azure Postgres (GCP analogue: AlloyDB for PostgreSQL)
Document‑first apps: MongoDB Atlas, Cosmos DB
Search/Observability: Elastic Cloud, OpenSearch
In‑memory latency: Redis Enterprise Cloud
MySQL at scale without drama: PlanetScale, TiDB Cloud
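Several watch‑outs above (DynamoDB hot partitions, Cosmos DB cross‑partition costs, "model by access pattern" for Cassandra) come down to how evenly keys spread across partitions. The Python sketch below assumes a simple hash‑partitioned store; the 8 partitions, the SHA‑256 mapping, the device keys and the `#n` suffix scheme are all hypothetical and do not reproduce any service's internal partitioning. It shows one skewed key concentrating writes on a single partition, then write sharding (a pattern the DynamoDB developer guide describes) spreading that load by appending a suffix to the key.

```python
import hashlib
from collections import Counter
from typing import Iterable

# Hypothetical partition count; real services size and split partitions internally.
NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (SHA-256, purely illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def load(keys: Iterable[str]) -> Counter:
    """Count how many writes land on each partition."""
    return Counter(partition_for(k) for k in keys)

# 1,000 writes: 900 from one hot device, 100 spread across other (hypothetical) devices.
writes = ["device-42"] * 900 + [f"device-{i}" for i in range(100, 200)]

# Write sharding: append a suffix so one logical key maps to several partition keys.
SHARDS = 8  # hypothetical shard count
sharded = [f"{key}#{i % SHARDS}" for i, key in enumerate(writes)]

before, after = load(writes), load(sharded)
print("busiest partition share before sharding:", max(before.values()) / len(writes))
print("busiest partition share after sharding: ", max(after.values()) / len(writes))
```

The cost moves to the read side: fetching everything for `device-42` now means querying all eight suffixed keys and merging the results, so keep the shard count no larger than the write rate requires. Range‑partitioned stores such as Bigtable and CockroachDB show the same symptom when keys are sequential (for example, a timestamp‑prefixed row key), and are usually fixed through key design instead.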