Main Cloud Databases — Quick Reference

A concise map of leading cloud databases with plain-English descriptions, technical notes, common use cases, and data-type support.

✅ = strong/native support; ⚠️ = possible/limited or engine-specific; ❌ = not supported (use object storage/another service). Columns: Str = structured, Sst = semi-structured, Uns = unstructured.
Relational & Analytics (SQL)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS RDS MySQL / PostgreSQL / SQL Server / Oracle / MariaDB Managed versions of the classic SQL databases. PaaS relational engines with backups, patching, Multi-AZ HA. OLTP apps; software that expects a SQL DB. ⚠️ (JSON in Postgres/MySQL)
Amazon Aurora MySQL / PostgreSQL-compatible Cloud-optimised SQL with higher performance/HA. Distributed storage (6-way replication), read replicas, serverless autoscaling. High-throughput OLTP; microservices backends. ⚠️ (JSON columns)
Azure SQL Database Managed SQL Server in Azure. PaaS SQL with elastic pools, serverless compute, built-in HA. Line-of-business apps; reporting stores. ⚠️ (JSON)
Azure Database for PostgreSQL / MySQL Managed PostgreSQL/MySQL. HA “flexible server”; PostgreSQL extensions. Modern app stacks needing managed OSS SQL. ⚠️ (JSON/JSONB; JSON)
GCP Cloud SQL PostgreSQL / MySQL / SQL Server Google’s managed SQL trio. Managed instances, replicas, automated ops. Web/mobile backends; small–mid OLTP. ⚠️
Google Cloud Spanner “SQL that scales globally.” Horizontally scalable, strongly consistent distributed SQL; ANSI SQL, transactions, JSON type. Global OLTP (fintech, gaming, SaaS). ⚠️ (JSON)
Amazon Redshift AWS data warehouse. MPP columnar SQL engine; SUPER type for semi-structured. Enterprise BI; ELT at scale.
Azure Synapse Dedicated SQL Pool Azure data warehouse. MPP columnar SQL; PolyBase/Copy for ingestion. Enterprise BI on Azure. ⚠️ (OPENJSON/PolyBase)
Google BigQuery Serverless analytics warehouse. Columnar, ANSI SQL, massive parallelism; native JSON/ARRAY; external tables. Ad-hoc analytics, ELT, ML-on-SQL. ⚠️ (via external tables/GCS)
Snowflake (multi-cloud) Cloud data platform for analytics. Elastic compute/storage; VARIANT for semi-structured; stages/external access. Unified warehouse + data sharing. ✅ (JSON/Parquet/Avro/XML) ⚠️ (files via stages)
NoSQL (Key-value, Document, Wide-column)
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon DynamoDB Serverless key-value/document store that never blinks. Partitioned KV/JSON docs, single-digit-ms latency, auto scaling, streams. High-scale apps, session/carts, IoT. ⚠️ ❌ (binaries ≤ 400 KB)
Azure Cosmos DB Global multi-model NoSQL. APIs: Core (SQL/doc), Mongo, Cassandra, Gremlin, Table; multi-region, tunable consistency. Low-latency global apps, catalogs. ⚠️
Google Firestore Serverless document database. Hierarchical JSON docs; ACID at doc level; real-time listeners. Mobile/web apps, profiles, settings. ⚠️
Google Bigtable Massive time-series/wide-column store. Sparse, distributed HBase-compatible database. IoT/telemetry, ad tech, TS data. ⚠️
Amazon Keyspaces Cassandra Managed Cassandra. Serverless Cassandra-compatible wide-column store. Time-series, high-write workloads. ⚠️
MongoDB Atlas (multi-cloud) Managed MongoDB. Document model, flexible schema, single-document atomicity plus multi-document ACID transactions; GridFS for files. Content/user data, catalogs. ⚠️ ⚠️ (via GridFS)
Graph, Search & Log/Telemetry (Specialised)
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon Neptune Managed graph database. Property graph (Gremlin) & RDF (SPARQL), ACID. Knowledge graphs, recommendations. ⚠️
Cosmos DB (Gremlin API) Graph on Cosmos. Gremlin traversal on a distributed store. Social networks, network topology. ⚠️
Amazon OpenSearch Service Managed search & analytics. Lucene-based inverted index; JSON docs; full-text search + aggregations. Log analytics, app/site search. ⚠️ ✅ (indexes text; binaries in S3)
Azure Data Explorer (Kusto) Fast log/time-series analytics. Columnar engine with KQL; semi-structured ingestion. Telemetry, security analytics. ⚠️

Tip: store large binaries (images, PDFs, media) in cloud object storage (e.g., S3, Azure Blob, Google Cloud Storage) and reference them from your database.
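The tip above can be sketched as follows. This is a minimal illustration, assuming a local SQLite table stands in for the database and that the upload to the object store happens elsewhere; the bucket name `example-bucket`, the key layout, and the `register_object` helper are hypothetical.

```python
import hashlib
import sqlite3


def register_object(conn, name, payload, bucket="example-bucket"):
    """Record where a binary lives instead of storing its bytes in the table."""
    digest = hashlib.sha256(payload).hexdigest()
    # Hypothetical key layout: content hash, prefixed by its first two hex chars.
    uri = f"s3://{bucket}/{digest[:2]}/{digest}"
    # A real system would upload `payload` to the object store at this point.
    conn.execute(
        "INSERT INTO documents (name, object_uri, sha256, size_bytes) "
        "VALUES (?, ?, ?, ?)",
        (name, uri, digest, len(payload)),
    )
    return uri


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "name TEXT PRIMARY KEY, object_uri TEXT, sha256 TEXT, size_bytes INTEGER)"
)

payload = b"%PDF-1.7 example bytes"
uri = register_object(conn, "invoice-001.pdf", payload)

# The database row holds only the reference and some metadata.
row = conn.execute(
    "SELECT object_uri, size_bytes FROM documents WHERE name = ?",
    ("invoice-001.pdf",),
).fetchone()
```

Keeping a checksum and size next to the URI lets an application detect a missing or altered object without downloading it first.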

Main Cloud Data Pipelines — Quick Reference

A practical map of batch/ELT, streaming/CDC, and orchestration options. Each row includes a plain-English summary, a technical note, a common use case, and data-type support.

✅ = native/strong support; ⚠️ = possible/with caveats; ❌ = not a fit
Batch / ELT Pipelines
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS Glue Managed ETL to load and transform data. Serverless Spark jobs, crawlers, Data Catalog, notebooks; job bookmarks. Batch ingest + transforms to S3/Redshift/Lakehouse. ✅ (JSON/Parquet/Avro) ⚠️ (UDFs/pass-through)
Azure Data Factory incl. Synapse Pipelines Azure’s GUI pipelines for copy and transform. 100+ connectors, Mapping Data Flows (Spark), triggers, CI/CD. Lift-and-shift ETL/ELT to ADLS/Synapse/Snowflake. ⚠️ (copy/metadata)
Google Cloud Dataflow Google’s managed data processing service. Fully managed Apache Beam runners (batch & streaming), autoscaling workers. ELT/ETL to BigQuery/Cloud Storage. ✅ (custom DoFns)
Google Cloud Dataproc Managed Spark/Hadoop clusters. Ephemeral or long-running Spark, Hive, Hadoop; JARs and notebooks. Modernise legacy ETL; Spark jobs at scale. ⚠️
Databricks Delta Live Tables (DLT) Declarative pipelines on a lakehouse. Managed Spark/Delta; expectations (DQ), CDC, lineage; Bronze→Silver→Gold. Lakehouse ELT with quality rules. ⚠️ (file extract/ML)
Snowflake Snowpipe / Tasks Continuous or batch loads into Snowflake. Event-driven ingest from object storage; Streams/Tasks for ELT orchestration. Near-real-time file ingest and transforms in-warehouse. ✅ (VARIANT) ⚠️ (files via stages)
Fivetran Point-and-click SaaS ELT. Managed connectors for DBs/SaaS; auto schema evolution to DW/Lake. Rapid onboarding to Snowflake/BigQuery/Redshift.
Matillion Visual ELT for cloud warehouses. Push-down SQL to Snowflake/Redshift/BigQuery; orchestration and components. Team-owned ELT with versioning.
Informatica IICS Enterprise integration in the cloud. Mappings, CDC, data quality, MDM tie-ins, governance. Regulated/complex estates and hybrid integration. ⚠️ (adapters/custom)
Airbyte Managed or OSS Open-source ELT connectors. Connector SDK, CDC support, sync to lakes/warehouses. Cost-effective ELT and custom sources.
Streaming / CDC (Near-real-time)
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon Kinesis Streams & Firehose Real-time pipes on AWS. Streams for ingestion; Firehose buffers and delivers to S3/Redshift/OpenSearch. Clickstreams, IoT telemetry, app events. ⚠️ (binary pass-through)
Amazon MSK Managed Kafka Kafka as a managed service. Managed brokers, IAM/VPC, Kafka Connect integrations. High-throughput event backbone. ⚠️
AWS DMS Database Migration Service Continuous DB replication. CDC from relational sources to S3/Kinesis/Redshift/other DBs. Legacy→cloud replication and cutovers. ⚠️
Azure Event Hubs Azure’s big event pipe. Partitioned event ingestion with low latency; Kafka-compatible endpoint. Telemetry, logs, stream ingestion. ⚠️
Azure Stream Analytics SQL-like stream processing. Windowing joins/aggregations; outputs to ADLS/SQL/Power BI. Real-time dashboards and anomaly detection.
Google Pub/Sub Google’s global event bus. Exactly-once options, push/pull, ordered keys; integrates with Dataflow. Event ingestion for Dataflow/BigQuery. ⚠️
Google Datastream Serverless CDC on GCP. Change data capture from DBs to BigQuery/Cloud Storage. Low-ops CDC pipelines and cutovers. ⚠️
Confluent Cloud Kafka + Connect + ksqlDB Fully managed Kafka across clouds. Kafka core with managed Connect, Schema Registry, ksqlDB. Cross-cloud streaming backbone. ⚠️
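
The "windowing joins/aggregations" mentioned for stream processors in the table above can be illustrated with a tumbling (fixed, non-overlapping) window count. This is a plain-Python sketch over an in-memory list, not the API of any service listed; real engines also deal with late and out-of-order events, which this ignores. The event tuples and the function name are made up for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, event type).
events = [
    (0, "page_view"),
    (12, "page_view"),
    (59, "click"),
    (60, "page_view"),
    (125, "click"),
]

result = tumbling_window_counts(events, window_seconds=60)
```

With 60-second windows, the first three events fall into the window starting at 0, the fourth into the window starting at 60, and the last into the window starting at 120.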
Orchestration / Workflow (Run the pipelines)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS MWAA Managed Airflow Airflow without the ops. Managed schedulers/workers; DAGs as code; AWS integrations. Complex dependency DAGs on AWS. ⚠️ (move/process files)
Google Cloud Composer Managed Airflow GCP’s managed Airflow. GKE-based Airflow with GCP hooks and integrations. Orchestrate Dataflow/BigQuery/Dataproc. ⚠️
Azure Data Factory as orchestrator Triggers and pipelines to coordinate jobs. Time- or event-based triggers, dependencies, retries, self-hosted runtimes. End-to-end Azure data workflows. ⚠️
AWS Step Functions Serverless workflow engine. State machines for ETL, retries, parallelism; integrates with Glue/Lambda. Glue/Spark jobs + Lambdas orchestration. ⚠️
Databricks Jobs Schedule and run lakehouse jobs. Task graphs, cluster policies, DLT integration, Git ops. Operationalise notebooks/SQL/ML on Databricks. ⚠️
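
The "DAGs as code" idea behind the orchestrators above boils down to declaring task dependencies and letting the engine derive a valid run order. The sketch below uses only Python's standard `graphlib` module (Python 3.9+) rather than Airflow or any other listed service; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_sales"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_order(deps):
    """Return one valid execution order in which every dependency runs first."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

The two extract tasks have no dependency on each other, so an orchestrator is free to run them in parallel; only their position relative to `transform_sales` is fixed.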

Tip: Large binaries (images, PDFs, media) belong in cloud object storage (S3, Azure Blob, Google Cloud Storage); your pipelines can reference or transform them as needed.

Main Cloud Compute Options — Quick Reference

A practical map of compute families (VMs, containers, serverless, batch/HPC, big data/stream, ML). Each row includes a plain-English summary, a technical note, a common use case, and data-type handling.

✅ = strong/native fit; ⚠️ = possible/with caveats; ❌ = not a fit
Virtual Machines (VMs)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS EC2 Raw virtual servers you configure as you like. Wide instance families (CPU/GPU/ARM), Auto Scaling, Spot, placement groups, custom AMIs. Legacy apps, custom stacks, high control/observability.
Azure Virtual Machines Microsoft’s managed virtual servers. VM Scale Sets, Hybrid benefits, proximity placement, Windows/Linux images. Windows-heavy estates, hybrid lift-and-shift.
Google Compute Engine Google’s on-demand VMs. Custom machine types, preemptible VMs, live migration, sole-tenant nodes. Custom runtimes, cost-tuned fleets, HPC baselines.
Containers & Kubernetes
Service Non-technical description Technical description Typical use case Str Sst Uns
Managed Kubernetes EKS / AKS / GKE Kubernetes clusters without the control-plane pain. Managed API servers, node pools, autoscaling, add-ons (Ingress, CSI, CNI), GPU pools. Microservices, data/ML platforms, multi-tenant apps.
Serverless Containers Cloud Run / Azure Container Apps Run a container from zero to scale without managing servers. Per-request autoscale to zero, HTTP/async triggers, revisions, simple networking. APIs, event workers, lightweight ETL/ML inference. ⚠️ (short-lived; external storage)
Elastic Container Service ECS / Fargate Orchestrated containers on AWS; Fargate removes servers. Task/Service model, service discovery, IAM, capacity providers; Fargate = serverless execution. Batch jobs, APIs, back-office workers.
Serverless Functions (FaaS)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS Lambda Run code on events; no servers to manage. Event triggers (API, S3, Kafka), ephemeral runtime, concurrency scaling, extensions. Event processing, light ETL, API backends. ⚠️ (timeout/memory limits)
Azure Functions Event-driven functions on Azure. Bindings (HTTP/Queue/Blob/Cosmos), Durable Functions, consumption/premium plans. Workflows, integrations, reactive tasks. ⚠️
Google Cloud Functions Functions as a service on GCP. Gen 2 on Cloud Run; triggers via Pub/Sub/Storage/HTTP, autoscale. Event glue, small transforms, webhooks. ⚠️
PaaS App Platforms
Service Non-technical description Technical description Typical use case Str Sst Uns
Azure App Service Deploy web apps and APIs without servers. Managed runtime (Windows/Linux), slots, autoscale, VNet integration. Corporate web apps, APIs, portals. ⚠️ (external storage/CDN)
Google App Engine Google’s original PaaS for apps. Standard/Flexible environments, autoscale, built-in logging, services/versions. Multi-service web apps, rapid prototypes. ⚠️
AWS Elastic Beanstalk Upload app → platform handles the rest. Managed provisioning of EC2/ALB/ASG, health checks, rolling updates. 12-factor apps, quick lifts to AWS PaaS. ⚠️
Batch & HPC
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS Batch Queue up container jobs at scale, pay per compute. Managed job queues/compute envs, Spot integration, array jobs, GPU support. Rendering, science/engineering, nightly crunches.
Azure Batch Large-scale scheduled compute on Azure. Pool/Job/Task model, auto-scale pools, low-priority VMs, container support. Simulation, ETL batches, media processing.
Google Cloud Batch Fully managed batch job service. Autoscaled fleets, preemptible VMs, GPU/TPU options, regional queues. Video/ML preprocessing, parameter sweeps.
Big Data & Stream Processing
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon EMR Managed Spark/Hadoop for big data. EMR on EC2/EKS, autoscaling, Spot, HDFS/S3 storage, many runtimes (Spark/Hive/Presto). ETL at scale, lake processing, ML feature builds.
Google Dataproc Spark/Hadoop on GCP with fast startup. Ephemeral clusters, autoscaling, component gateway; GCS/BQ integrations. Budget-friendly Spark jobs, modernised ETL.
Azure Synapse Spark Apache Spark inside Synapse. Serverless/pooled Spark, notebooks, Delta/Parquet, integrated pipelines. Lakehouse transforms, notebooks, data exploration.
Databricks Multi-cloud Lakehouse compute for data + AI. Managed Spark/Photon, Delta Lake, DLT, MLflow; jobs & clusters. ELT/ML/AI platforms, collaborative notebooks.
Stream Processing Kinesis Data Analytics / Dataflow / Stream Analytics Real-time compute over event streams. Apache Flink (KDA), Apache Beam (Dataflow), SQL windows (ASA); stateful operators. Real-time ETL, anomaly detection, dashboards. ⚠️ (typically references objects)
ML / AI Managed Compute
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS SageMaker Managed ML training and hosting. Studio, training jobs, endpoints, pipelines; GPU/CPU; auto scaling; data connectors. Model training, batch/real-time inference. ✅ (images/audio/text via libs)
Azure Machine Learning Azure’s end-to-end ML platform. Designer/SDK, pipelines, managed endpoints, AutoML, AKS/ACI integration. Enterprise ML ops, governed deployment.
Google Vertex AI Unified ML/AI workbench on GCP. Workbench, AutoML/Custom, pipelines, endpoints, TPU/GPU support. Vision/NLP/tabular ML, scalable inference.

Note: “Support” here means the compute is a good fit to process that data type. Large binaries usually live in object storage (S3/Blob/GCS) and are processed from there.

Main Cloud Storage Options — Quick Reference

Core storage services across clouds. Each row includes a plain-English summary, a technical note, a common use case, and whether it suits Str (Structured), Sst (Semi-structured), and Uns (Unstructured) data.

✅ = strong/native fit; ⚠️ = possible/with caveats; ❌ = not a fit
Object Storage (Data Lakes)
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon S3 Durable, low-cost storage for any files. Object store; 11 nines (99.999999999%) durability; lifecycle to Standard-IA/Glacier; events; S3 Select. Data lakes, backups, analytics staging, media. ✅ (CSV/Parquet files) ✅ (JSON/Avro/Parquet) ✅ (images/video/PDFs)
Azure Blob Storage incl. ADLS Gen2 Azure’s universal bucket for files. Hot/Cool/Archive tiers; ADLS Gen2 adds hierarchical namespace & ACLs. Lakes on ADLS, archival, ML/analytics landing zones.
Google Cloud Storage (GCS) Google’s object storage for any data. Multi/dual-region; lifecycle; notifications; Autoclass tiering. Data lakes, media libraries, BQ external tables.
Microsoft OneLake Fabric Unified data lake for Fabric workspaces. ADLS Gen2 under the hood; shortcuts; item-level governance. Enterprise lake tightly integrated with Fabric/Power BI.
File Storage (NFS / SMB)
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon EFS Shared Linux file system. NFSv4.1, elastic scale, multi-AZ, burst/perf modes. Lifted apps needing POSIX; containers; user home dirs. ✅ (files)
Amazon FSx Windows / Lustre / NetApp Managed Windows or high-speed file systems. SMB (Windows), POSIX high-throughput (Lustre), ONTAP features (snap/clone). Windows shares, HPC scratch, enterprise NAS offload.
Azure Files + Azure NetApp Files Managed SMB/NFS shares. AD integration, performance tiers; NetApp = ultra-low latency. Shared drives, VDI profiles, SAP app shares.
Google Filestore Managed NFS for GCP. POSIX-compliant NFS with zonal/regional tiers. GKE shared volumes, content repos, media workflows.
Block Storage (Attach to VMs/Containers)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS EBS Disks for EC2. SSD/HDD tiers; provisioned IOPS; snapshots; encryption; Multi-Attach (io2). Databases, low-latency apps, boot volumes. ✅ (DB files)
Azure Managed Disks Disks for Azure VMs. Premium/Ultra SSD; ZRS options; snapshots; shared disks. SAP/app servers; high-IOPS databases.
Google Persistent Disk + Local SSD Disks for Compute Engine/GKE. Balanced/SSD/HDD; regional PD; snapshots; Local SSD for ultra-low latency. Relational DBs, stateful services, caches.
Lakehouse Table Formats (on Object Storage)
Service/Format Non-technical description Technical description Typical use case Str Sst Uns
Delta Lake Databricks/Open Tables on your lake with ACID. Transaction log + Parquet; time travel; upserts/merges; schema evolution. Reliable lakehouse ELT; CDC merges; analytics. ✅ (JSON in cols) ⚠️ (store files alongside)
Apache Iceberg Open table format for big lakes. Snapshot isolation; hidden partitioning; multi-engine (Spark/Trino/…). Engine-agnostic lakehouse tables & governance. ⚠️
Apache Hudi Incremental tables on a lake. COW/MOR storage; record-level updates; indexes; Spark/Presto/Trino. Near-real-time upserts; incremental pipelines. ⚠️
Archival & Cold Storage
Service Non-technical description Technical description Typical use case Str Sst Uns
Amazon S3 Glacier Instant / Flexible / Deep Very cheap, slower access storage. Archive tiers with retrieval ranging from milliseconds (Instant Retrieval) to minutes or hours (Flexible Retrieval, Deep Archive); vaults; lifecycle policies. Long-term retention, compliance copies, DR. ✅ (files)
Azure Archive Storage Cold tier for blobs. Low-cost tier with rehydration; immutable options (legal hold/WORM). Backups, legal/regulated archives, rarely accessed data.
Google Coldline / Archive Low-cost long-term storage. GCS classes with retrieval SLA/cost trade-offs; lifecycle rules. Compliance, backups, cold datasets.
Hybrid & Edge Storage (Gateways)
Service Non-technical description Technical description Typical use case Str Sst Uns
AWS Storage Gateway On-prem file/tape/block that writes to S3. File/Volume/Tape gateways; caching; secure sync to S3/Glacier. Hybrid backup, archive to cloud, DR staging.
Azure Data Box / StorSimple Edge appliances for bulk/hybrid storage. Offline/online transfer; tiering to Blob; import/export devices. Datacentre migrations; large dataset seeding.
Google Transfer Appliance / Storage Transfer Move very large datasets into GCP. Rugged appliances; scheduled transfers/sync from S3/HTTP/POSIX. One-off or recurring lake ingestion at scale.

Notes: Object and file stores hold any data as files (great for Sst/Uns). Block storage shines for low-latency databases and app disks. Lakehouse table formats (Delta/Iceberg/Hudi) sit on object storage to bring ACID/table semantics to files.
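
To make the "transaction log + files" idea in the notes above concrete, here is a toy model of an append-only commit log whose replay yields the set of live data files at any version. It is a conceptual sketch only and does not follow the actual on-disk protocol of Delta Lake, Iceberg, or Hudi; the class and file names are hypothetical.

```python
class ToyTableLog:
    """Append-only commit log; each commit adds and/or removes data files."""

    def __init__(self):
        self.commits = []  # list of (added_files, removed_files)

    def commit(self, add=(), remove=()):
        """Append one atomic commit and return its version number."""
        self.commits.append((frozenset(add), frozenset(remove)))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` to get the live set of files."""
        if version is None:
            version = len(self.commits) - 1
        live = set()
        for added, removed in self.commits[: version + 1]:
            live -= removed
            live |= added
        return live


log = ToyTableLog()
v0 = log.commit(add=["part-000.parquet"])
v1 = log.commit(add=["part-001.parquet"])
# A rewrite (for example an upsert) replaces one file with another in one commit.
v2 = log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

current = log.snapshot()      # latest version of the table
as_of_v0 = log.snapshot(v0)   # "time travel" to the first commit
```

Readers never see a half-applied change because a file only becomes visible once the commit that adds it is in the log, and old versions stay readable as long as their files are retained.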

Databricks Components Overview

Component | Non-Technical Description 📘 | Technical Description ⚙️ | Use Case/Scenario 🎯
Workspace | Your collaborative project space | Hosts notebooks, jobs, repos, and ML experiments | Organising analytics or data science projects
Clusters | Computing engine (like your personal AI but scalable) | Spark-based distributed compute environment (autoscaling or manual) | Run jobs, notebooks, ML models
Jobs | Automated tasks or scheduled workflows | DAG-based execution of notebooks, scripts, or JARs | Nightly ETL jobs, ML training pipelines
Notebooks | Interactive workspace for code and output | Supports Python, SQL, Scala, R, Markdown | Exploratory data analysis, prototyping
SQL Editor | GUI to query data tables | Uses Databricks SQL for BI-friendly interface | Business users querying curated tables
Delta Lake | Like a spreadsheet that remembers everything | ACID-compliant storage layer over Parquet | Reliable data lake tables with versioning
Unity Catalog | Your data’s filing cabinet and bouncer | Central metadata & access control layer with RBAC | Secure multi-tenant access across clouds
Lakehouse Platform | The Databricks "big idea": warehouse + lake | Combines data lake scalability with DB-like performance | Unified platform for batch, stream, ML, and BI
MLflow | Your model's history, packaging, and delivery | Open-source lifecycle management for ML models | Experiment tracking, model registry, deployment
Repos | Built-in Git versioning | Git-backed source control for notebooks & jobs | Code collaboration and CI/CD
Data Explorer | Browse your tables like folders | Visual UI to inspect catalog, schemas, tables | Data discovery and governance check
Dashboards | Shareable reports and visuals | BI dashboard powered by SQL or notebooks | Stakeholder insights and KPIs
Snowflake Components – Cheat Sheet

Snowflake Components – Field Guide

Non-technical and technical descriptions, real-world scenarios, and common gotchas you actually meet on projects.

Category Component Non-technical description Technical description Scenario Common gotchas
Compute Virtual Warehouses On/off “engines” that run your queries and loads. Pick a size, pay while it’s on, pause when idle. MPP compute clusters with independent caches; scale up (bigger) or out (multi-cluster). Auto-suspend/resume; credits billed per-second (min 1 min). Month-end: scale out to clear queues; auto-suspend overnight. Leaving warehouses running; oversizing by habit; long auto-suspend (wasted minutes); each warehouse has its own warm cache.
Pipelines Snowpipe Auto-loads new files as they land. Think “continual drip-feed” rather than big batches. Event/REST-triggered micro-batch COPY from a stage; near-real-time ingestion; charges per file/processing. Landing CSVs from cloud storage every few minutes into a raw table. Millions of tiny files; incorrect file format options; duplicate handling; object store permissions; expecting true streaming latency.
Pipelines Snowpipe Streaming (low-latency) Push rows directly into tables without creating files first. Record-based ingest via SDK/connectors; bypasses staged files; designed for sub-minute latency. Clickstream or IoT events needing fast availability for dashboards. Event ordering/duplicates; commit semantics; schema evolution; monitoring cost of continuous ingest.
Change data Streams A change tracker: “what changed since last time?” CDC over base tables/views (insert/update/delete) with consumption offsets; pairs well with Tasks/Dynamic Tables. Incremental processing from raw → curated without re-scanning full tables. Forgetting that the offset only advances when the stream is consumed; retention windows; DDL changes breaking downstream expectations.
Orchestration Tasks Built-in scheduler for jobs. Like cron, but in Snowflake. Time- or dependency-triggered DAGs; run SQL/SP on a chosen warehouse or serverless; track history. Nightly dimension refresh after raw ingest completes. Tasks left suspended; wrong warehouse size; timezone surprises; missing privileges.
Transformation Dynamic Tables “Always-fresh” derived tables maintained for you. Declarative objects with a defining query + freshness target; incremental maintenance and dependency tracking. Keep a curated customer table in sync from multiple sources without hand-built DAGs. Assuming they’re free—serverless work still bills; deep chains can hide cost/latency; not a cure-all for messy logic.
Storage Databases / Schemas / Tables / Views / MVs Your folders and sheets: organise data, expose it as tables or views; MVs pre-compute results. Tables (permanent/transient/temp), optional clustering; Views (secure/regular); Materialized Views with refresh costs/limits. Speed up a gnarly join with a targeted MV; secure a view for consumers. Using transient in prod (recovery limits); MVs on non-deterministic queries; over-clustering tiny tables.
Lakehouse External & Iceberg Tables Query data where it lives in your cloud data lake, no copy required. External tables over files in S3/GCS/Azure; Iceberg tables integrate open-table formats/catalogs; partition pruning depends on layout/metadata. Blend Parquet in the lake with internal Snowflake tables for analytics. Stale metadata; path/partition mismatches; small-file performance; storage IAM misconfig.
Ingest I/O Stages (internal/external) Landing zones for files you load from or unload to. Named internal or external stages with credentials/encryption; directory tables for discovery. Partners drop files to an external stage you manage. Credential leakage in URLs; wrong region; case-sensitive paths; forgetting stage privileges.
Ingest I/O File Formats Reusable “how to read this file” settings. Parsing options for CSV/JSON/Avro/Parquet/ORC/XML; referenced by COPY/Snowpipe/External Tables. Standardise CSV quirks (nulls, quotes) across ingest jobs. NULL vs empty strings; date/locale mismatches; hidden BOM/encoding issues.
Governance Masking Policies Hide sensitive values dynamically based on who’s asking. Policy-based column masking evaluated at query time; role/context aware; auditable. Show last-4 of cards to support, full value to finance. Applying to complex types; downstream tools assuming unmasked types; forgetting UNMASK-level access for admins.
Governance Row Access Policies Only the rows you’re allowed to see, nothing more. Predicate functions enforce row-level security on tables/views. Country managers see only their region’s data by role. Over-complex predicates; surprises when combined with filters; diagnosing “missing rows”.
Cost control Resource Monitors Spend tripwires to stop runaway compute. Credit thresholds with actions (notify/suspend) scoped to warehouses. Cap a dev warehouse to prevent accidental 24/7 spend. Only covers warehouses; serverless usage needs separate observation/alerts; period/timezone misunderstandings.
Metadata Information Schema & Account Usage System tables you can query for lineage-ish insight, usage, and health. Database-scoped Information Schema and account-wide views with ingestion latency; ideal for monitoring/reporting. Build a usage dashboard showing query cost by team. Data freshness lag; privilege gaps; mixing object names from different namespaces.
Dev & Apps UDF / UDTF / UDAF Custom functions when SQL alone won’t cut it. Extend with SQL/JS/Java/Python; UDTF returns tables; sandboxed execution. Custom normaliser for messy product codes as a UDF. Performance of row-by-row logic; library limits; cold-start penalties for some runtimes.
Dev & Apps Stored Procedures Procedural scripts with variables and control flow. Run in JS/SQL/Java/Python; can call SQL, manage transactions, and orchestrate tasks. Automate schema rollout and grant routines for new projects. Long-running work timing out; debugging ergonomics; executing on the wrong warehouse.
Dev & Apps Snowpark APIs Write code (DataFrames) that runs close to the data. Python/Scala/Java APIs with pushdown; UDF/UDTF authoring; package management via curated channels. Data-prep pipelines in Python without leaving Snowflake. Accidental client-side collects; serialization limits; dependency/version pinning.
Dev & Apps Snowpark Container Services Run your containers next to your data—ML services, custom apps. Managed container runtime integrated with Snowflake auth/networking; supports services and batch jobs. Serve an in-house ML model via a low-latency API within your Snowflake account. Oversized images; egress/network rules; security approvals; cost of always-on services.
Collaboration Secure Data Sharing Share live data without copying or FTP drama. Provider/Consumer accounts with shared objects; no data duplication; governed access. Give a supplier read-only access to sales without exporting files. Schema changes breaking consumers; accidentally sharing sensitive columns; region/cloud compatibility.
Collaboration Listings & Marketplace An “app store” for data and apps—public or private listings. Package data/apps with terms and versioning; distribute across orgs/regions/clouds. Monetise an industry dataset to partners via private listings. Legal/contracting lag; unclear update cadence; consumer entitlements drifting from expectations.
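
The "what changed since last time?" behaviour of Streams in the table above can be pictured as an append-only change list plus a consumption offset. The sketch below is a toy in-memory model, not Snowflake's implementation or API; the class and method names are hypothetical. It shows why reading without consuming leaves the same changes pending.

```python
class ToyChangeStream:
    """Tracks which changes a consumer has not processed yet."""

    def __init__(self):
        self.changes = []  # append-only change records
        self.offset = 0    # position of the next unconsumed change

    def record(self, action, row):
        """Append a change such as ("INSERT", {...}) to the log."""
        self.changes.append((action, row))

    def peek(self):
        """Read pending changes without advancing the offset."""
        return self.changes[self.offset:]

    def consume(self):
        """Return pending changes and advance the offset past them."""
        pending = self.peek()
        self.offset = len(self.changes)
        return pending


stream = ToyChangeStream()
stream.record("INSERT", {"id": 1, "status": "new"})
stream.record("UPDATE", {"id": 1, "status": "paid"})

peeked = stream.peek()           # offset unchanged: both changes still pending
first_batch = stream.consume()   # offset advances past both changes
stream.record("DELETE", {"id": 1})
second_batch = stream.consume()  # only the change made since the last consume
```

An incremental job built on this pattern processes each change once, provided every run ends with a consume step.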

Tip: pair this with a one-page “operating rules” note—suspend defaults, file-size targets, freshness SLAs, and a short naming convention.
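
As a rough aid for the "suspend defaults" rule, the per-second billing with a one-minute minimum described for Virtual Warehouses can be estimated as below. The credits-per-hour table is an assumption based on commonly quoted standard-warehouse rates and should be checked against current pricing; the function names and workload numbers are hypothetical.

```python
# Assumed credits per hour by warehouse size; verify against current pricing.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}


def credits_for_run(size, running_seconds):
    """Credits for one start-to-suspend run: per-second, 60-second minimum."""
    billed_seconds = max(running_seconds, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def daily_credits(size, bursts, busy_seconds, auto_suspend_seconds):
    """Credits per day when each burst of work is followed by idle time
    until auto-suspend kicks in."""
    per_burst = credits_for_run(size, busy_seconds + auto_suspend_seconds)
    return bursts * per_burst


# Hypothetical workload: 20 bursts a day, 30 seconds of real work each.
long_suspend = daily_credits("S", bursts=20, busy_seconds=30, auto_suspend_seconds=600)
short_suspend = daily_credits("S", bursts=20, busy_seconds=30, auto_suspend_seconds=60)
```

In this made-up workload the 10-minute auto-suspend costs about seven times as much as the 1-minute setting, because most of the billed time is idle.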

All Dropdowns Closed on Load

Dropdowns (Details) — Always Closed on Page Load

This page ensures <details> elements (dropdowns) are closed whenever the page is opened or refreshed, and also when restored from the back/forward cache.

Example Section A (starts open in markup)
Even though this has open in the HTML, the script will close it on load.
Example Section B
Regular closed state; the script also keeps it closed on initial load.
Example Section C
Another one that starts open in the markup and will be closed by the script.
Top 20 Cloud Databases — Comparison

Purpose, plain-English & technical descriptions, use cases, pros/cons, and a scoring heat‑map (1–5: higher is better).

Score columns: Scale, Latency, Cost Predictability, Ecosystem, Ops Effort. Heat cells fill left→right (1=20%, 5=100%).
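
The fill rule for the heat cells is a simple linear mapping from score to width. A small sketch, assuming integer scores from 1 to 5; the function names and the text-bar rendering are illustrative and not part of the original page.

```python
def fill_percent(score):
    """Map a 1-5 score to the cell fill percentage (1 -> 20, 5 -> 100)."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    return score * 20


def text_bar(score, width=5):
    """Render the fill as a left-to-right text bar for plain-text output."""
    filled = round(width * fill_percent(score) / 100)
    return "█" * filled + "░" * (width - filled)


# Example: scores for one row, in the column order listed above.
scores = [5, 4, 3, 4, 5]
bars = [text_bar(s) for s in scores]
```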
Database Purpose Non‑Technical Description Technical Description Primary Use Cases Pros Cons Scale Latency Cost Predictability Ecosystem Ops Effort
Amazon DynamoDB Serverless key‑value/document store Massively scalable app data store for simple key lookups and flexible documents. Fully managed NoSQL; consistent hashing partitions; adaptive capacity; streams; global tables. High‑traffic web/mobile backends, IoT, gaming sessions, shopping carts. Serverless scale; global tables; strong SLAs. Query patterns limited; hot partition pitfalls; complex cost tuning. 5 4 3 4 5
Google Cloud Spanner Globally distributed relational DB SQL database that scales across regions while keeping transactions consistent. Distributed SQL with MVCC and TrueTime for external consistency; ANSI SQL; strong schemas. Financial systems, inventory, multi‑region SaaS needing strong consistency. Horizontal scale + SQL + transactions; multi‑region. Higher cost; needs careful schema; tuning differs from single‑node RDBMS. 5 4 3 4 4
Azure Cosmos DB Multi‑model NoSQL (key‑value/doc/graph/column) Low‑latency database with global distribution and flexible data models. APIs for Core (SQL), MongoDB, Cassandra, Gremlin, Table; RU‑based throughput; multi‑region. Global apps, personalization, IoT, event stores. Global distribution; multi‑API; low latency at p99. RU sizing can be tricky; cross‑partition queries can be costly. 5 4 3 4 4
Amazon Aurora (MySQL/Postgres) High‑performance relational (managed) MySQL/Postgres‑compatible engine with better performance and failover. Decoupled compute/storage; 6‑way replication; read replicas; serverless v2 options. OLTP apps, SaaS backends, migrations from on‑prem RDBMS. Drop‑in compatibility; strong HA; autoscaling options. Region‑bound scaling; heavy writes may need sharding. 4 4 4 5 4
Amazon Redshift Cloud data warehouse Columnar SQL warehouse for analytics at scale. MPP, columnar storage, RA3 managed storage, Spectrum external tables; materialized views. Enterprise BI, ELT analytics, semi‑structured via SUPER. Mature ecosystem; performance features; concurrency scaling. Cluster sizing decisions; Spectrum governance; workload mgmt. 4 3 4 5 4
Google BigQuery Serverless data warehouse Analytics engine where you just run SQL and pay per query—no clusters to manage. Dremel‑based columnar engine; separation of storage/compute; BI Engine caches. Ad‑hoc analytics, ELT at scale, ML‑in‑warehouse, log analytics. Near‑zero ops; great price/perf for bursty workloads. Cost predictability lower with ad‑hoc users; quotas/limits. 5 2 3 5 5
Snowflake Cloud data platform/warehouse Elastic SQL warehouse with easy scaling and cross‑cloud support. Decoupled compute/storage; virtual warehouses; time travel; data sharing; native apps. Enterprise analytics, data sharing, multi‑tenant analytics products. Strong ecosystem; easy scaling; governance features. Credit creep if unmanaged; proprietary features increase lock‑in. 5 3 3 5 5
Databricks SQL (Delta Lakehouse) Lakehouse SQL engine SQL over Delta Lake on object storage: warehouse performance with lake flexibility. Photon engine; Delta tables with ACID; Unity Catalog; serverless SQL warehouse. BI on data lake, ELT at scale, medallion architectures. Open formats; strong fit alongside streaming and ML workloads. Tuning required for small, chatty queries; cost scales with concurrency. 5 3 3 5 4
Azure SQL Database Managed relational (SQL Server) SQL Server as a service—familiar T‑SQL with built‑in HA and backups. Single database/elastic pools; Hyperscale; automatic tuning; AAD integration. Line‑of‑business apps, SaaS multi‑tenant, reporting stores. Rich SQL features; easy Azure integration; predictable tiers. Vertical scaling limits vs distributed SQL; DTU confusion for newcomers. 4 4 4 5 5
Google AlloyDB for PostgreSQL High‑performance Postgres Postgres‑compatible with faster analytics and OLTP, fully managed. Disaggregated storage; columnar engine for analytics; automatic failover. OLTP plus HTAP‑ish patterns, modern app backends. Great Postgres perf; minimal ops; analytics boosts. GCP‑only; migration from other engines needed. 4 4 4 4 5
MongoDB Atlas Managed document database Flexible JSON document store with global clusters and rich developer tooling. Replica sets, sharding, multi‑cloud; Atlas Search; triggers; Realm/App Services. Content, catalogs, user profiles, event data. Developer‑friendly; flexible schema; strong tools. Multi‑document transactions are supported but add overhead; joins only via $lookup aggregation; costs scale with usage. 4 4 3 5 5
DataStax Astra DB (Cassandra) Managed wide‑column (Cassandra) Cassandra as a service for write‑heavy, always‑on workloads. Masterless ring; tunable consistency; Stargate APIs (CQL/REST/GraphQL). IoT telemetry, messaging, time‑series with high ingest. Linearly scalable writes; global availability; APIs. Query flexibility limited; model by access pattern. 5 4 3 4 5
Google Cloud Bigtable Managed wide‑column (HBase‑like) Single‑digit ms key‑value at petabyte scale—great for time‑series and personalization. Sparse, distributed row store; SSD/HDD nodes; GC policies. Time‑series, ad tech, personalization features, IoT. Huge scale; predictable low latency if modeled right. Single index (row key) mindset; no joins/aggregations. 5 4 3 4 4
Redis Enterprise Cloud In‑memory data store Very fast cache/DB for sub‑millisecond reads/writes, with JSON and search options. Redis with clustering, persistence, modules (JSON, Search, Bloom, TimeSeries). Caching, session stores, leaderboards, real‑time features. Ultra‑low latency; versatile modules; enterprise HA. Memory cost; persistence/consistency trade‑offs. 4 5 3 5 5
Elastic Cloud (Elasticsearch) Search & analytics engine Free‑text search and log analytics with dashboards (Kibana). Inverted indexes, distributed shards/replicas; aggregations; ILM; vector search. Search‑heavy apps, observability, security analytics. Great search features; broad ecosystem; Kibana visualisation. Query costs for wide scans; ops complexity for hot‑warm tiers. 4 3 3 5 4
InfluxDB Cloud Time‑series database Purpose‑built for metrics and events over time with a fluent query language. TSM/TSI storage, downsampling/retention; Flux/SQL interfaces; serverless options. IoT metrics, monitoring, SRE/DevOps telemetry. Time‑series ergonomics; tasks for downsampling. Flux is niche (SQL emerging); cardinality pitfalls. 4 4 3 4 5
Timescale Cloud (TimescaleDB) Time‑series on PostgreSQL Postgres with time‑series extensions—nice when you want SQL + time‑series. Hypertables, compression, continuous aggregates; full SQL & Postgres ecosystem. Industrial IoT, finance ticks, app metrics needing joins/SQL. Full SQL power; easy analytics joins; good compression. Not for ultra‑high ingest vs Bigtable/Cassandra tiers. 4 4 4 5 5
CockroachDB (Managed/Dedicated) Distributed SQL (Postgres wire) Resilient SQL that scales horizontally with strong consistency. Raft consensus, range‑based data distribution; Postgres wire compatibility. Resilient SaaS backends, geo‑partitioned data, OLTP scale‑out. Survivable regions; SQL + transactions; online scale. Hot ranges if keys skew; some Postgres features differ. 5 4 4 4 4
PlanetScale Serverless MySQL (Vitess) MySQL that can shard/scale underneath without changing app code. Vitess control plane; branching; online schema changes; connection pooling. Prod MySQL for SaaS; branching for safe changes; scale‑out reads. Zero‑downtime schema changes; developer workflow wins. MySQL‑only; some features limited by Vitess layer. 4 4 4 4 5
Neo4j AuraDB Managed graph database Stores data as nodes and relationships—great for connected queries. Property graph model; Cypher query language; native graph engines. Fraud rings, recommendations, knowledge graphs, network analysis. Expressive relationship queries; fast traversals. Not ideal for big aggregations; different modeling mindset. 3 3 4 4 5
Amazon RDS (PostgreSQL/MySQL) Managed relational (classic) Familiar relational databases without the patching and backups. Managed instances, Multi‑AZ, read replicas, storage autoscaling. Traditional apps, quick lifts from on‑prem, dependable OLTP. Mature, predictable; wide community knowledge. Instance‑bound scaling; manual sharding for big growth. 4 4 4 5 4
Azure Database for PostgreSQL Managed Postgres (Flexible Server) PostgreSQL in Azure with managed HA and scaling. Flexible Server, autoscale, zone‑redundant HA, PostgreSQL extension support. Modern app backends needing Postgres features and Azure integration. Strong AAD/Key Vault integration; familiar tooling. Vertical scaling limits vs distributed SQL; regional. 4 4 4 4 5
Google Cloud SQL Managed MySQL/Postgres/SQL Server Managed relational instances with backups and replicas handled for you. HA configurations, read replicas, Private Service Connect. Standard OLTP apps on GCP, quick migrations. Straightforward; integrates with GCP IAM & VPC. Instance‑bound; not for huge scale‑out by itself. 3 4 4 4 5
Azure Synapse Dedicated SQL Pool MPP warehouse (Azure) Azure’s classic MPP warehouse for large-scale BI with predictable capacity. Distributed compute, PolyBase, materialized views; workload isolation. Enterprise BI, predictable SLAs, integrated Azure stack. Predictable capacity; strong Azure integration. Cluster ops vs serverless engines; less elastic than lakehouse. 4 3 4 4 4
Amazon OpenSearch Service Managed search/observability Search and log analytics compatible with Elasticsearch APIs. Shard/replica management, UltraWarm/Cold storage, OpenSearch Dashboards. Search features, observability stacks on AWS. Good AWS integration; familiar APIs. Ops tuning for tiers/ILM; query costs for wide scans. 4 3 3 4 4
Azure Cosmos DB for MongoDB Mongo API on Cosmos Mongo‑compatible API on Cosmos for global distribution with low latency. RU/s throughput model; multi‑master write regions; Mongo wire protocol compatibility. Global user data, catalogs, content stores needing Mongo semantics. Global replication; automatic indexing; serverless option. RU planning; cross‑partition costs; feature parity varies by version. 5 4 3 4 4
Azure Database for MySQL Managed MySQL (Flexible Server) MySQL managed in Azure with HA and scaling, minus the babysitting. Flexible Server, zone‑redundant HA, parameter tuning, Azure Monitor. Web apps, CMS, e‑commerce stacks on Azure. Familiar engine; Azure security integrations. Vertical scaling limits; regional. 3 4 4 4 5
Firestore (Google Cloud) Serverless document DB Serverless JSON store with offline sync for mobile/web apps. Hierarchical documents/collections, real‑time listeners, strong security rules. Mobile/web backends, real‑time presence/chat, small‑team apps. Near‑zero ops; great SDKs; realtime updates. Query constraints; cost spikes with chatty patterns. 4 4 3 4 5
SingleStoreDB Cloud Distributed SQL + vectors Fast SQL store for mixed OLTP/OLAP with vector search features. Shared‑nothing distributed engine, columnstore + rowstore, pipelines. Real‑time analytics, operational reporting, AI features with vectors. Strong mixed‑workload performance; HTAP‑style design. Vendor‑specific features; sizing still matters. 4 4 3 4 4
TiDB Cloud Distributed MySQL‑compatible MySQL‑compatible database that scales horizontally with strong consistency. TiKV (key‑value store) + TiDB SQL layer; Raft consensus; HTAP with TiFlash. Scale‑out OLTP with MySQL compatibility; HTAP patterns. Horizontal scale + SQL; HTAP via TiFlash. Operational tuning for balance; ecosystem smaller than MySQL/Aurora. 5 4 3 3 4

Scoring (1–5, higher is better), in column order: Scale = maximum horizontal/elastic capacity; Latency = suitability for low‑latency workloads; Cost Predictability = ease of forecasting monthly costs; Ecosystem = breadth of connectors & tooling; Ops Effort = operational ease (5 = least effort to run).
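
One way to use these scores is to weight the five columns by what matters for a given workload and rank the candidates. The sketch below is illustrative only: the names SCORES, weighted_score and rank, and the example weights, are assumptions introduced here and are not part of the table. The score tuples are copied from a handful of rows above.

```python
# Scores copied from the table rows above, in column order:
# (Scale, Latency, Cost Predictability, Ecosystem, Ops Effort)
SCORES = {
    "Amazon DynamoDB": (5, 4, 3, 4, 5),
    "Google Cloud Spanner": (5, 4, 3, 4, 4),
    "Google BigQuery": (5, 2, 3, 5, 5),
    "Amazon Aurora": (4, 4, 4, 5, 4),
    "Redis Enterprise Cloud": (4, 5, 3, 5, 5),
}


def weighted_score(scores, weights):
    """Weighted average of the five column scores (result stays on the 1-5 scale)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def rank(services, weights):
    """Service names ordered from best to worst weighted score."""
    return sorted(
        services,
        key=lambda name: weighted_score(services[name], weights),
        reverse=True,
    )


# Hypothetical weighting: latency counts three times as much as any other column.
latency_first = (1, 3, 1, 1, 1)
ranking = rank(SCORES, latency_first)

for name in ranking:
    print(f"{weighted_score(SCORES[name], latency_first):.2f}  {name}")
```

Equal weights reproduce a plain average of the five columns; changing the weights tuple is the only step needed to re-rank for a different priority.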

Quick picks by use case:
  • Global OLTP with strong consistency: Cloud Spanner, CockroachDB
  • Write‑heavy time‑series/telemetry: DataStax Astra (Cassandra), Bigtable, InfluxDB
  • Elastic analytics (serverless): BigQuery, Snowflake, Databricks SQL
  • Relational with low ops in Azure: Azure SQL DB, Azure Database for PostgreSQL (closest GCP analogue: AlloyDB)
  • Document‑first apps: MongoDB Atlas, Cosmos DB
  • Search/Observability: Elastic Cloud, OpenSearch
  • In‑memory latency: Redis Enterprise Cloud
  • MySQL at scale without drama: PlanetScale, TiDB Cloud