Snowflake | IainToolin

Snowflake Components – Field Guide

Non-technical and technical descriptions, concrete use cases, and the common gotchas you’ll actually meet on projects.

Category	Component	Non-technical description	Technical description	Use cases	Common gotchas
Compute	Virtual Warehouses	On/off “engines” that run queries/loads. Choose size, pay while it’s on, pause when idle.	Isolated MPP compute clusters with independent caches; scale up/out; per-second billing (min 60s); auto-suspend/resume.	• Burst month-end reporting • Parallel ELT loads • Separate dev/test/prod performance isolation	Forgetting to suspend; habitual oversizing; long auto-suspend; assuming caches are shared across warehouses.
Pipelines	Snowpipe	Auto-loads new files as they land—continuous trickle instead of big batches.	Event/REST-triggered micro-batch COPY from stages; near-real-time ingestion; cost per processed file/row.	• Vendor S3/GCS drops • Log/CSV ingestion every few minutes • Lightweight CDC exports	Millions of tiny files; wrong file format options; duplicate handling; expecting sub-second latency.
Pipelines	Snowpipe Streaming (low-latency)	Push rows straight into tables, with no files in between.	Record-oriented ingest via SDK/connectors; sub-minute availability; bypasses staged files.	• Clickstream/IoT telemetry • Real-time fraud features • Rapid dashboard updates	Event ordering/dupes; commit semantics; schema evolution; continuous ingest cost monitoring.
Change data	Streams	A change-tracker that answers “what changed since last run?”	CDC over base tables/views (inserts/updates/deletes) with consumption offsets; pairs with Tasks/Dynamic Tables for incremental processing.	• Incremental raw→silver transforms • Rebuild downstream deltas • Audit and replay	Consuming a stream in a DML statement advances its offset, so a second reader of the same stream sees nothing; streams left unconsumed past the retention window go stale; DDL changes surprising downstream jobs.
Orchestration	Tasks	Built-in scheduler for SQL/procedural jobs. Like cron, inside Snowflake.	Time/dependency-triggered DAGs; run SQL/SP on a warehouse or serverless; track history and failures.	• Nightly dimensions & facts • Backfill chains • Orchestrate quality checks	Tasks left suspended; wrong warehouse sizing; timezone mismatches; missing privileges.
Transformation	Dynamic Tables	“Always-fresh” derived tables that Snowflake maintains for you.	Declarative objects with a defining query + freshness target (target lag); incremental dependency management; automated refreshes that run on the warehouse you assign.	• Keep curated marts in sync • Simplify DAGs for ELT • Auto-maintain SCD logic	Not free: every refresh consumes compute credits; deep chains hide latency/cost; not a silver bullet for messy logic.
Storage	DBs/Schemas/Tables/Views/MVs	Your folders and sheets: organise, expose, and accelerate data (MVs pre-compute).	Permanent/transient/temp tables; clustering keys; secure/regular views; MVs with refresh costs/limits.	• Secure data access via views • Accelerate heavy joins with MVs • Separate zones (raw/silver/gold)	Transient in prod limits recovery; MVs on non-deterministic queries; over-clustering tiny tables.
Lakehouse	External & Iceberg Tables	Query lake data where it lives—no copy required.	External tables over object storage; Iceberg tables with open-table format/catalog; partition pruning depends on layout/metadata.	• Blend Parquet with internal tables • Open-format interoperability • Low-cost archival analytics	Stale metadata; partition/path mismatches; small-file performance; IAM misconfig.
Ingest I/O	Stages (internal/external)	Landing zones to load from or unload to.	Named stages with creds/encryption; directory tables for discovery; supports scoped credentials.	• Partner drop-zones • Bulk unload for sharing • Pre-processing areas	Leaking creds in URLs; wrong region; case-sensitive paths; missing stage privileges.
Ingest I/O	File Formats	Reusable “how to read this file” recipes.	Parsing options for CSV/JSON/Avro/Parquet/ORC/XML; referenced by COPY/Snowpipe/External Tables.	• Standardise CSV quirks • JSON semi-structured loads • Parquet high-throughput ingest	NULL vs empty strings; date/locale issues; BOM/encoding surprises.
Governance	Masking Policies	Hide sensitive values dynamically based on who’s asking.	Column-level policies evaluated at query time; role/context aware; auditable via usage views.	• PII protection in shared views • Dev/test obfuscation • Role-based reveals (e.g., last-4)	Complex types; BI tools assuming unmasked types; forgetting to give admin roles an unmasked branch in the policy.
Governance	Row Access Policies	Limit rows by role or attributes: “see only your slice”.	Predicate functions enforce row-level security (RLS) on tables/views; evaluated per-query.	• Region/BU scoping • Customer tenancy isolation • Least-privilege analytics	Over-complex predicates; filter interactions; “missing rows” confusion.
Cost control	Resource Monitors	Spend tripwires to catch runaway compute.	Credit thresholds with actions (notify/suspend) scoped to warehouses or accounts.	• Cap dev/playground spend • Alert finance on spikes • Guardrails for PoCs	Covers warehouse compute only; serverless features need separate tracking; period/timezone assumptions.
Metadata	Information Schema & Account Usage	System tables for usage, health, and lineage-ish visibility.	DB-scoped INFORMATION_SCHEMA + account-wide ACCOUNT_USAGE views (which lag behind real time, from minutes to hours depending on the view); ideal for monitoring & finops.	• Cost & performance dashboards • Object inventory & drift checks • Simple lineage discovery	Freshness lag; privilege gaps; mixing names across databases/schemas.
Dev & Apps	UDF / UDTF / UDAF	Custom functions when plain SQL won’t do.	Extend with SQL/JS/Java/Python; UDTF returns tables; sandboxed execution with pushdown limits.	• Code normalisers/cleansers • Tokenisation/regex helpers • Sessionisation as UDTF	Row-by-row overhead; library limits; cold-starts (some runtimes).
Dev & Apps	Stored Procedures	Procedural scripts with variables and control flow.	JS/SQL/Java/Python procs; manage transactions; call SQL; orchestrate tasks and admin routines.	• Schema/grants automation • Migration utilities • Complex ELT orchestration	Long runs timing out; tricky debugging; running on the wrong warehouse.
Dev & Apps	Snowpark APIs	DataFrame code that runs close to the data.	Python/Scala/Java APIs with pushdown; UDF/UDTF authoring; packaged deps via curated channels.	• Feature engineering • In-platform data prep • ML scoring pipelines	Accidental client-side collects; serialization limits; dependency pinning/version drift.
Dev & Apps	Snowpark Container Services	Run your containers next to your data—apps and services.	Managed container runtime with Snowflake auth/networking; supports services and batch jobs.	• Model serving APIs • Custom connectors • Batch image/text processing	Oversized images; egress/network hurdles; security approvals; cost of always-on services.
Collaboration	Secure Data Sharing	Share live data without copies or FTP drama.	Provider/Consumer model with shared objects; governed access; no data duplication.	• Supplier/customer analytics portals • Cross-subsidiary sharing • Inter-cloud collaboration	Schema changes breaking consumers; accidentally sharing sensitive columns; region/cloud compatibility.
Collaboration	Listings & Marketplace	An “app store” for data/apps—public or private.	Package datasets/apps with terms/versioning; distribute across orgs/regions/clouds.	• Publish industry datasets • Subscribe to external signals • Monetise proprietary data	Legal/contracting lag; unclear update cadence; consumer entitlements drifting from expectations.
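
The billing terms in the Virtual Warehouses row (per-second billing, 60-second minimum, auto-suspend) are easier to reason about with a little arithmetic. The following is a minimal sketch, not a pricing tool: it assumes the usual credits-per-hour doubling for standard warehouses (XS = 1 up to XL = 16), treats one resume-to-suspend cycle at a time, and counts the idle wait before auto-suspend as billed time. The function and constant names are hypothetical.

```python
# Assumed credits per hour for standard warehouses; confirm against your own contract.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
MIN_BILLED_SECONDS = 60  # each resume bills at least one minute


def billed_credits(size: str, busy_seconds: float, auto_suspend_seconds: float = 0) -> float:
    """Estimate credits for one resume -> suspend cycle of a single-cluster warehouse.

    busy_seconds: time spent actually running queries.
    auto_suspend_seconds: idle time the warehouse stays up before it suspends.
    """
    running = busy_seconds + auto_suspend_seconds
    billed = max(running, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed / 3600


if __name__ == "__main__":
    # A 10-second query on XS still bills the 60-second minimum.
    print(billed_credits("XS", 10))
    # Five minutes of work on M: compare a 10-minute auto-suspend with a 1-minute one.
    print(billed_credits("M", 300, auto_suspend_seconds=600))
    print(billed_credits("M", 300, auto_suspend_seconds=60))
```

Under these assumptions, the idle tail from a long auto-suspend can cost more than the work itself, which is why "long auto-suspend" sits in the gotchas column.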

Tip: pair this with a one-page “operating rules” note—suspend defaults, file-size targets, freshness SLAs, and a short naming convention.
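
One way to keep such a note honest is to write it down as data and check settings against it. The sketch below is illustrative only: every threshold, the naming pattern, and all function names are hypothetical placeholders, except that the 100 to 250 MB file-size range follows Snowflake's general guidance for compressed load files.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OperatingRules:
    """Hypothetical one-page operating rules, expressed as data."""
    max_auto_suspend_s: int = 60                      # suspend default
    file_size_mb: Tuple[int, int] = (100, 250)        # file-size target (compressed)
    max_target_lag_min: int = 60                      # freshness SLA
    name_pattern: str = r"(DEV|TEST|PROD)_[A-Z0-9_]+_WH"  # naming convention


RULES = OperatingRules()


def check_warehouse(name: str, auto_suspend_s: Optional[int],
                    rules: OperatingRules = RULES) -> List[str]:
    """Return rule violations for a warehouse (None means it never auto-suspends)."""
    problems = []
    if not re.fullmatch(rules.name_pattern, name):
        problems.append(f"name {name!r} breaks the naming convention")
    if auto_suspend_s is None or auto_suspend_s > rules.max_auto_suspend_s:
        problems.append(f"auto-suspend exceeds {rules.max_auto_suspend_s}s")
    return problems


def file_size_ok(size_mb: float, rules: OperatingRules = RULES) -> bool:
    """True if a staged file falls inside the target size range."""
    low, high = rules.file_size_mb
    return low <= size_mb <= high


def freshness_ok(target_lag_min: int, rules: OperatingRules = RULES) -> bool:
    """True if a derived table's target lag meets the freshness SLA."""
    return target_lag_min <= rules.max_target_lag_min


if __name__ == "__main__":
    print(check_warehouse("PROD_REPORTING_WH", 60))
    print(check_warehouse("reporting", 600))
    print(file_size_ok(2), freshness_ok(15))
```

The values passed in would come from wherever you inventory your objects; the point is only that each rule in the note maps to one small, testable check.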