Snowflake Components – Field Guide

Non-technical and technical descriptions, concrete use cases, and the common gotchas you’ll actually meet on projects.

Each entry lists, in order: category, component, non-technical description, technical description, use cases, and common gotchas.

Category: Compute
Component: Virtual Warehouses
Non-technical description: On/off “engines” that run queries/loads. Choose size, pay while it’s on, pause when idle.
Technical description: Isolated MPP compute clusters with independent caches; scale up/out; per-second billing (min 60s); auto-suspend/resume.
Use cases:
• Burst month-end reporting
• Parallel ELT loads
• Separate dev/test/prod performance isolation
Common gotchas: Forgetting to suspend; habitual oversizing; long auto-suspend; assuming caches are shared across warehouses.
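
To make the billing rule concrete (per-second, with a 60-second minimum), here is a minimal arithmetic sketch. It assumes the usual credits-per-hour ladder for standard warehouses (XS = 1, doubling per size) and a single cluster; the function names are made up for illustration.

```python
import math

# Assumed credits per hour for standard warehouses; each size doubles the previous one.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

MIN_BILLED_SECONDS = 60  # minimum charge each time a warehouse starts or resumes


def billed_credits(size: str, running_seconds: float) -> float:
    """Credits for one continuous run of a single-cluster warehouse."""
    if running_seconds <= 0:
        return 0.0
    billed_seconds = max(math.ceil(running_seconds), MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def idle_credits(size: str, auto_suspend_seconds: int, resumes_per_day: int) -> float:
    """Credits spent per day just waiting for auto-suspend after each burst of work."""
    return CREDITS_PER_HOUR[size] * auto_suspend_seconds * resumes_per_day / 3600
```

Under these assumptions a 10-second query on a freshly resumed XS warehouse is billed as 60 seconds, and a Large warehouse that resumes 24 times a day idles away about 32 credits with a 10-minute auto-suspend versus about 3.2 with a 1-minute one.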

Category: Pipelines
Component: Snowpipe
Non-technical description: Auto-loads new files as they land: a continuous trickle instead of big batches.
Technical description: Event/REST-triggered micro-batch COPY from stages; near-real-time ingestion; cost per processed file/row.
Use cases:
• Vendor S3/GCS drops
• Log/CSV ingestion every few minutes
• Lightweight CDC exports
Common gotchas: Millions of tiny files; wrong file format options; duplicate handling; expecting sub-second latency.
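
One way to avoid the tiny-files gotcha is to compact files before they reach the stage. The sketch below is a hypothetical upstream helper, not a Snowflake API: it greedily groups files, in arrival order, into batches near a target size. The 150 MB default is an assumption inside the commonly cited 100-250 MB compressed range.

```python
from typing import Iterable

TARGET_BYTES = 150 * 1024 * 1024  # assumed target; tune to your own file-size rule


def plan_batches(
    file_sizes: Iterable[tuple[str, int]], target: int = TARGET_BYTES
) -> list[list[str]]:
    """Group (name, size_in_bytes) pairs into batches of roughly `target` bytes."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for name, size in file_sizes:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_bytes + size > target:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

A file larger than the target simply becomes its own batch; the helper only plans the grouping and leaves the actual merge to whatever tool produces the files.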

Category: Pipelines
Component: Snowpipe Streaming (low-latency)
Non-technical description: Push rows straight into tables, with no files in between.
Technical description: Record-oriented ingest via SDK/connectors; sub-minute availability; bypasses staged files.
Use cases:
• Clickstream/IoT telemetry
• Real-time fraud features
• Rapid dashboard updates
Common gotchas: Event ordering/dupes; commit semantics; schema evolution; continuous ingest cost monitoring.
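
Category: Change data
Component: Streams
Non-technical description: A change-tracker that answers “what changed since last run?”
Technical description: CDC over base tables/views (inserts/updates/deletes) with consumption offsets; pairs with Tasks/Dynamic Tables for incremental processing.
Use cases:
• Incremental raw→silver transforms
• Rebuild downstream deltas
• Audit and replay
Common gotchas: The offset advances only when a DML statement consumes the stream and commits (a plain SELECT does not move it); streams left unconsumed past the retention window go stale; DDL changes surprising downstream jobs.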
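
The offset behaviour is easier to see in a toy model. The class below is an in-memory simplification, not Snowflake code: it treats changes as an append-only log (real streams return net changes) and uses made-up method names to contrast reading a stream with consuming it.

```python
class ToyStream:
    """In-memory model of a stream offset over an append-only change log."""

    def __init__(self, change_log: list[str]) -> None:
        self._log = change_log  # shared with the "table"; new changes get appended
        self._offset = 0        # index of the first change not yet consumed

    def peek(self) -> list[str]:
        """Like a plain SELECT on the stream: shows pending changes, offset unchanged."""
        return self._log[self._offset:]

    def consume(self) -> list[str]:
        """Like committed DML that reads the stream: returns changes, advances the offset."""
        pending = self.peek()
        self._offset = len(self._log)
        return pending


changes = ["INSERT id=1", "UPDATE id=1"]
stream = ToyStream(changes)
stream.peek()      # both changes, and they are still pending afterwards
stream.consume()   # both changes, and the offset moves past them
changes.append("DELETE id=1")
stream.peek()      # only the new delete
```

The practical reading: a job that only inspects the stream will see the same rows again next run, while a job that consumes it must succeed, because those rows will not come back.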

Category: Orchestration
Component: Tasks
Non-technical description: Built-in scheduler for SQL/procedural jobs. Like cron, inside Snowflake.
Technical description: Time/dependency-triggered DAGs; run SQL/SP on a warehouse or serverless; track history and failures.
Use cases:
• Nightly dimensions & facts
• Backfill chains
• Orchestrate quality checks
Common gotchas: Tasks left suspended (new tasks start suspended until explicitly resumed); wrong warehouse sizing; timezone mismatches; missing privileges.
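
Dependency-triggered DAGs are just a topological ordering problem. As a sketch of how a task graph resolves into a run order, the example below uses Python's standard `graphlib`; the task names are hypothetical and nothing here talks to Snowflake.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the tasks that must finish before it.
TASK_GRAPH = {
    "load_dimensions": {"nightly_root"},
    "load_facts": {"load_dimensions"},
    "quality_checks": {"load_dimensions", "load_facts"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

Sketching the graph this way before creating the tasks is a cheap check that a backfill chain has a single root and no accidental cycles.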

Category: Transformation
Component: Dynamic Tables
Non-technical description: “Always-fresh” derived tables that Snowflake maintains for you.
Technical description: Declarative objects with a defining query + freshness target (target lag); automated incremental refresh and dependency management; refreshes run on the warehouse you assign.
Use cases:
• Keep curated marts in sync
• Simplify DAGs for ELT
• Auto-maintain SCD logic
Common gotchas: Not free: every refresh consumes warehouse credits; deep chains hide latency/cost; not a silver bullet for messy logic.

Category: Storage
Component: DBs/Schemas/Tables/Views/MVs
Non-technical description: Your folders and sheets: organise, expose, and accelerate data (MVs pre-compute).
Technical description: Permanent/transient/temp tables; clustering keys; secure/regular views; MVs with refresh costs/limits (single table only, no joins).
Use cases:
• Secure data access via views
• Accelerate heavy single-table aggregations with MVs
• Separate zones (raw/silver/gold)
Common gotchas: Transient tables in prod limit recovery (no Fail-safe); MVs on non-deterministic queries; over-clustering tiny tables.

Category: Lakehouse
Component: External & Iceberg Tables
Non-technical description: Query lake data where it lives, no copy required.
Technical description: External tables over object storage; Iceberg tables with open-table format/catalog; partition pruning depends on layout/metadata.
Use cases:
• Blend Parquet with internal tables
• Open-format interoperability
• Low-cost archival analytics
Common gotchas: Stale metadata; partition/path mismatches; small-file performance; IAM misconfig.

Category: Ingest I/O
Component: Stages (internal/external)
Non-technical description: Landing zones to load from or unload to.
Technical description: Named stages with creds/encryption; directory tables for discovery; supports scoped credentials.
Use cases:
• Partner drop-zones
• Bulk unload for sharing
• Pre-processing areas
Common gotchas: Leaking creds in URLs; wrong region; case-sensitive paths; missing stage privileges.

Category: Ingest I/O
Component: File Formats
Non-technical description: Reusable “how to read this file” recipes.
Technical description: Parsing options for CSV/JSON/Avro/Parquet/ORC/XML; referenced by COPY/Snowpipe/External Tables.
Use cases:
• Standardise CSV quirks
• JSON semi-structured loads
• Parquet high-throughput ingest
Common gotchas: NULL vs empty strings; date/locale issues; BOM/encoding surprises.
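
Category: Governance
Component: Masking Policies
Non-technical description: Hide sensitive values dynamically based on who’s asking.
Technical description: Column-level policies evaluated at query time; role/context aware; auditable via usage views.
Use cases:
• PII protection in shared views
• Dev/test obfuscation
• Role-based reveals (e.g., last-4)
Common gotchas: Complex (semi-structured) types; BI tools assuming unmasked types; forgetting that admin roles are masked too unless the policy explicitly lets them through.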
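
The last-4 reveal boils down to one branch on the caller's role. The sketch below models that decision in plain Python rather than as a real policy; the role names and function name are hypothetical. Both branches return a string, mirroring the need for a masked value to keep the column's type.

```python
def mask_card_number(value: str, current_role: str, unmasked_roles: frozenset[str]) -> str:
    """Return the full value for allowed roles, otherwise only the last four characters."""
    if current_role in unmasked_roles:
        return value
    if len(value) <= 4:
        # Too short to reveal anything safely: mask everything.
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


ALLOWED = frozenset({"PII_ADMIN"})  # hypothetical role allowed to see clear text

mask_card_number("4111111111111111", "ANALYST", ALLOWED)       # masked except last 4
mask_card_number("4111111111111111", "PII_ADMIN", ALLOWED)     # full value
mask_card_number("4111111111111111", "ACCOUNTADMIN", ALLOWED)  # masked: not in the allow-list
```

The third call is the admin gotcha in miniature: being powerful elsewhere does not unmask anything unless the rule names the role.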

Category: Governance
Component: Row Access Policies
Non-technical description: Limit rows by role or attributes: “see only your slice”.
Technical description: Predicate functions enforce row-level security (RLS) on tables/views; evaluated per-query.
Use cases:
• Region/BU scoping
• Customer tenancy isolation
• Least-privilege analytics
Common gotchas: Over-complex predicates; filter interactions; “missing rows” confusion.
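
A row access predicate is usually a lookup against a small mapping of who may see what. The sketch below models region scoping in plain Python; the mapping, role names, and helper names are hypothetical stand-ins for a policy and its mapping table.

```python
# Hypothetical mapping table: (role, region) pairs that are allowed.
ROLE_REGIONS = frozenset({
    ("SALES_EMEA", "EMEA"),
    ("SALES_APAC", "APAC"),
})


def row_visible(current_role: str, row_region: str) -> bool:
    """The predicate: True when the role is mapped to the row's region."""
    return (current_role, row_region) in ROLE_REGIONS


def visible_rows(rows: list[dict], current_role: str) -> list[dict]:
    """What a query would return once the predicate is applied to every row."""
    return [row for row in rows if row_visible(current_role, row["region"])]


orders = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APAC"},
]
visible_rows(orders, "SALES_EMEA")  # only the EMEA order
visible_rows(orders, "NEW_ANALYST")  # empty list, no error
```

The last call is the “missing rows” confusion: an unmapped role gets an empty result rather than an error, so the fix is usually in the mapping, not the query.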

Category: Cost control
Component: Resource Monitors
Non-technical description: Spend tripwires to catch runaway compute.
Technical description: Credit thresholds with actions (notify/suspend) scoped to the account or to specific warehouses.
Use cases:
• Cap dev/playground spend
• Alert finance on spikes
• Guardrails for PoCs
Common gotchas: Warehouses only: serverless usage is not covered and needs separate tracking; period/timezone assumptions.
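
Thresholds with actions can be sanity-checked on paper before anyone gets suspended. The sketch below is a plain-Python model of a credit quota with percentage triggers; the `Trigger` class and `fired_actions` function are illustrative names, and the action labels follow the notify/suspend idea described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    percent: int  # threshold as a percentage of the credit quota
    action: str   # e.g. "NOTIFY", "SUSPEND", "SUSPEND_IMMEDIATE"


def fired_actions(credit_quota: float, credits_used: float, triggers: list[Trigger]) -> list[str]:
    """Return the actions whose thresholds have been reached, lowest threshold first."""
    used_percent = 100 * credits_used / credit_quota
    ordered = sorted(triggers, key=lambda t: t.percent)
    return [t.action for t in ordered if used_percent >= t.percent]


DEV_GUARDRAILS = [
    Trigger(75, "NOTIFY"),
    Trigger(100, "SUSPEND"),
    Trigger(110, "SUSPEND_IMMEDIATE"),
]
```

With a 100-credit quota, 80 credits used reaches only the notify threshold, while 105 reaches notify and suspend but not the immediate cut-off. Only warehouse credits count toward `credits_used` in this model, which is the gotcha to remember.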

Category: Metadata
Component: Information Schema & Account Usage
Non-technical description: System tables for usage, health, and lineage-ish visibility.
Technical description: DB-scoped INFORMATION_SCHEMA views plus account-wide ACCOUNT_USAGE views (which lag behind real time); ideal for monitoring & finops.
Use cases:
• Cost & performance dashboards
• Object inventory & drift checks
• Simple lineage discovery
Common gotchas: Freshness lag; privilege gaps; mixing names across databases/schemas.

Category: Dev & Apps
Component: UDF / UDTF / UDAF
Non-technical description: Custom functions when plain SQL won’t do.
Technical description: Extend with SQL/JS/Java/Python; UDTF returns tables; sandboxed execution with pushdown limits.
Use cases:
• Code normalisers/cleansers
• Tokenisation/regex helpers
• Sessionisation as UDTF
Common gotchas: Row-by-row overhead; library limits; cold-starts (some runtimes).
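
For the sessionisation use case, the logic fits the usual shape of a Python UDTF handler: a class whose `process` method yields output rows. The sketch below is only that handler logic, runnable locally. Registration is not shown, and it assumes the input is partitioned per user and ordered by event time, with a hypothetical 30-minute inactivity gap.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cut-off


class Sessionize:
    """Handler-style class: one instance per partition (e.g. per user), rows in time order."""

    def __init__(self) -> None:
        self._last_seen: Optional[datetime] = None
        self._session_id = 0

    def process(self, event_time: datetime) -> Iterator[tuple[datetime, int]]:
        # Start a new session on the first row or after a long enough gap.
        if self._last_seen is None or event_time - self._last_seen > SESSION_GAP:
            self._session_id += 1
        self._last_seen = event_time
        yield (event_time, self._session_id)
```

Because state lives on the instance, the result is only meaningful if each partition really arrives in time order. The method also runs once per row, which is where the row-by-row overhead in the gotchas comes from.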

Category: Dev & Apps
Component: Stored Procedures
Non-technical description: Procedural scripts with variables and control flow.
Technical description: JS/SQL/Java/Python procs; manage transactions; call SQL; orchestrate tasks and admin routines.
Use cases:
• Schema/grants automation
• Migration utilities
• Complex ELT orchestration
Common gotchas: Long runs timing out; tricky debugging; running on the wrong warehouse.

Category: Dev & Apps
Component: Snowpark APIs
Non-technical description: DataFrame code that runs close to the data.
Technical description: Python/Scala/Java APIs with pushdown; UDF/UDTF authoring; packaged deps via curated channels.
Use cases:
• Feature engineering
• In-platform data prep
• ML scoring pipelines
Common gotchas: Accidental client-side collects; serialization limits; dependency pinning/version drift.

Category: Dev & Apps
Component: Snowpark Container Services
Non-technical description: Run your containers next to your data: apps and services.
Technical description: Managed container runtime with Snowflake auth/networking; supports services and batch jobs.
Use cases:
• Model serving APIs
• Custom connectors
• Batch image/text processing
Common gotchas: Oversized images; egress/network hurdles; security approvals; cost of always-on services.

Category: Collaboration
Component: Secure Data Sharing
Non-technical description: Share live data without copies or FTP drama.
Technical description: Provider/Consumer model with shared objects; governed access; no data duplication.
Use cases:
• Supplier/customer analytics portals
• Cross-subsidiary sharing
• Inter-cloud collaboration
Common gotchas: Schema changes breaking consumers; accidentally sharing sensitive columns; region/cloud compatibility.

Category: Collaboration
Component: Listings & Marketplace
Non-technical description: An “app store” for data/apps, public or private.
Technical description: Package datasets/apps with terms/versioning; distribute across orgs/regions/clouds.
Use cases:
• Publish industry datasets
• Subscribe to external signals
• Monetise proprietary data
Common gotchas: Legal/contracting lag; unclear update cadence; consumer entitlements drifting from expectations.

Tip: pair this with a one-page “operating rules” note—suspend defaults, file-size targets, freshness SLAs, and a short naming convention.