Snowflake Components – Field Guide
Non-technical and technical descriptions, concrete use cases, and the common gotchas you’ll actually meet on projects.
Category | Component | Non-technical description | Technical description | Use cases | Common gotchas |
---|---|---|---|---|---|
Compute | Virtual Warehouses | On/off “engines” that run queries and loads. Choose a size, pay while it’s on, pause when idle. | Isolated MPP compute clusters, each with its own local cache; scale up (bigger size) or out (multi-cluster); per-second billing with a 60-second minimum each time a warehouse resumes; auto-suspend/resume. | • Burst month-end reporting • Parallel ELT loads • Separate dev/test/prod performance isolation | Forgetting to suspend; habitual oversizing; long auto-suspend windows; assuming the warehouse cache is shared across warehouses (only the result cache is). |
Pipelines | Snowpipe | Auto-loads new files as they land: a continuous trickle instead of big batches. | Event- or REST-triggered micro-batch COPY from stages; near-real-time ingestion; runs on Snowflake-managed serverless compute billed separately from your warehouses. | • Vendor S3/GCS drops • Log/CSV ingestion every few minutes • Lightweight CDC exports | Millions of tiny files; wrong file format options; duplicate handling; expecting sub-second latency. |
Pipelines | Snowpipe Streaming (low latency) | Push rows straight into tables, with no files in between. | Row-oriented ingest via SDK/connectors; sub-minute availability; bypasses staged files. | • Clickstream/IoT telemetry • Real-time fraud features • Rapid dashboard updates | Event ordering and duplicates; commit semantics; schema evolution; monitoring the cost of continuous ingest. |
Change data | Streams | A change tracker that answers “what changed since the last run?” | CDC over tables and views (inserts/updates/deletes) with a consumption offset; typically paired with Tasks for incremental processing (Dynamic Tables are the declarative alternative). | • Incremental raw→silver transforms • Propagate deltas downstream • Audit and replay | Only consuming a stream in a committed DML statement advances the offset (a plain SELECT does not); streams left unconsumed past the source’s retention window go stale; DDL changes surprise downstream jobs. |
Orchestration | Tasks | Built-in scheduler for SQL/procedural jobs: like cron, inside Snowflake. | Schedule- or dependency-triggered DAGs; run SQL or stored procedures on a warehouse or on serverless compute; track run history and failures. | • Nightly dimensions & facts • Backfill chains • Orchestrate quality checks | Tasks left suspended (new tasks are created suspended and must be resumed); wrong warehouse sizing; timezone mismatches in CRON schedules; missing privileges. |
Transformation | Dynamic Tables | “Always-fresh” derived tables that Snowflake maintains for you. | Declarative objects with a defining query and a target lag (freshness target); automatic incremental or full refreshes with dependency tracking; refreshes run on the warehouse you assign. | • Keep curated marts in sync • Simplify ELT DAGs • Auto-maintain SCD logic | Not free: refreshes consume warehouse credits; deep chains hide latency and cost; not a silver bullet for messy logic. |
Storage | DBs/Schemas/Tables/Views/MVs | Your folders and sheets: organise, expose, and accelerate data (MVs pre-compute results). | Permanent/transient/temporary tables; clustering keys; secure/regular views; materialized views (single source table, no joins; Enterprise Edition) with background maintenance costs. | • Secure data access via views • Accelerate heavy filters and aggregations on a single large table with MVs • Separate zones (raw/silver/gold) | Transient tables in prod limit recovery (no Fail-safe, at most 1 day of Time Travel); expecting MVs to support joins or non-deterministic functions; over-clustering tiny tables. |
Lakehouse | External & Iceberg Tables | Query lake data where it lives, with no copy required. | External tables over object storage; Iceberg tables with an open table format and catalog; partition pruning depends on layout/metadata. | • Blend Parquet with internal tables • Open-format interoperability • Low-cost archival analytics | Stale metadata; partition/path mismatches; small-file performance; IAM misconfiguration. |
Ingest I/O | Stages (internal/external) | Landing zones to load from or unload to. | Named stages with credentials/encryption; directory tables for file discovery; supports scoped credentials. | • Partner drop-zones • Bulk unload for sharing • Pre-processing areas | Leaking credentials in URLs; wrong region; case-sensitive paths; missing stage privileges. |
Ingest I/O | File Formats | Reusable “how to read this file” recipes. | Parsing options for CSV/JSON/Avro/Parquet/ORC/XML; referenced by COPY, Snowpipe, and external tables. | • Standardise CSV quirks • JSON semi-structured loads • Parquet high-throughput ingest | NULL vs empty strings; date/locale issues; BOM/encoding surprises. |
Governance | Masking Policies | Hide sensitive values dynamically based on who’s asking. | Column-level policies evaluated at query time; role/context aware; auditable via usage views. | • PII protection in shared views • Dev/test obfuscation • Role-based reveals (e.g., last 4 digits) | Complex types; a policy must return the column’s own data type (no “***” mask on a NUMBER column), which can surprise BI tools; forgetting an explicit unmasked branch for admin roles in the policy body. |
Governance | Row Access Policies | Limit rows by role or attributes: “see only your slice”. | Policy expressions enforce row-level security on tables/views; evaluated per query. | • Region/BU scoping • Customer tenancy isolation • Least-privilege analytics | Over-complex predicates; interactions with other filters; “missing rows” confusion. |
Cost control | Resource Monitors | Spend tripwires to catch runaway compute. | Credit quotas with threshold actions (NOTIFY, SUSPEND, SUSPEND_IMMEDIATE) scoped to individual warehouses or the whole account. | • Cap dev/playground spend • Alert finance on spikes • Guardrails for PoCs | Warehouses only: serverless features (Snowpipe, serverless tasks, etc.) need separate tracking such as budgets; period/timezone assumptions. |
Metadata | Information Schema & Account Usage | System views for usage, health, and lineage-ish visibility. | Database-scoped INFORMATION_SCHEMA (current, shorter history) plus account-wide SNOWFLAKE.ACCOUNT_USAGE views (latency of up to a few hours, one year of history); ideal for monitoring and FinOps. | • Cost & performance dashboards • Object inventory & drift checks • Simple lineage discovery | Freshness lag; privilege gaps; mixing names across databases/schemas. |
Dev & Apps | UDF / UDTF / UDAF | Custom functions when plain SQL won’t do. | Extend with SQL/JS/Java/Python; UDTFs return tables; sandboxed execution with pushdown limits. | • Code normalisers/cleansers • Tokenisation/regex helpers • Sessionisation as a UDTF | Row-by-row overhead; library limits; cold starts (some runtimes). |
Dev & Apps | Stored Procedures | Procedural scripts with variables and control flow. | JS/SQL/Java/Python procedures; manage transactions; call SQL; orchestrate tasks and admin routines. | • Schema/grants automation • Migration utilities • Complex ELT orchestration | Long runs timing out; tricky debugging; running on the wrong warehouse. |
Dev & Apps | Snowpark APIs | DataFrame code that runs close to the data. | Python/Scala/Java APIs with pushdown; UDF/UDTF authoring; packaged dependencies via curated channels. | • Feature engineering • In-platform data prep • ML scoring pipelines | Accidental client-side collects; serialization limits; dependency pinning/version drift. |
Dev & Apps | Snowpark Container Services | Run your containers next to your data: apps and services. | Managed container runtime with Snowflake auth/networking; supports long-running services and batch jobs. | • Model serving APIs • Custom connectors • Batch image/text processing | Oversized images; egress/network hurdles; security approvals; cost of always-on services. |
Collaboration | Secure Data Sharing | Share live data without copies or FTP drama. | Provider/consumer model with shared objects; governed access; no data duplication. | • Supplier/customer analytics portals • Cross-subsidiary sharing • Inter-cloud collaboration | Schema changes breaking consumers; accidentally sharing sensitive columns; region/cloud compatibility. |
Collaboration | Listings & Marketplace | An “app store” for data and apps, public or private. | Package datasets/apps with terms and versioning; distribute across orgs, regions, and clouds. | • Publish industry datasets • Subscribe to external signals • Monetise proprietary data | Legal/contracting lag; unclear update cadence; consumer entitlements drifting from expectations. |
Tip: pair this with a one-page “operating rules” note—suspend defaults, file-size targets, freshness SLAs, and a short naming convention.
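The Virtual Warehouses row mentions per-second billing with a 60-second minimum and warns about long auto-suspend windows. The sketch below is a simplified cost model, not Snowflake's billing engine. It assumes standard-warehouse rates (1 credit per hour for X-Small, doubling with each size) and treats the idle tail before auto-suspend as billed time. `CREDITS_PER_HOUR`, `billed_seconds`, and `cycle_credits` are hypothetical names.

```python
# Assumed standard-warehouse rates (credits per hour). Other warehouse types
# and generations differ, so check the current price sheet before relying on them.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "2XLARGE": 32, "3XLARGE": 64, "4XLARGE": 128,
}

MIN_BILLED_SECONDS = 60  # each resume is billed for at least one minute


def billed_seconds(active_seconds: float, auto_suspend_seconds: float) -> float:
    """Seconds billed for one resume -> work -> idle -> auto-suspend cycle."""
    # The warehouse keeps running (and billing) while it waits to auto-suspend.
    return max(MIN_BILLED_SECONDS, active_seconds + auto_suspend_seconds)


def cycle_credits(size: str, active_seconds: float, auto_suspend_seconds: float) -> float:
    """Credits for one cycle under the assumptions above; ignores cloud-services charges."""
    return billed_seconds(active_seconds, auto_suspend_seconds) * CREDITS_PER_HOUR[size] / 3600
```

Under these assumptions, a 30-second burst on a Medium warehouse costs 0.7 credits with a 600-second auto-suspend but only 0.1 credits with a 60-second one. The idle tail, not the query, dominates the bill.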
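The “millions of tiny files” gotcha in the Snowpipe row is usually fixed upstream, by batching small files before they land in the stage. Snowflake's loading guidance suggests compressed files of roughly 100 to 250 MB. The hypothetical `plan_batches` helper below uses an assumed 150 MB target. It groups files greedily in arrival order and closes a batch whenever the next file would overshoot the target.

```python
from typing import Iterable

MB = 1024 * 1024


def plan_batches(files: Iterable[tuple[str, int]], target_bytes: int = 150 * MB) -> list[list[str]]:
    """Group (name, size_in_bytes) pairs into batches of at most ~target_bytes.

    Files larger than the target end up alone in their own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, ten 40 MB files become batches of 3, 3, 3, and 1 files. Each batch would then be concatenated or rewritten into a single file before upload.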
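A common defence against the ordering and duplicate gotcha in the Snowpipe Streaming row is an idempotent consumer. It keeps only the latest version of each business key, ranked by a producer-assigned sequence number. In Snowflake SQL this is usually a `QUALIFY ROW_NUMBER() OVER (PARTITION BY key ORDER BY seq DESC) = 1` filter, and the Python below mirrors the same rule. The event shape (`key`, `seq`, `payload`) and the `latest_per_key` name are assumptions for illustration.

```python
from typing import Any

Event = dict[str, Any]  # assumed shape: {"key": ..., "seq": ..., "payload": ...}


def latest_per_key(events: list[Event]) -> dict[str, Event]:
    """Keep the highest-seq event per key; replays and late, older events are dropped."""
    latest: dict[str, Event] = {}
    for event in events:
        key = event["key"]
        # Strictly greater: an exact duplicate of the current winner is ignored.
        if key not in latest or event["seq"] > latest[key]["seq"]:
            latest[key] = event
    return latest
```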
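The offset rule in the Streams row is easiest to see with a toy model. The `Table` and `Stream` classes below are hypothetical and ignore details such as net-change collapsing and the metadata columns real streams expose. They capture only the rule itself: reading a stream shows pending changes without moving the offset, and only a committed DML statement that consumes the stream advances it.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table that records every change as (action, row)."""
    changes: list[tuple[str, dict]] = field(default_factory=list)

    def insert(self, row: dict) -> None:
        self.changes.append(("INSERT", row))

    @property
    def version(self) -> int:
        return len(self.changes)


@dataclass
class Stream:
    """Toy stream: an offset into the table's change history."""
    table: Table
    offset: int = 0

    def select(self) -> list[tuple[str, dict]]:
        """Like SELECT * FROM the stream: shows pending changes, offset unchanged."""
        return self.table.changes[self.offset:]

    def consume(self, commit: bool = True) -> list[tuple[str, dict]]:
        """Like a DML statement reading the stream; only a commit advances the offset."""
        pending = self.select()
        if commit:
            self.offset = self.table.version
        return pending
```

A rolled-back consumer (`commit=False`) sees the same rows again on the next run, which is the behaviour that makes stream-driven jobs safe to retry.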
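In a task graph, a child task runs only after its predecessors finish. The sketch below uses Python's standard `graphlib` to work out the run order of a hypothetical nightly graph, grouping tasks that could run in parallel into waves. The task names and the `run_waves` helper are illustrative. As with Snowflake task graphs, cycles are not allowed, and `prepare()` enforces this here by raising `CycleError`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each task maps to the tasks it runs AFTER.
task_graph: dict[str, set[str]] = {
    "load_raw": set(),
    "build_dim_customer": {"load_raw"},
    "build_dim_product": {"load_raw"},
    "build_fact_sales": {"build_dim_customer", "build_dim_product"},
    "run_quality_checks": {"build_fact_sales"},
}


def run_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within a wave have no dependencies on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves
```

For `task_graph` this yields four waves: `load_raw`, then both dimension builds together, then the fact build, then the quality checks.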
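Two of the File Formats gotchas can be reproduced outside Snowflake with Python's `csv` module. First, a UTF-8 byte-order mark silently renames the first column unless the decoder strips it. Second, an unquoted empty field reads exactly like a quoted empty string. In Snowflake, file format options such as `ENCODING`, `SKIP_BYTE_ORDER_MARK`, `EMPTY_FIELD_AS_NULL`, and `NULL_IF` decide how these cases load. The snippet only illustrates the ambiguity, and `read_header` is a hypothetical helper.

```python
import csv
import io

# A small CSV that starts with a UTF-8 byte-order mark (BOM).
RAW = "\ufeffid,name\n1,Ada\n".encode("utf-8")


def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes with the given codec and return the first CSV row."""
    return next(csv.reader(io.StringIO(data.decode(encoding))))


print(read_header(RAW, "utf-8"))      # ['\ufeffid', 'name']: first column is not 'id'
print(read_header(RAW, "utf-8-sig"))  # ['id', 'name']: BOM stripped

# An unquoted empty field and a quoted empty field both come back as ''.
print(next(csv.reader(io.StringIO('1,,""\n'))))  # ['1', '', '']
```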
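A masking policy body is a SQL expression, usually a `CASE` on `CURRENT_ROLE()` or `IS_ROLE_IN_SESSION()`, that Snowflake evaluates at query time. The Python below mirrors that logic so the Masking Policies gotchas are visible. It shows the explicit unmasked branch for privileged roles and a string-in, string-out signature. The role names and `mask_card_number` are hypothetical.

```python
from typing import Optional

# Hypothetical role names used only for this illustration.
FULL_ACCESS_ROLES = {"PII_ADMIN"}
PARTIAL_ACCESS_ROLES = {"SUPPORT"}


def mask_card_number(value: Optional[str], current_role: str) -> Optional[str]:
    """Return the value as the given role should see it (string in, string out)."""
    if value is None:
        return None
    if current_role in FULL_ACCESS_ROLES:
        return value  # explicit unmasked branch; without it even admins see masks
    if current_role in PARTIAL_ACCESS_ROLES:
        return "*" * max(len(value) - 4, 0) + value[-4:]  # reveal last 4 only
    return "*" * len(value)  # every other role: fully masked
```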
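A row access policy behaves like an invisible `WHERE` clause, which explains the “missing rows” confusion in the Row Access Policies row. Rows a role isn't entitled to simply don't appear, and no error is raised. The sketch below mirrors the common mapping-table pattern in plain Python. The mapping, role names, and `visible_rows` helper are hypothetical.

```python
# Hypothetical mapping table: role -> regions that role may see.
REGION_ENTITLEMENTS: dict[str, set[str]] = {
    "SALES_EMEA": {"EMEA"},
    "SALES_GLOBAL": {"EMEA", "AMER", "APAC"},
}


def visible_rows(rows: list[dict], current_role: str) -> list[dict]:
    """Return only rows whose region the role may see; unmapped roles see nothing."""
    allowed = REGION_ENTITLEMENTS.get(current_role, set())
    return [row for row in rows if row["region"] in allowed]


orders = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "AMER"},
    {"id": 3, "region": "APAC"},
]
```

A role missing from the mapping gets an empty result rather than a permissions error. That is usually the first thing to check when a user reports “missing rows”.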
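Resource monitor triggers are percentages of a credit quota, and each trigger carries an action. The sketch below models which actions have fired for a given usage level. The 75/100/110 thresholds are example values, not defaults, and `fired_actions` is a hypothetical helper. To match the gotcha in the table, only warehouse credits are fed in, because the monitor doesn't count serverless usage.

```python
from typing import NamedTuple


class Trigger(NamedTuple):
    percent: float  # percentage of the credit quota
    action: str     # "NOTIFY", "SUSPEND" or "SUSPEND_IMMEDIATE"


# Example thresholds (assumed values for illustration).
TRIGGERS = (
    Trigger(75, "NOTIFY"),
    Trigger(100, "SUSPEND"),            # running statements are allowed to finish
    Trigger(110, "SUSPEND_IMMEDIATE"),  # running statements are cancelled
)


def fired_actions(warehouse_credits_used: float, credit_quota: float,
                  triggers: tuple[Trigger, ...] = TRIGGERS) -> list[str]:
    """Actions whose threshold has been reached; serverless credits are not counted."""
    used_pct = 100 * warehouse_credits_used / credit_quota
    return [t.action for t in triggers if used_pct >= t.percent]
```

With a 100-credit quota, 80 warehouse credits fire only `NOTIFY`, even if serverless features burned another 50 credits in the same period.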