🧱 Databricks Components Overview

| Component | Non-Technical Description 📘 | Technical Description ⚙️ | Use Case / Scenario 🎯 |
| --- | --- | --- | --- |
| Workspace | Your collaborative project space | Hosts notebooks, jobs, repos, and ML experiments | Organising analytics or data science projects |
| Clusters | Computing engine (like your personal computer, but scalable) | Spark-based distributed compute environment (autoscaling or manual) | Running jobs, notebooks, and ML models |
| Jobs | Automated tasks or scheduled workflows | DAG-based execution of notebooks, scripts, or JARs | Nightly ETL jobs, ML training pipelines |
| Notebooks | Interactive workspace for code and output | Supports Python, SQL, Scala, R, and Markdown | Exploratory data analysis, prototyping |
| SQL Editor | GUI for querying data tables | Uses Databricks SQL for a BI-friendly interface | Business users querying curated tables |
| Delta Lake | Like a spreadsheet that remembers everything | ACID-compliant storage layer over Parquet | Reliable data lake tables with versioning (see the first sketch below) |
| Unity Catalog | Your data's filing cabinet and bouncer | Central metadata and access-control layer with RBAC | Secure multi-tenant access across clouds |
| Lakehouse Platform | The Databricks "big idea": warehouse + lake | Combines data lake scalability with warehouse-like performance | Unified platform for batch, streaming, ML, and BI |
| MLflow | Your model's history, packaging, and delivery | Open-source lifecycle management for ML models | Experiment tracking, model registry, deployment (see the second sketch below) |
| Repos | Built-in Git versioning | Git-backed source control for notebooks and jobs | Code collaboration and CI/CD |
| Data Explorer | Browse your tables like folders | Visual UI to inspect catalogs, schemas, and tables | Data discovery and governance checks |
| Dashboards | Shareable reports and visuals | BI dashboards powered by SQL or notebooks | Stakeholder insights and KPIs |
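
To make Delta Lake's "remembers everything" claim concrete, here is a minimal sketch of table versioning and time travel. It assumes a Databricks notebook where `spark` is predefined, and the `main.demo` catalog/schema path is a hypothetical example.

```python
# Minimal Delta Lake versioning sketch. Assumes a Databricks notebook
# (where `spark` already exists) and a hypothetical `main.demo` schema.

# Write an ACID-compliant Delta table (this becomes version 0).
spark.range(5).withColumnRenamed("id", "value") \
    .write.format("delta").mode("overwrite").saveAsTable("main.demo.events")

# Overwrite with new rows (version 1); version 0 is still retained.
spark.range(5, 10).withColumnRenamed("id", "value") \
    .write.format("delta").mode("overwrite").saveAsTable("main.demo.events")

# Time travel: read the table exactly as it was at version 0.
spark.sql("SELECT * FROM main.demo.events VERSION AS OF 0").show()

# Inspect the table's full change history.
spark.sql("DESCRIBE HISTORY main.demo.events").show()
```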
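
And a minimal sketch of MLflow experiment tracking, assuming `mlflow` and scikit-learn are available (both ship with the Databricks ML runtime); the parameter and metric names are illustrative.

```python
# Minimal MLflow tracking sketch. Assumes `mlflow` and scikit-learn
# are installed (both are preinstalled on Databricks ML runtimes).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)                       # record a hyperparameter
    mlflow.log_metric("train_accuracy", model.score(X, y))  # record a metric
    mlflow.sklearn.log_model(model, "model")                # package the fitted model
```

On Databricks, the run appears in the MLflow experiment UI, where it can be compared against other runs and its logged model promoted to the registry.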