
End-to-end sales data warehouse built with Databricks Delta Live Tables. Features automated ETL, change data capture, and medallion architecture. Transforms raw multi-region sales data into analytics-ready dimensional models.


meetzaveri29/databricks-declarative-pipeline


πŸ—οΈ Sales Data Declarative Pipeline - Databricks Delta Live Tables

A declarative ETL pipeline built with Databricks DLT, following the Medallion Architecture.

Data Flow Diagram

🎯 What This Does

Transforms raw sales data from multiple regions into a production-ready data warehouse with:

  • Automated data quality checks
  • Change data capture (SCD Type 1 & 2)
  • Incremental processing
  • Historical tracking of dimension changes

πŸ›οΈ Architecture

Raw Data → Bronze (Ingest) → Silver (Clean) → Gold (Analytics)
  • Bronze: Raw data ingestion with basic validation
  • Silver: Cleaned, enriched data with upserts
  • Gold: Dimensional model ready for BI tools
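As a sketch, the Bronze and Silver layers above can be expressed as DLT table definitions. This is illustrative only: the table and column names (`sales_east`, `order_id`, `amount`) are assumptions, not taken from this repo, and the code runs only inside a Databricks DLT pipeline (where `spark` is available implicitly):

```python
import dlt
from pyspark.sql import functions as F

# Bronze: raw ingestion with a basic validity check
@dlt.table(comment="Raw sales records ingested from the source schema")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def sales_bronze():
    return spark.read.table("dlt.source.sales_east")

# Silver: cleaned, typed, and de-duplicated
@dlt.table(comment="Cleaned sales with standardized types")
def sales_silver():
    return (
        dlt.read("sales_bronze")
        .withColumn("amount", F.col("amount").cast("double"))
        .dropDuplicates(["order_id"])
    )
```

Each `@dlt.table` function declares *what* the table should contain; DLT works out the execution order and dependencies between layers.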

📂 Project Structure

source_code/
├── bronze/     # Raw data ingestion
├── silver/     # Data cleaning & enrichment
└── gold/       # Dimensional model & business views

🚀 Quick Start

  1. Set up a Databricks workspace (the free tier works)
  2. Enable the Lakeflow pipeline editor under Settings > Developer
  3. Create a catalog named dlt with a schema named source
  4. Load the source data (sales_east, sales_west, products, customers)
  5. Create a DLT pipeline and upload the code files
  6. Run the pipeline - that's it!
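Step 3 can be done in a Databricks SQL editor with standard Unity Catalog statements (the names match the steps above):

```sql
CREATE CATALOG IF NOT EXISTS dlt;
CREATE SCHEMA IF NOT EXISTS dlt.source;
```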

💡 Key Features

  • Zero-config CDC: Automatic handling of inserts, updates, deletes
  • Data quality gates: Bad data stopped at source
  • Smart incremental loads: Only processes what changed
  • SCD Type 2: Full history preserved automatically
  • One-click deployment: Declarative approach = less code
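The CDC and SCD Type 2 features map to DLT's `apply_changes` API. A minimal sketch, assuming a `customers_silver` source with `customer_id` and `updated_at` columns (illustrative names, not confirmed from this repo); like the other DLT code, it only runs inside a pipeline:

```python
import dlt

# Target table for the slowly changing dimension
dlt.create_streaming_table("dim_customers")

dlt.apply_changes(
    target="dim_customers",
    source="customers_silver",
    keys=["customer_id"],
    sequence_by="updated_at",
    stored_as_scd_type=2,  # preserve full history; use 1 to overwrite in place
)
```

DLT handles the insert/update/delete logic and the SCD bookkeeping (effective-from/effective-to columns) automatically, which is what makes the CDC "zero-config".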

📊 Sample Output

After running, you'll have:

  • dim_products & dim_customers (with full history)
  • fact_sales (optimized for queries)
  • business_view_sales (ready for dashboards)
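Once the gold tables exist, a dashboard query against the business view might look like this (the `region` and `total_amount` columns are assumptions for illustration):

```sql
SELECT region,
       SUM(total_amount) AS revenue
FROM business_view_sales
GROUP BY region
ORDER BY revenue DESC;
```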

🔧 Technologies

  • Databricks Delta Live Tables
  • PySpark
  • Delta Lake
  • Medallion Architecture

Built following a YouTube tutorial on modern data engineering with Databricks.
