Snowflake Data Pipeline

Project Information

Category: Data Engineering & Business Intelligence
Platform: Snowflake, Power BI
Project Type: Academic / Professional Portfolio
Scope: End-to-end data pipeline: Kaggle → Snowflake → Power BI

End-to-End Snowflake Pipeline

Overview:

This project implements a complete data pipeline ingesting public datasets from Kaggle, storing them in Snowflake raw tables, transforming into a Star Schema via Snowflake views and Power Query, and generating dashboards in Power BI for analytical insights.

Objectives:

Design a scalable ETL pipeline with Python and Snowflake
Create a dimensional Star Schema for reporting
Build interactive dashboards in Power BI
Ensure data accuracy and consistency through validation checks
Provide analytics-ready datasets for business decisions

Architecture:

Extraction: Kaggle datasets downloaded via Python scripts
Loading: Raw tables in Snowflake (ANALYTICS_REPO schema)
Transformation: Snowflake views create dimensions and fact tables (Star Schema)
Visualization: Power BI dashboards consuming the Star Schema

Fact & Dimension Tables:

Fact_Sales
Dim_Date
Dim_Customer
Dim_Product
Dim_AgeGroup

Technology Stack:

Kaggle, Python, Snowflake, Power Query, Power BI
Pandas, json, kagglehub
Data validation and version control (Git/GitHub)

Key Learnings:

End-to-end data engineering workflow
Dimensional modelling for BI
Data validation and accuracy checks
Building professional dashboards for decision-making

Comparison of Transformation Approaches:

Approach	Advantages	Disadvantages
Snowflake Views	- No storage duplication, lightweight for static datasets - Easy to maintain and adjust logic without reloading data - Ideal for datasets without incremental updates	- Slightly slower for very large queries - Not suitable for frequently updated transactional data - Limited optimization for complex joins or aggregations
Snowflake Tables	- Persistent storage, optimized for large-scale reporting - Supports incremental loading and indexes - Best for frequently updated or transactional datasets	- Requires storage space and management - Updates require ETL jobs or scripts - Less flexible for rapid schema changes
Power Query (Desktop / ETL)	- Very user-friendly, visual interface - Quick prototyping and column type adjustments - Good for small/medium static datasets	- Limited scalability for large datasets - Local processing may be slower - Transformations tied to Excel/Power BI environment

Note: In this project, a Snowflake view was chosen instead of a table because the dataset is static and does not require incremental updates. This allows for a lighter, easier-to-maintain solution without unnecessary storage duplication, while still enabling Star Schema transformation for Power BI dashboards.

Full source code available on https://github.com/fmulato/SNOWFLAKE_PROJECT