Financial Transaction Anomaly Detection Engine

Snowflake MLOps Pipeline with Snowpark & Isolation Forest

Project Overview

This repository implements a production-ready MLOps system within the Snowflake Data Cloud. By utilizing Snowpark Python, the entire ML lifecycle—from feature engineering to real-time inference—is executed natively inside Snowflake, ensuring enterprise-grade security and zero-copy data architecture.

Business Impact

Detection Rate: Increased identification of suspicious activities by 33%.
Precision: Reduced False Positive Rates by 20% via refined threshold tuning ($score < -0.06$).
Efficiency: Automated risk-triage workflows, cutting investigation lead time by 40%.

Technical Architecture

The pipeline follows a serverless architecture, eliminating the latency and security risks associated with moving data to external compute environments.

Ingestion: Automated streaming from AWS S3 via Snowpipe.
Feature Engineering: Scale-out processing of transaction velocity and behavioral aggregates using Snowpark DataFrames.
Modeling: Isolation Forest (Scikit-learn) trained and serialized to a Snowflake Stage.
Deployment: Model operationalized as a permanent Python UDF for real-time, in-warehouse scoring.
Orchestration: Snowflake Tasks trigger incremental hourly scoring and risk tiering (High/Med/Low).
Analytics: Dynamic SQL views powering a Live Tableau Dashboard.

Repository Structure

Folder	Description
`Setup/`	SQL DDL for environment configuration (Stages, Snowpipe, RBAC).
`Modeling/`	Snowpark Python scripts for feature engineering and model training.
`Automation/`	Implementation of the Python UDF and scheduled Snowflake Tasks.
`View/`	SQL logic for the final reporting layer and risk-level tiering.

Tech Stack

Language: Python (Snowpark), SQL
Cloud: Snowflake, AWS (S3)
ML Libraries: Scikit-learn (Isolation Forest), Joblib
Visualization: Tableau

How to Deploy

Execute Setup/ scripts to initialize the Snowflake environment.
Run Modeling/ to train the model and upload the .joblib file to the Snowflake stage.
Deploy the scoring logic via Automation/ to create the Python UDF.
Activate the Snowflake Task to begin hourly automated inference.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
Automation		Automation
EDA		EDA
Model		Model
Setup		Setup
View		View
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Financial Transaction Anomaly Detection Engine

Snowflake MLOps Pipeline with Snowpark & Isolation Forest

Project Overview

Business Impact

Technical Architecture

Repository Structure

Tech Stack

How to Deploy

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Financial Transaction Anomaly Detection Engine

Snowflake MLOps Pipeline with Snowpark & Isolation Forest

Project Overview

Business Impact

Technical Architecture

Repository Structure

Tech Stack

How to Deploy

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages