Online Retail — K-Means Customer Segmentation

Goal: Use unsupervised learning (K-Means) on RFM features to uncover actionable customer segments from the UCI Online Retail II dataset.

Business Impact: Customer segmentation supports personalized marketing, better campaign ROI, and stronger customer retention and engagement.

🚀 Executive Summary

This notebook demonstrates a complete end-to-end ML pipeline that:

Cleans & standardizes messy retail transactions
Constructs RFM (Recency, Frequency, Monetary) features
Scales features & applies K-Means clustering
Determines optimal k via Elbow & Silhouette methods
Produces 4 actionable segments + 3 outlier micro-segments
Visualizes clusters for business storytelling

📊 Key Outcomes:

Raw rows: 525,461 → Cleaned: 406,309 (77.32% retained)
Unique customers: 4,285
Clustered customers (after outlier removal): 3,809
Optimal clusters (k): 4
Actionable Segments: Retain, Reward, Re-Engage, Nurture + Outlier cohorts (Pamper, Upsell, Delight)

📂 Dataset

Source: UCI ML Repository — Online Retail II
Period: 2009-12-01 → 2010-12-09
Countries: 40 (UK dominates with 485,852 rows)
Schema (8 cols): Invoice, StockCode, Description, Quantity, InvoiceDate, Price, Customer ID, Country

🧹 Data Cleaning

Steps applied:

Invoice filtering → keep only 6-digit numeric invoices (^\d{6}$).
StockCode filtering → exclude admin/test codes (keep only SKU-like).
Drop rows with missing Customer IDs.
Remove zero-priced lines (28 rows removed).
Final cleaned dataset: 406,309 rows (77.32% of raw).
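
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's exact code: the function name `clean_transactions` is hypothetical, and the "SKU-like" pattern (five digits plus optional trailing letters) is an assumption, since the README does not spell out the StockCode rule.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a raw transactions frame (illustrative)."""
    out = df.copy()
    # Step 1: keep only invoices that are exactly six digits.
    out = out[out["Invoice"].astype(str).str.match(r"^\d{6}$")]
    # Step 2: keep SKU-like stock codes.
    # ASSUMPTION: "SKU-like" means five digits plus optional trailing letters.
    out = out[out["StockCode"].astype(str).str.match(r"^\d{5}[A-Za-z]*$")]
    # Step 3: drop rows with no customer identifier.
    out = out.dropna(subset=["Customer ID"])
    # Step 4: remove zero-priced lines.
    out = out[out["Price"] != 0]
    return out.reset_index(drop=True)

# Tiny made-up frame: one valid row plus one row failing each rule.
raw = pd.DataFrame({
    "Invoice": ["489434", "C489449", "489435", "489436", "489437"],
    "StockCode": ["85048", "22087", "POST", "21232", "22064"],
    "Quantity": [12, -12, 1, 6, 3],
    "Price": [6.95, 2.95, 18.0, 0.0, 1.65],
    "Customer ID": [13085.0, 16321.0, 12682.0, 13078.0, None],
})
cleaned = clean_transactions(raw)
print(len(raw), "->", len(cleaned))
```

Applying the filters one at a time makes it easy to log how many rows each rule removes, which is how a figure such as "28 rows removed" can be audited.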

📌 Distribution Check:

📌 With Outliers Highlighted:

📌 After Major Outliers Separated:

📊 RFM Feature Engineering

Features per customer:

Recency (R): Days since last purchase
Frequency (F): Number of invoices
Monetary (M): Total spend
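
The three features defined above could be derived with a single `groupby`, along these lines. This is a sketch under assumptions: `build_rfm` and the `LineTotal` column are hypothetical names, and the snapshot date used for Recency defaults to the latest invoice date, which the README does not confirm.

```python
import pandas as pd

def build_rfm(df: pd.DataFrame, snapshot=None) -> pd.DataFrame:
    """Aggregate cleaned transactions into one RFM row per customer (illustrative)."""
    work = df.copy()
    work["InvoiceDate"] = pd.to_datetime(work["InvoiceDate"])
    # Spend per line item = quantity x unit price.
    work["LineTotal"] = work["Quantity"] * work["Price"]
    # ASSUMPTION: Recency is measured from the latest invoice date in the data.
    if snapshot is None:
        snapshot = work["InvoiceDate"].max()
    rfm = work.groupby("Customer ID").agg(
        LastInvoiceDate=("InvoiceDate", "max"),
        Frequency=("Invoice", "nunique"),   # number of distinct invoices
        Monetary=("LineTotal", "sum"),      # total spend
    )
    rfm["Recency"] = (snapshot - rfm["LastInvoiceDate"]).dt.days
    return rfm[["Recency", "Frequency", "Monetary"]].reset_index()

# Made-up transactions for two customers.
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2],
    "Invoice": ["100001", "100001", "100002", "100003"],
    "InvoiceDate": ["2010-12-01", "2010-12-01", "2010-12-05", "2010-11-09"],
    "Quantity": [2, 1, 3, 10],
    "Price": [5.0, 10.0, 2.0, 1.5],
})
rfm = build_rfm(tx, snapshot=pd.Timestamp("2010-12-09"))
print(rfm)
```

Counting distinct invoices rather than rows keeps Frequency from being inflated by multi-line orders.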

Summary (non-outliers, 3,809 customers):

Recency: mean = 97d, median = 58d
Frequency: mean = 2.86, median = 2
Monetary: mean = 885.5, median = 588.1

🤖 Modeling

Algorithm: KMeans(n_clusters=4, random_state=42, max_iter=1000)
Scaling: StandardScaler on [R, F, M]
Model Selection:

📌 Elbow + Silhouette Diagnostic:

Optimal K = 4
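
For readers who want to reproduce the diagnostic, the scan over candidate k values might look like the sketch below. It runs on synthetic blobs, not on the retail data, and `scan_k` is a hypothetical helper; `n_init=10` is set explicitly as an assumption so that behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def scan_k(X, k_values, random_state=42, max_iter=1000):
    """Return {k: (inertia, silhouette)} for K-Means fitted on standardized X."""
    X_scaled = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state,
                       max_iter=max_iter, n_init=10)  # n_init is an assumption
        labels = model.fit_predict(X_scaled)
        # Inertia feeds the elbow plot; silhouette measures cluster separation.
        results[k] = (model.inertia_, silhouette_score(X_scaled, labels))
    return results

# Synthetic stand-in for [R, F, M]: four well-separated 3-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

scores = scan_k(X, range(2, 7))
best_k = max(scores, key=lambda k: scores[k][1])  # highest silhouette
for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
print("best k by silhouette:", best_k)
```

Inertia always tends to fall as k grows, so the elbow is read by eye; the silhouette score gives a second, less subjective signal, which is why the two are usually examined together.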

🧩 Segmentation Results

🔑 Core Clusters

Cluster 0 — Retain (High value, recent buyers)
- Playbook: VIP care, exclusive perks, proactive engagement.
Cluster 1 — Re-Engage (Low frequency & spend, lapsed)
- Playbook: Win-back offers, personalized reactivation emails.
Cluster 2 — Nurture (Lowest activity, often new)
- Playbook: Onboarding, education, low-friction deals.
Cluster 3 — Reward (Loyal, consistent buyers)
- Playbook: Loyalty rewards, referral incentives, bundles.
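
One way to check that the segment names match the data is to profile each cluster's mean RFM values. The sketch below assumes a frame with a `Cluster` column holding the K-Means labels; `profile_segments` is a hypothetical helper and the numbers are invented for illustration, not results from the notebook.

```python
import pandas as pd

# Label-to-name mapping as listed in this README.
SEGMENT_NAMES = {0: "Retain", 1: "Re-Engage", 2: "Nurture", 3: "Reward"}

def profile_segments(rfm: pd.DataFrame, label_col: str = "Cluster") -> pd.DataFrame:
    """Mean Recency/Frequency/Monetary and customer count per named segment."""
    named = rfm.assign(Segment=rfm[label_col].map(SEGMENT_NAMES))
    profile = named.groupby("Segment")[["Recency", "Frequency", "Monetary"]].mean()
    profile["Customers"] = named.groupby("Segment").size()
    return profile

# Invented values purely for illustration.
toy = pd.DataFrame({
    "Cluster":   [0, 0, 1, 2, 3],
    "Recency":   [10, 20, 200, 150, 30],
    "Frequency": [8, 6, 1, 1, 5],
    "Monetary":  [3000.0, 2000.0, 150.0, 100.0, 1200.0],
})
profile = profile_segments(toy)
print(profile)
```

K-Means label numbers carry no inherent meaning, so a mapping like this should be re-checked against the cluster profile whenever the data or the random seed changes.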

📌 Cluster Visualization (Raw):

📌 Cluster Visualization (Scaled):

📌 KMeans Cluster Separation:

📌 Violin Plots (Segment Profiles):

🎯 Outlier Micro-Segments

Cluster −1 — Pamper → High spenders → bespoke offers, concierge.
Cluster −2 — Upsell → Frequent shoppers → subscriptions, bundles.
Cluster −3 — Delight → Elite (strong on R, F, and M) → premium tiers, surprise perks.
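
The README does not state which rule separates outliers from the clustered customers. A common choice is the 1.5 × IQR upper fence, sketched below as an assumption. The helpers `iqr_upper_mask` and `label_outliers` are hypothetical, as is the reading of the three labels as "Monetary only", "Frequency only", and "both".

```python
import pandas as pd

def iqr_upper_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies above Q3 + k * IQR (ASSUMED outlier rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return s > q3 + k * (q3 - q1)

def label_outliers(rfm: pd.DataFrame) -> pd.Series:
    """Return -1/-2/-3 for outlier customers and 0 for everyone else."""
    m_out = iqr_upper_mask(rfm["Monetary"])
    f_out = iqr_upper_mask(rfm["Frequency"])
    labels = pd.Series(0, index=rfm.index, name="OutlierCluster")
    labels.loc[m_out & ~f_out] = -1  # Pamper: high spend only (assumed reading)
    labels.loc[f_out & ~m_out] = -2  # Upsell: high frequency only (assumed reading)
    labels.loc[m_out & f_out] = -3   # Delight: both (assumed reading)
    return labels

# Invented values: three customers are extreme on one or both features.
toy = pd.DataFrame({
    "Frequency": [1, 2, 2, 3, 3, 2, 30, 2, 25, 3],
    "Monetary":  [100, 200, 150, 300, 250, 5000, 220, 180, 6000, 210],
})
toy["OutlierCluster"] = label_outliers(toy)
print(toy)
```

Customers labeled 0 would then go on to scaling and K-Means, while the negative labels are handled as separate micro-segments so that a few extreme values do not distort the centroids.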

💼 Business Applications

Retention & Loyalty → Retain & Reward cohorts
Reactivation Campaigns → Target Re-Engage group
Acquisition Funnel → Nurture new/dormant customers
Premium Strategy → Outlier groups (Pamper, Upsell, Delight)

⚙️ Tech Stack

Python 3.10+, Jupyter Notebook
pandas → data manipulation
scikit-learn → KMeans, scaling, silhouette
matplotlib & seaborn → visualization
openpyxl → Excel I/O

🌟 Highlights

✔️ Built a scalable, reproducible ML pipeline from raw retail logs

✔️ Applied RFM analysis + K-Means for customer segmentation

✔️ Delivered business-ready insights & playbooks tied to ROI levers

✔️ Integrated robust data cleaning, outlier handling, and model diagnostics

✔️ Produced clear visualizations & segment storytelling for stakeholders
