Objective
Segment customers of a UK-based online gift retailer using transaction data to uncover behavior-driven groups for targeted marketing.
Dataset
- Transactions from Dec 2010 to Dec 2011
- 541,909 records, 8 columns
- Key columns: InvoiceDate, Quantity, UnitPrice, CustomerID, Country
Methodology
Data Wrangling:
- Removed canceled orders and rows with a missing CustomerID
- Filtered out rows with zero or negative Quantity or UnitPrice
- Parsed InvoiceDate into datetime format
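The wrangling steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's notebook code: it assumes the raw file has an `InvoiceNo` column (not listed among the key columns above) in which cancellations start with the letter "C". The helper name `clean_transactions` and the five toy rows are hypothetical.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the raw transactions with the wrangling steps applied."""
    # Rows without a CustomerID cannot be attributed to any customer.
    out = df.dropna(subset=["CustomerID"])
    # Assumption: canceled orders have an InvoiceNo starting with "C".
    out = out[~out["InvoiceNo"].astype(str).str.startswith("C")]
    # Keep only strictly positive quantities and prices.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    # Parse timestamps; store IDs as integers (they load as floats because of NaNs).
    return out.assign(
        InvoiceDate=pd.to_datetime(out["InvoiceDate"]),
        CustomerID=out["CustomerID"].astype(int),
    ).reset_index(drop=True)


# Made-up rows: valid, canceled, zero quantity, missing ID, valid.
raw = pd.DataFrame({
    "InvoiceNo": ["500001", "C500002", "500003", "500004", "500005"],
    "Quantity": [6, -1, 0, 3, 2],
    "UnitPrice": [2.55, 27.50, 3.75, 1.25, 0.85],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 09:41", "2010-12-01 08:45",
                    "2010-12-01 09:00", "2010-12-01 09:30"],
    "CustomerID": [1001.0, 1002.0, 1003.0, None, 1005.0],
    "Country": ["United Kingdom"] * 4 + ["France"],
})
clean = clean_transactions(raw)
print(clean[["InvoiceNo", "CustomerID", "InvoiceDate"]])
```

Using `assign` at the end returns a fresh frame, which avoids pandas' chained-assignment warning on the filtered slice.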
Feature Engineering:
- Built RFM features:
  - Recency: days since the customer's last purchase
  - Frequency: total number of purchases
  - Monetary: total amount spent (Quantity × UnitPrice, summed per customer)
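A sketch of the RFM aggregation, under three assumptions the text does not spell out: Recency is measured from a snapshot date one day after the last invoice, Frequency counts distinct `InvoiceNo` values, and Monetary sums `Quantity * UnitPrice`. The helper name `build_rfm` and the toy transactions are hypothetical.

```python
import pandas as pd


def build_rfm(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate cleaned transactions into one Recency/Frequency/Monetary row per customer."""
    tx = tx.assign(Amount=tx["Quantity"] * tx["UnitPrice"])
    # Assumption: "today" is one day after the last invoice in the data.
    snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
    return tx.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last purchase
        Frequency=("InvoiceNo", "nunique"),  # distinct invoices
        Monetary=("Amount", "sum"),  # total spend
    )


# Made-up transactions for three customers.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["I1", "I2", "I3", "I4", "I4", "I5"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-09", "2011-06-01",
                                   "2011-11-01", "2011-11-01", "2011-12-05"]),
    "Quantity": [2, 1, 10, 1, 3, 2],
    "UnitPrice": [5.0, 10.0, 1.0, 2.0, 2.0, 4.0],
})
rfm = build_rfm(tx)
print(rfm)
```

Counting distinct invoices rather than rows keeps a single order with many line items from inflating Frequency.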
Preprocessing:
- Log-transformed and scaled RFM values
- Applied PCA for 2D visualization
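One way to implement the preprocessing, assuming `np.log1p` for the log transform and scikit-learn's `StandardScaler` for scaling (the document does not name the scaler). PCA here produces coordinates for plotting only; the sketch assumes clustering runs on the scaled features. The RFM values are random placeholders, not the retail data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

FEATURES = ["Recency", "Frequency", "Monetary"]


def preprocess(rfm: pd.DataFrame):
    """Log-transform and standardize RFM, then project to 2-D for plotting."""
    # log1p compresses the long right tails and is defined at zero.
    logged = np.log1p(rfm[FEATURES])
    # Zero mean, unit variance per feature so no single feature dominates distances.
    scaled = StandardScaler().fit_transform(logged)
    # Two principal components, used only for the scatter plot.
    coords = PCA(n_components=2).fit_transform(scaled)
    return scaled, coords


# Random placeholder RFM table.
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 374, 200),
    "Frequency": rng.integers(1, 60, 200),
    "Monetary": rng.lognormal(mean=6, sigma=1.2, size=200),
})
scaled, coords = preprocess(rfm)
print(scaled.shape, coords.shape)
```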
Clustering (K-Means):
- Determined the optimal k = 3 using the Elbow and Silhouette methods
- Assigned each customer to one of the 3 clusters
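The Elbow and Silhouette checks can be combined in a single scan over k. This is a sketch, not the project's code: `scan_k` is a hypothetical helper, and the input is synthetic data with three known centers standing in for the scaled RFM matrix, so the silhouette peak at k = 3 below reflects the toy data, not the retail dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_k(X: np.ndarray, k_values=range(2, 9), seed: int = 0) -> dict:
    """Fit K-Means for each k and record (inertia, silhouette)."""
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_: within-cluster sum of squares (Elbow); silhouette in [-1, 1], higher is better
        scores[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return scores


# Synthetic stand-in for the scaled features: three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=[[-5, 0], [0, 5], [5, 0]],
                  cluster_std=0.5, random_state=42)
scores = scan_k(X)
for k, (inertia, sil) in scores.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")

# Pick the k with the highest silhouette, then refit to get final labels.
best_k = max(scores, key=lambda k: scores[k][1])
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```

Inertia always falls as k grows, so the Elbow is read from where the drop flattens; the silhouette score gives a second, peak-based signal to confirm that choice.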
Visualization:
- PCA scatter plot of clusters
- Boxplots and heatmap of RFM by cluster
- Choropleth map of customers by country
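A possible layout for the cluster plots with matplotlib and seaborn. The function name `plot_clusters`, the one-row panel layout, and z-scoring the cluster means so that Recency, Frequency and Monetary share one color scale in the heatmap are all illustrative choices, not taken from the project. The inputs are random placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

FEATURES = ["Recency", "Frequency", "Monetary"]


def plot_clusters(rfm: pd.DataFrame, labels: np.ndarray, coords: np.ndarray):
    """PCA scatter, one boxplot per RFM feature, and a heatmap of per-cluster means."""
    df = rfm.assign(Cluster=labels)
    fig, axes = plt.subplots(1, 5, figsize=(24, 4))

    # Customers in PCA space, colored by cluster label.
    axes[0].scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=8)
    axes[0].set(title="PCA projection", xlabel="PC1", ylabel="PC2")

    # Distribution of each RFM feature per cluster.
    for ax, col in zip(axes[1:4], FEATURES):
        sns.boxplot(data=df, x="Cluster", y=col, ax=ax)
        ax.set_title(col)

    # Cluster means, z-scored per feature so all three share one color scale.
    means = df.groupby("Cluster")[FEATURES].mean()
    z = (means - means.mean()) / means.std()
    sns.heatmap(z, annot=True, fmt=".2f", cmap="coolwarm", ax=axes[4])
    axes[4].set_title("Cluster means (z-score)")
    fig.tight_layout()
    return fig, means


# Placeholder inputs; in the project these would come from the earlier steps.
rng = np.random.default_rng(1)
n = 150
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 374, n),
    "Frequency": rng.integers(1, 60, n),
    "Monetary": rng.lognormal(mean=6, sigma=1.2, size=n),
})
labels = np.arange(n) % 3
coords = rng.normal(size=(n, 2))
fig, means = plot_clusters(rfm, labels, coords)
```

Call `fig.savefig("clusters.png")` to write the panel to disk. For the choropleth, plotly express's `px.choropleth` accepts country names directly via `locationmode="country names"`, for example after counting distinct customers per country with `tx.groupby("Country")["CustomerID"].nunique()`; country strings that are not standard names will not be matched on the map.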
Segment Profiles
Cluster | Recency | Frequency | Monetary | Segment Type |
---|---|---|---|---|
0 | Low | High | High | Lapsed VIPs |
1 | High | Low | Low | One-Time Buyers |
2 | Medium | Medium | Medium | Mid-Tier Customers |
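The Low/Medium/High levels in the table can be derived by ranking each feature's cluster means across the three clusters. This is a hedged sketch of one such rule, not necessarily how the project assigned them; `level_labels` is hypothetical and the means are illustrative numbers, not the project's results. Because Recency counts days since the last purchase, a "Low" Recency level under this rule marks the clusters that bought most recently.

```python
import pandas as pd

LEVELS = ["Low", "Medium", "High"]


def level_labels(cluster_means: pd.DataFrame) -> pd.DataFrame:
    """Map per-cluster feature means to Low/Medium/High by ranking across clusters.

    Assumes exactly three clusters, so each rank maps to one of the three levels.
    """
    ranks = cluster_means.rank(method="first").astype(int) - 1  # 0 = smallest mean
    return ranks.apply(lambda col: col.map(dict(enumerate(LEVELS))))


# Illustrative cluster means only.
means = pd.DataFrame(
    {"Recency": [30.0, 250.0, 90.0],
     "Frequency": [15.0, 1.2, 4.0],
     "Monetary": [6000.0, 250.0, 1300.0]},
    index=pd.Index([0, 1, 2], name="Cluster"),
)
profile = level_labels(means)
print(profile)
```

Ranking is relative: "High" only means highest among the three clusters, not high in any absolute sense.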
Business Insight
- Cluster 0: High spenders who have gone inactive; win them back with rewards or early access.
- Cluster 1: Likely one-time buyers; nudge them with follow-ups or discounts.
- Cluster 2: Moderately engaged; develop them into loyal customers with tailored campaigns.
Tools
- Python (pandas, sklearn, seaborn, plotly)
- K-Means, PCA
- Jupyter Notebook