This project classifies wine varieties based on country and description using a fine-tuned ModernBERT model. The system handles class imbalance through weighted loss and employs careful dataset stratification.
Try it in action here: https://huggingface.co/spaces/spawn99/wine-variety
- 🍷 Trained on combined country + description
- ⚖️ Class imbalance handling with weighted loss
- 🧪 Stratified dataset splitting with rare class filtering
- 📊 Evaluation with per-class F1 scores and confusion matrix
- 🤗 Hugging Face Hub integration for dataset and model
- Clone the repository:
git clone https://github.com/cavit99/wine-classification.git
cd wine-classification
- Install dependencies:
pip install -r requirements.txt
- Set up WandB for experiment tracking (optional but recommended):
wandb login
The wine reviews dataset is hosted on Hugging Face Hub:
from datasets import load_dataset
dataset = load_dataset("spawn99/wine-reviews")
- text: Combined country and description (separated by [SEP])
- variety: Wine variety label (normalized and filtered)
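
As a rough illustration of the `text` field described above, the sketch below joins a country and a description around a `[SEP]` marker. The helper name `build_text`, the spacing around the separator, and the `"Unknown"` fallback for a missing country are all assumptions made for this example, not details taken from the repository.

```python
def build_text(country, description, sep="[SEP]"):
    """Join country and description into a single model input string.

    Assumptions (not from the repo): one space on each side of the
    separator, and "Unknown" when the country is missing.
    """
    country = (country or "Unknown").strip()
    description = (description or "").strip()
    return f"{country} {sep} {description}"


example = build_text("France", "Crisp and mineral, with notes of green apple.")
print(example)
```

Keeping the country in the same string as the description lets a single text encoder see both signals without a separate feature branch.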
python preprocess_dataset.py
python train.py
Evaluation metrics and confusion matrix are automatically generated after training.
Architecture: answerdotai/ModernBERT-base
Training:
- Batch size: 128
- Learning rate: 5e-5 (cosine scheduler)
- Early stopping: patience of 3 epochs
- Class-weighted cross-entropy loss
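
To make the class-weighted loss concrete, here is a minimal standard-library sketch. It assumes inverse-frequency ("balanced") weights, which is one common choice; the weighting formula actually used in `train.py` is not stated here, and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This "balanced" formula is an assumption, not the repo's confirmed choice.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(logits, targets, weights, classes):
    """Weighted mean cross-entropy, normalized by the sum of target weights."""
    index = {c: i for i, c in enumerate(classes)}
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        log_prob = row[index[target]] - log_z
        total += -weights[target] * log_prob
        weight_sum += weights[target]
    return total / weight_sum


labels = ["Pinot Noir"] * 6 + ["Riesling"] * 2
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy(
    [[0.0, 0.0], [0.0, 0.0]],
    ["Pinot Noir", "Riesling"],
    weights,
    classes=["Pinot Noir", "Riesling"],
)
```

In a PyTorch training loop the same effect is usually obtained by passing a weight tensor to `torch.nn.CrossEntropyLoss(weight=...)`, which also normalizes the mean by the summed target weights.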
Performance:
- Weighted F1 score
- Accuracy
- Per-class metrics for rare varieties
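
The metrics listed above can be sketched in plain Python as follows. This is an illustrative implementation of per-class F1 and support-weighted F1, not the evaluation code from the repository; the function names and the toy labels are made up for the example.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its true support."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[c] * n for c, n in support.items()) / len(y_true)


y_true = ["Pinot Noir", "Pinot Noir", "Pinot Noir", "Riesling"]
y_pred = ["Pinot Noir", "Pinot Noir", "Riesling", "Riesling"]
scores = per_class_f1(y_true, y_pred)
overall = weighted_f1(y_true, y_pred)
```

Weighted F1 is dominated by frequent varieties, which is why the per-class scores are worth inspecting separately for rare ones. With scikit-learn available, `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` computes the same aggregate.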
Class Normalization:
- Blend varieties normalized to "[Grape] Blend"
- Rare varieties filtered (<50 samples)
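
The rare-class filter above could look roughly like the sketch below, which drops every row whose variety has fewer than 50 samples. The function name and the list-of-dicts row format are assumptions for illustration; the blend normalization rule is not reproduced because its exact mapping is not given here.

```python
from collections import Counter


def filter_rare_classes(rows, min_samples=50, label_key="variety"):
    """Keep only rows whose label occurs at least `min_samples` times."""
    counts = Counter(row[label_key] for row in rows)
    return [row for row in rows if counts[row[label_key]] >= min_samples]


# Toy data: one frequent variety and one rare variety.
rows = [{"variety": "Chardonnay"}] * 60 + [{"variety": "Zweigelt"}] * 10
kept = filter_rare_classes(rows)
kept_varieties = {row["variety"] for row in kept}
```

Filtering should happen after normalization, so that variants merged into one label are counted together before the threshold is applied.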
Stratified Splitting:
- 70% train / 10% validation / 20% test
- Ensures all splits contain all classes
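
A stratified 70/10/20 split can be sketched with the standard library by splitting each class separately. This is an assumed implementation for illustration (the repository may rely on a library routine instead); the function name and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.10, test_frac=0.20, seed=42):
    """Return (train, val, test) index lists with per-class proportions kept.

    Assumes every class has enough samples (e.g. 50 or more after rare-class
    filtering) for all three splits to receive at least one example.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_frac))
        n_val = max(1, round(len(indices) * val_frac))
        test += indices[:n_test]
        val += indices[n_test:n_test + n_val]
        train += indices[n_test + n_val:]
    return train, val, test


labels = ["Pinot Noir"] * 100 + ["Riesling"] * 50
train, val, test = stratified_split(labels)
```

Because the split is done class by class, filtering out varieties with fewer than 50 samples beforehand is what makes it possible for every split to contain every class.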
See requirements.txt for full dependency list.
MIT License - see LICENSE for details.