This project fine-tunes Bio_ClinicalBERT, a transformer-based model, for multi-class classification of clinical notes.
- Full pipeline: JSON data loading, preprocessing, stratified train–validation split, and label encoding
- Custom PyTorch `Dataset` class for integration with HuggingFace Trainer
- Fine-tunes pretrained Bio_ClinicalBERT for 22 clinical note categories
- Model evaluation with classification report, macro F1/accuracy, and annotated confusion matrix
- Visualization of per-class distribution and prediction results
- Python, pandas, numpy
- PyTorch, HuggingFace Transformers, Datasets
- scikit-learn (evaluation, splitting)
- matplotlib, seaborn
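
The loading, label-encoding, stratified-split, and evaluation steps listed above can be sketched with scikit-learn alone. This is a minimal illustration, not the project's actual code: the JSON field names (`text`, `category`), the three toy categories, the 80/20 split ratio, and the placeholder predictions are all assumptions made for the example. In the real pipeline the predictions would come from the fine-tuned model.

```python
import json

from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy records standing in for the real JSON file.
# The field names "text" and "category" are hypothetical.
raw = json.dumps(
    [
        {"text": f"note {i}", "category": cat}
        for i, cat in enumerate(["cardiology", "neurology", "radiology"] * 10)
    ]
)
records = json.loads(raw)

# Encode string categories as integer class ids.
texts = [r["text"] for r in records]
encoder = LabelEncoder()
labels = encoder.fit_transform([r["category"] for r in records])

# Stratified split: each class keeps the same proportion in both subsets.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Placeholder predictions (assumption): copy the true labels and flip one,
# so the metrics below have something non-trivial to report.
val_preds = val_labels.copy()
val_preds[0] = (val_preds[0] + 1) % len(encoder.classes_)

# Accuracy, macro-averaged F1, and the confusion matrix over all classes.
accuracy = accuracy_score(val_labels, val_preds)
macro_f1 = f1_score(val_labels, val_preds, average="macro")
cm = confusion_matrix(
    val_labels, val_preds, labels=list(range(len(encoder.classes_)))
)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
print(cm)
```

With the real dataset, `encoder.classes_` would hold the 22 note categories, and `encoder.inverse_transform` can map predicted ids back to category names for the classification report and the confusion-matrix axis labels.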