Traffic accident detection in autonomous systems requires a delicate balance between real-time performance and safety-critical sensitivity. We propose a Joint Semantic Distillation (JSD) framework that transfers robust world knowledge from a frozen DINOv2 (ViT-Base) teacher to a lightweight MobileNetV3-Small student.
By jointly optimizing:
-
Binary classification loss (
$L_{BCE}$ ) -
Semantic feature alignment loss (
$L_{Cos}$ )
the distilled model achieves high-recall accident detection on edge devices, running at >100 FPS with a +4 Recall improvement over standard training.
We evaluated the method on the DoTA (Detection of Traffic Anomaly) dataset (142k test frames).
The distilled student fundamentally shifts the decision boundary to prioritize safety, achieving a massive gain in Recall (Sensitivity).
| Model | Accuracy | Recall | F1-Score | AUC |
|---|---|---|---|---|
| MobileNetV3 (Baseline) | 63.8% | 61.8% | 0.498 | 0.621 |
| Ours (Distilled) | 56.9% | 65.7% | 0.510 | 0.631 |
As shown below, the Baseline (Left) is biased towards predicting "Normal," missing most accidents. Our Distilled Model (Right) correctly identifies the majority of crash events.
The distilled model successfully detects accidents in challenging scenarios (blur, occlusion) where the baseline fails.
We utilize the DoTA dataset. As seen in the samples below, the data presents significant challenges including motion blur, night-time driving, and rapid collision dynamics, which necessitates the use of Foundation Model features.
git clone https://github.com/Alpsource/Semantic-Anomaly-Distillation.git
cd Semantic-Anomaly-Distillationpip install -r requirements.txtDownload the DoTA Dataset and place the extracted videos or frames inside the data/ directory.
Pre-compute and cache DINOv2 teacher features (≈ 5× faster training):
python precompute_teacher.pyTrain the distilled student (our method):
python train_distillation.py(Optional) Train the baseline model for comparison:
python train_baseline.pyCompute Recall, F1-Score, and AUC on the test set:
python evaluate_comparison.pyGenerate Success vs. Failure grids highlighting cases where distillation helps:
python visualize_success.pyLaunch the Gradio interface for real-time video inference:
python app.py.
├── checkpoints/ # Saved model weights
├── data/ # Dataset storage
├── assets/ # Images for README/Paper
├── src/ # Core model & dataset logic
│ ├── dataset.py
│ ├── models/
│ └── utils/
├── train_distillation.py # Main training script (JSD)
├── train_baseline.py # Baseline training script
├── evaluate_comparison.py # Metrics calculation
├── precompute_teacher.py # DINOv2 feature extraction
├── visualize_success.py # Qualitative analysis
├── app.py # Gradio demo interface
└── requirements.txt # Python dependencies
This project is licensed under the MIT License.




