feat(train): add validation metrics and distributed support #2574

ha405 · 2025-08-22T06:18:18Z

This PR extends the validation metrics functionality (precision, recall, F1-score) to the train.py script.

The validate function in train.py now supports the --metrics-avg flag.
Uses torch.distributed.all_gather to collect predictions and targets from all GPUs before computing the metrics on the primary process.
scikit-learn remains a soft dependency, and the feature is disabled by default.
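As a rough illustration of what an averaging flag such as --metrics-avg controls, here is a minimal pure-Python sketch of precision, recall and F1 under three averaging modes. It assumes the flag accepts scikit-learn-style values (`micro`, `macro`, `weighted`); those values and the helper name `precision_recall_f1` are assumptions for illustration, not taken from the PR, which per its description relies on scikit-learn for the actual computation.

```python
from collections import Counter


def precision_recall_f1(preds, targets, average="macro"):
    """Precision, recall and F1 for single-label multi-class predictions.

    Hypothetical helper; `average` is assumed to take scikit-learn-style
    values ("micro", "macro", "weighted").
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    # Classes seen in either the predictions or the targets.
    labels = sorted(set(targets) | set(preds))
    if not labels:
        return 0.0, 0.0, 0.0

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(preds, targets):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def safe_div(a, b):
        # Undefined ratios (0/0) are reported as 0.0.
        return a / b if b else 0.0

    def f1(p, r):
        return safe_div(2 * p * r, p + r)

    if average == "micro":
        # Pool the counts over all classes, then compute the ratios once.
        total_tp = sum(tp.values())
        p = safe_div(total_tp, total_tp + sum(fp.values()))
        r = safe_div(total_tp, total_tp + sum(fn.values()))
        return p, r, f1(p, r)

    # Per-class scores, averaged below.
    per_class = []
    for c in labels:
        p = safe_div(tp[c], tp[c] + fp[c])
        r = safe_div(tp[c], tp[c] + fn[c])
        per_class.append((p, r, f1(p, r)))

    if average == "macro":
        weights = [1.0] * len(labels)          # every class counts equally
    elif average == "weighted":
        support = Counter(targets)
        weights = [support[c] for c in labels]  # weight by true-class support
    else:
        raise ValueError(f"unsupported average: {average!r}")
    total = sum(weights)
    return tuple(
        sum(w * scores[i] for w, scores in zip(weights, per_class)) / total
        for i in range(3)
    )
```

With single-label predictions, micro-averaged precision and recall both reduce to accuracy, while macro and weighted averaging differ in how much rare classes contribute.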

This lets users see these more detailed metrics during training, including in multi-GPU runs.
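Gathering before computing matters because precision, recall and F1 are ratios of counts: averaging per-GPU scores generally gives a different number than scoring the whole validation set. The sketch below simulates two ranks in plain Python. `gather` is a hypothetical stand-in for `torch.distributed.all_gather` followed by concatenation, and the shard data is made up for illustration.

```python
def binary_f1(preds, targets, positive=1):
    """F1 for one positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for p, t in zip(preds, targets) if p == positive and t == positive)
    fp = sum(1 for p, t in zip(preds, targets) if p == positive and t != positive)
    fn = sum(1 for p, t in zip(preds, targets) if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gather(shards):
    """Hypothetical stand-in for all_gather + concatenation: combines every
    rank's shard into one flat list, in rank order."""
    return [x for shard in shards for x in shard]


# Made-up per-rank validation outputs for a 2-GPU run.
pred_shards = [[1, 1, 0, 0], [0, 0, 1, 0]]
target_shards = [[1, 0, 0, 0], [1, 1, 1, 0]]

per_rank = [binary_f1(p, t) for p, t in zip(pred_shards, target_shards)]
mean_of_ranks = sum(per_rank) / len(per_rank)
global_f1 = binary_f1(gather(pred_shards), gather(target_shards))

print(f"mean of per-rank F1: {mean_of_ranks:.4f}")  # prints 0.5833
print(f"F1 on gathered data: {global_f1:.4f}")      # prints 0.5714
```

In the real multi-GPU path, all_gather fills a list with one tensor per rank, which is then concatenated (for example with torch.cat). It expects each rank to contribute a tensor of the same shape, and if the validation loader uses DistributedSampler, that sampler may pad ranks with repeated samples to equalize their lengths, so the gathered set can contain a few duplicates.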

feat(train): add validation metrics and distributed support

0c04fa1

ha405 mentioned this pull request Aug 22, 2025

feat(validate): add precision, recall, and F1 metrics #2568

Open
