Commit 212138a

Add experiment tracking abstraction layer
Introduces a unified tracking interface supporting multiple backends (Weights & Biases, MLFlow, TensorBoard, CSV) for experiment logging during training.

Key features include:
- Abstract Tracker base class with standardized API for logging metrics, tables, and completions
- CompositeTracker for simultaneous logging to multiple backends
- NullTracker for disabling tracking without code changes
- Integration with GRPOTrainer via optional trackers parameter
- Comprehensive test coverage with 32 test cases
- Instance-level logging following codebase patterns

This provides researchers flexibility to switch between tracking backends and enables local development with CSV logging alongside production tracking with W&B or MLFlow.
1 parent f510550 commit 212138a

12 files changed, +1281 -31 lines changed

docs/source/components.md

Lines changed: 182 additions & 0 deletions
@@ -595,8 +595,190 @@ def load_math_suite(**kwargs):
595595
)
596596
```
597597

598+
## Tracking
599+
600+
The tracking module provides a unified interface for experiment tracking across different backends. This allows you to switch between tracking systems or use multiple simultaneously without changing your training code.
601+
602+
### Available Trackers
603+
604+
#### WandbTracker
605+
606+
Track experiments using Weights & Biases:
607+
608+
```python
from verifiers.tracking import WandbTracker

tracker = WandbTracker(
    project="my-project",
    name="experiment-1",
    entity="my-team",
    tags=["baseline", "grpo"]
)
tracker.init()
tracker.log_metrics({"accuracy": 0.95, "loss": 0.05}, step=1)
tracker.finish()
```
621+
622+
#### CSVTracker
623+
624+
Track experiments locally using CSV files:
625+
626+
```python
from verifiers.tracking import CSVTracker

tracker = CSVTracker(
    log_dir="./experiment_logs",
    project="my-project",
    name="experiment-1"
)
tracker.init()  # Creates log directory and config.json
tracker.log_metrics({"accuracy": 0.95}, step=1)
tracker.log_table("completions", {
    "prompt": ["p1", "p2"],
    "completion": ["c1", "c2"],
    "reward": [0.9, 0.8]
})
```
642+
643+
#### CompositeTracker
644+
645+
Use multiple trackers simultaneously:
646+
647+
```python
from verifiers.tracking import CompositeTracker, WandbTracker, CSVTracker

tracker = CompositeTracker([
    WandbTracker(project="my-project"),
    CSVTracker(log_dir="./logs")
])
# All operations are forwarded to both trackers
tracker.init()
tracker.log_metrics({"loss": 0.05}, step=1)
```
658+
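To make the forwarding behaviour concrete, here is a minimal, self-contained sketch of the fan-out idea. It does not import `verifiers`; `RecordingTracker` and `FanOutTracker` are hypothetical stand-ins written for illustration, and the real `CompositeTracker` may differ (for example in how it handles a failing child).

```python
from typing import Any, Optional


class RecordingTracker:
    """Hypothetical stand-in backend that stores every call in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: list[tuple[str, dict[str, Any], Optional[int]]] = []

    def init(self) -> None:
        self.calls.append(("init", {}, None))

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        self.calls.append(("log_metrics", dict(metrics), step))

    def finish(self) -> None:
        self.calls.append(("finish", {}, None))


class FanOutTracker:
    """Hypothetical composite: forwards each call to every child, in order."""

    def __init__(self, trackers: list[RecordingTracker]) -> None:
        self.trackers = list(trackers)

    def init(self) -> None:
        for tracker in self.trackers:
            tracker.init()

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        for tracker in self.trackers:
            tracker.log_metrics(metrics, step=step)

    def finish(self) -> None:
        for tracker in self.trackers:
            tracker.finish()


first, second = RecordingTracker("a"), RecordingTracker("b")
fan_out = FanOutTracker([first, second])
fan_out.init()
fan_out.log_metrics({"loss": 0.05}, step=1)
fan_out.finish()
```

Both children end up with the same call history, which is the property the training code relies on when it talks to a single composite object.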
659+
#### NullTracker
660+
661+
No-op tracker for testing or when tracking is disabled:
662+
663+
```python
from verifiers.tracking import NullTracker

tracker = NullTracker()  # Does nothing
tracker.log_metrics({"loss": 0.05}, step=1)  # No-op
```
669+
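The value of a no-op tracker is that calling code never needs an "is tracking enabled?" branch. The sketch below illustrates that with two hypothetical classes, `ListTracker` and `SilentTracker`, that share one method signature; neither is part of `verifiers`.

```python
from typing import Any, Optional


class ListTracker:
    """Hypothetical real tracker: keeps logged rows in a list."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        self.rows.append({"step": step, **metrics})


class SilentTracker:
    """Hypothetical no-op tracker with the same method signature."""

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        pass


def train(tracker, num_steps: int = 3) -> None:
    # The loop logs unconditionally; the tracker decides what happens.
    for step in range(1, num_steps + 1):
        tracker.log_metrics({"loss": 1.0 / step}, step=step)


enabled = ListTracker()
train(enabled)          # rows are recorded
train(SilentTracker())  # same code path, nothing is recorded
```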
670+
#### MLFlowTracker
671+
672+
Track experiments using MLflow:
673+
674+
```python
from verifiers.tracking import MLFlowTracker

tracker = MLFlowTracker(
    experiment_name="my-experiment",
    run_name="run-1",
    tracking_uri="http://localhost:5000",  # Optional
    tags={"env": "production"}
)
tracker.init()
tracker.log_metrics({"accuracy": 0.95, "loss": 0.05}, step=1)
tracker.log_config({"learning_rate": 0.001, "batch_size": 32})
tracker.finish()
```
688+
689+
#### TensorBoardTracker
690+
691+
Track experiments using TensorBoard:
692+
693+
```python
from verifiers.tracking import TensorBoardTracker

tracker = TensorBoardTracker(
    log_dir="./runs",
    comment="grpo-experiment"
)
tracker.init()
tracker.log_metrics({"accuracy": 0.95, "loss": 0.05}, step=1)
tracker.log_config({"learning_rate": 0.001})
tracker.finish()
```
705+
706+
### Custom Trackers
707+
708+
Create your own tracker by extending the base `Tracker` class:
709+
710+
```python
from typing import Any, Optional

import requests  # third-party HTTP client used by this example

from verifiers.tracking import Tracker


class MyCustomTracker(Tracker):
    def __init__(self, endpoint: str, **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint

    def init(self, **kwargs) -> None:
        # Initialize your tracking backend
        self._initialized = True

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None, **kwargs) -> None:
        # Send metrics to your backend
        requests.post(f"{self.endpoint}/metrics", json=metrics)

    def log_table(self, table_name: str, data: dict[str, list[Any]],
                  step: Optional[int] = None, **kwargs) -> None:
        # Send table data to your backend
        requests.post(f"{self.endpoint}/tables/{table_name}", json=data)

    def log_completions(self, prompts: list[str], completions: list[str],
                        rewards: list[float], step: Optional[int] = None,
                        **kwargs) -> None:
        # Log completion samples as a table
        data = {"prompts": prompts, "completions": completions, "rewards": rewards}
        self.log_table("completions", data, step=step)

    def log_config(self, config: dict[str, Any], **kwargs) -> None:
        super().log_config(config)
        requests.post(f"{self.endpoint}/config", json=config)

    def finish(self, **kwargs) -> None:
        # Clean up resources
        pass
```
748+
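For unit tests it can be handy to have a tracker that needs no network at all. The following is a rough sketch of that idea under stated assumptions: `TrackerSketch` is a hypothetical stand-in for the real base class (which methods are abstract there is not shown in this document), and `InMemoryTracker` is an illustrative subclass, not something shipped with `verifiers`.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class TrackerSketch(ABC):
    """Hypothetical stand-in for the base class; the real API may differ."""

    def __init__(self) -> None:
        self.config: dict[str, Any] = {}

    @abstractmethod
    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        """Record scalar metrics for one step."""

    @abstractmethod
    def log_table(self, table_name: str, data: dict[str, list[Any]],
                  step: Optional[int] = None) -> None:
        """Record column-oriented table data."""

    def log_completions(self, prompts: list[str], completions: list[str],
                        rewards: list[float],
                        step: Optional[int] = None) -> None:
        # Default behaviour: completions are a three-column table.
        data = {"prompt": prompts, "completion": completions, "reward": rewards}
        self.log_table("completions", data, step=step)

    def log_config(self, config: dict[str, Any]) -> None:
        self.config.update(config)


class InMemoryTracker(TrackerSketch):
    """Keeps everything in Python containers so tests can inspect it."""

    def __init__(self) -> None:
        super().__init__()
        self.metrics: list[tuple[Optional[int], dict[str, float]]] = []
        self.tables: dict[str, list[dict[str, list[Any]]]] = {}

    def log_metrics(self, metrics: dict[str, float],
                    step: Optional[int] = None) -> None:
        self.metrics.append((step, dict(metrics)))

    def log_table(self, table_name: str, data: dict[str, list[Any]],
                  step: Optional[int] = None) -> None:
        self.tables.setdefault(table_name, []).append(data)


tracker = InMemoryTracker()
tracker.log_config({"learning_rate": 0.001})
tracker.log_metrics({"loss": 0.05}, step=1)
tracker.log_completions(["p1"], ["c1"], [0.9], step=1)
```

Because `log_completions` is expressed in terms of `log_table`, a subclass in this sketch only has to implement the two abstract methods to get completion logging as well.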
749+
### Integration with GRPOTrainer
750+
751+
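`GRPOTrainer` accepts an optional `trackers` parameter, a list of tracker instances. When it is omitted, the trainer falls back to `args.report_to` and uses `WandbTracker` if that setting includes `"wandb"`: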
752+
753+
```python
import verifiers as vf

# Option 1: Explicit trackers
tracker = vf.CSVTracker(log_dir="./logs")
trainer = vf.GRPOTrainer(
    model=model,
    env=env,
    args=args,
    processing_class=tokenizer,
    trackers=[tracker]
)

# Option 2: Auto-detection from args.report_to
args.report_to = "wandb"  # Automatically uses WandbTracker
trainer = vf.GRPOTrainer(model=model, env=env, args=args,
                         processing_class=tokenizer)
```
771+
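The selection rule can be pictured as a small pure function. This is only a sketch of the documented behaviour (explicit trackers win, otherwise `"wandb"` in `report_to` selects the W&B backend); `select_backends` is a hypothetical helper, not the trainer's actual code, and the `"null"` fallback for every other case is an assumption.

```python
from typing import Optional, Sequence, Union


def select_backends(
    explicit: Optional[Sequence[str]],
    report_to: Union[str, Sequence[str], None],
) -> list[str]:
    """Return backend names to use; hypothetical helper for illustration."""
    if explicit is not None:
        # Explicitly passed trackers always take precedence.
        return list(explicit)

    # Normalise report_to so "wandb" and ["wandb"] behave the same,
    # and so substring matches like "nowandb" are not accepted.
    if report_to is None:
        targets: list[str] = []
    elif isinstance(report_to, str):
        targets = [report_to]
    else:
        targets = list(report_to)

    if "wandb" in targets:
        return ["wandb"]
    return ["null"]  # assumption: nothing else is auto-detected


from_args = select_backends(None, "wandb")
from_list = select_backends(None, ["tensorboard", "wandb"])
disabled = select_backends(None, "none")
explicit = select_backends(["csv"], "wandb")
```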
598772
## Best Practices
599773

774+
### For Tracking
775+
- Use `CSVTracker` for local development and debugging
776+
- Use `WandbTracker` for team collaboration and cloud storage
777+
- Use `MLFlowTracker` when MLflow already manages your experiments
778+
- Use `TensorBoardTracker` for real-time metric visualization
779+
- Use `CompositeTracker` to log to multiple backends simultaneously
780+
- Create custom trackers for integration with internal tooling
781+
600782
### For Rubrics
601783
- Start simple with basic reward functions
602784
- Use JudgeRubric when rule-based evaluation is insufficient

docs/source/training.md

Lines changed: 48 additions & 1 deletion
@@ -276,13 +276,60 @@ args.ds3_gather_for_generation = False # For very large models
276276
args.generation_batch_size = 16 # Control generation batch size
277277
```
278278

279+
### Experiment Tracking
280+
281+
Verifiers supports multiple experiment tracking backends through a unified abstraction layer:
282+
283+
```python
import verifiers as vf
from verifiers.tracking import WandbTracker, CSVTracker, CompositeTracker

# Option 1: Use WandbTracker explicitly
tracker = WandbTracker(project="my-project", name="run-1")
trainer = vf.GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    env=env,
    args=args,
    trackers=[tracker]
)

# Option 2: Use CSVTracker for local development
tracker = CSVTracker(log_dir="./logs", project="my-project")
trainer = vf.GRPOTrainer(..., trackers=[tracker])

# Option 3: Use multiple trackers simultaneously
trackers = [
    WandbTracker(project="my-project"),
    CSVTracker(log_dir="./logs")
]
trainer = vf.GRPOTrainer(..., trackers=trackers)

# Option 4: Let trainer auto-detect from args.report_to
# When trackers=None, trainer uses WandbTracker if "wandb" in args.report_to
args.report_to = "wandb"
trainer = vf.GRPOTrainer(...)  # Automatically uses WandbTracker
```
312+
313+
**Available Trackers:**
314+
- `WandbTracker`: Logs to Weights & Biases (requires `wandb` login)
315+
- `CSVTracker`: Logs metrics and tables to local CSV files
316+
- `MLFlowTracker`: Logs to an MLflow tracking server
317+
- `TensorBoardTracker`: Logs to TensorBoard event files
318+
- `CompositeTracker`: Combines multiple trackers
319+
- `NullTracker`: No-op tracker for testing
320+
321+
**CSVTracker Output:**
322+
- `logs/metrics.csv`: Training metrics over time
323+
- `logs/completions.csv`: Sample completions and rewards
324+
- `logs/eval_completions.csv`: Evaluation samples
325+
- `logs/config.json`: Training configuration
326+
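Since the CSV output is plain text, it can be inspected with the standard library alone. The sketch below assumes `metrics.csv` has a `step` column plus one column per metric; that layout is an assumption for illustration, so check the header of your own file first. It writes a small sample file to a temporary directory so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def load_metric(path: Path, metric: str) -> list[tuple[int, float]]:
    """Return (step, value) pairs for one metric column, skipping blank cells."""
    points: list[tuple[int, float]] = []
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            value = row.get(metric)
            if not value:
                # Metric missing or not logged at this step.
                continue
            points.append((int(row["step"]), float(value)))
    return points


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data standing in for a real logs/metrics.csv.
    sample = Path(tmp) / "metrics.csv"
    sample.write_text("step,loss,accuracy\n1,0.50,0.80\n2,0.25,\n3,0.05,0.95\n")
    loss_curve = load_metric(sample, "loss")
    accuracy_curve = load_metric(sample, "accuracy")
```

Skipping blank cells matters when different metrics are logged at different steps, as with the `accuracy` column in the sample.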
279327
### Monitoring
280328

281329
```python
282330
# Logging configuration
283331
args.logging_steps = 1
284332
args.log_completions = True
285-
args.report_to = "wandb" # or "none" to disable
286333
args.num_completions_to_print = 5 # Sample size to log
287334
```
288335
