FlorDB brings experiment tracking, provenance, and reproducibility to your ML workflow—using the one thing every engineer already writes: logs.
Unlike heavyweight MLOps platforms, FlorDB doesn’t ask you to adopt a new UI, schema, or service. Just import it, log as you normally would, and gain full history, lineage, and replay capabilities across your training runs.
## Features

- **Log-Driven Experiment Tracking.** No dashboards to configure or schemas to design. FlorDB turns your existing `print()` or `log()` calls into structured, queryable metadata.
- **Hindsight Logging & Replay.** Missed a metric? Add a log statement after the fact and replay past runs to capture it, with no rerunning from scratch.
- **Reproducibility Without Friction.** Every run is versioned via Git, every hyperparameter is recorded, and every model checkpoint is linked and queryable, automatically.
- **Works With Your Stack.** Makefiles, Airflow, Slurm, HuggingFace, PyTorch: you don't change your workflow, FlorDB fits in.
## Installation

Install from PyPI:

```shell
pip install flordb
```

For contributors or bleeding-edge features, install from source:

```shell
git clone https://github.com/ucbrise/flor.git
cd flor
pip install -e .
```

FlorDB requires a Git repository for automatic versioning.
## Getting Started

Create a sandbox repository and start an interactive session:

```shell
mkdir flor_sandbox
cd flor_sandbox
git init
ipython
```

Log your first message:

```python
import flordb as flor

flor.log("message", "Hello ML World!")
```

```
message: Hello ML World!
Changes committed successfully
```

Retrieve logs anytime:

```python
flor.dataframe("message")
```

```
         projid               tstamp filename          message
0  flor_sandbox  2025-10-13 18:13:48  ipython  Hello ML World!
```
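The tabular output above suggests that `flor.dataframe` returns a standard pandas `DataFrame` (an assumption based on the display format), so ordinary pandas operations apply to the result. A minimal sketch, using a hand-built frame shaped like the output above in place of a live FlorDB session:

```python
import pandas as pd

# Hypothetical stand-in for flor.dataframe("message"): a plain pandas
# DataFrame shaped like the output shown above.
df = pd.DataFrame({
    "projid": ["flor_sandbox"],
    "tstamp": ["2025-10-13 18:13:48"],
    "filename": ["ipython"],
    "message": ["Hello ML World!"],
})

# Ordinary pandas filtering works on the result.
hello = df[df["message"].str.contains("Hello")]
print(hello["message"].iloc[0])  # prints "Hello ML World!"
```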
## Experiment Tracking

Drop FlorDB into your existing training script:

```python
import flordb as flor

# Hyperparameters
lr = flor.arg("lr", 1e-3)
batch_size = flor.arg("batch_size", 32)

with flor.checkpointing(model=net, optimizer=optimizer):
    for epoch in flor.loop("epoch", range(epochs)):
        for x, y in flor.loop("step", trainloader):
            ...
            flor.log("loss", loss.item())
```

Change hyperparameters from the CLI:

```shell
python train.py --kwargs lr=5e-4 batch_size=64
```

View metrics across runs:
```python
flor.dataframe("lr", "batch_size", "loss")
```

```
        projid               tstamp  filename  epoch  step      lr  batch_size                 loss
0  ml_tutorial  2025-10-13 18:18:14  train.py      1   500  0.0005          64  0.20570574700832367
1  ml_tutorial  2025-10-13 18:18:14  train.py      2   500  0.0005          64   0.1964433193206787
2  ml_tutorial  2025-10-13 18:18:14  train.py      3   500  0.0005          64  0.11040152609348297
3  ml_tutorial  2025-10-13 18:18:14  train.py      4   500  0.0005          64    0.155434250831604
4  ml_tutorial  2025-10-13 18:18:14  train.py      5   500  0.0005          64   0.0741351768374443
```
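Because the result appears to be a plain pandas `DataFrame` (again, an assumption based on the output format), you can analyze runs with standard pandas. A sketch that finds the best epoch, using a hand-built frame mirroring the rows above:

```python
import pandas as pd

# Hand-built stand-in for flor.dataframe("lr", "batch_size", "loss"),
# mirroring the rows shown above.
df = pd.DataFrame({
    "epoch": [1, 2, 3, 4, 5],
    "lr": [0.0005] * 5,
    "batch_size": [64] * 5,
    "loss": [0.205706, 0.196443, 0.110402, 0.155434, 0.074135],
})

# Row with the lowest recorded loss.
best = df.loc[df["loss"].idxmin()]
print(int(best["epoch"]), best["loss"])  # → 5 0.074135
```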
## Hindsight Logging & Replay

Forgot to log gradient norms? Just add the logging statement to the script:

```python
flor.log("grad_norm", ...)
```

and replay:

```shell
python -m flordb replay grad_norm
```

FlorDB replays only what's needed, injecting the new log statement into copies of historical versions and committing the results.
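The key idea behind efficient replay is that checkpointed state from the original run lets a newly requested metric be computed without retraining. The toy sketch below is not FlorDB's internal mechanism; it only illustrates that idea with hypothetical per-epoch checkpoints and a weight-norm metric:

```python
import math

# Toy per-epoch checkpoints, as if saved during the original run.
# (Hypothetical data; FlorDB manages real checkpoints via Git.)
checkpoints = {
    1: {"weights": [0.5, -0.3, 0.8]},
    2: {"weights": [0.4, -0.2, 0.6]},
}

def replay_metric(checkpoints, metric_fn):
    """Compute a newly requested metric from saved state, epoch by
    epoch, instead of re-running training from scratch."""
    return {epoch: metric_fn(state) for epoch, state in checkpoints.items()}

# A metric we "forgot" to log the first time: the L2 norm of the weights.
weight_norm = lambda s: math.sqrt(sum(w * w for w in s["weights"]))
print(replay_metric(checkpoints, weight_norm))
```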
## Use Cases

FlorDB powers full AI/ML lifecycle tooling:
- Feature Stores & Model Registries
- Document Parsing & Feedback Loops
- Continuous Training Pipelines
See our Scan Studio and Document Parser examples for real-world integration.
## Research

FlorDB is based on research from UC Berkeley's RISE Lab and Arizona State University.
- Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle (CIDR 2025)
- The Management of Context in the ML Lifecycle (UCB Tech Report 2024)
- Hindsight Logging for Model Training (PVLDB 2021)
## License

Apache 2.0 License: free to use, modify, and distribute.
## Contributing

FlorDB is actively developed. Contributions, issues, and real-world use cases are welcome!

- GitHub: https://github.com/ucbrise/flor
- Tutorial Video: https://youtu.be/mKENSkk3S4Y