OpenEnv-compliant reinforcement learning environment for supply chain optimization with a built-in model visibility dashboard powered by Streamlit + SQLite.
InventOps simulates an inventory management problem where an LLM agent
must issue sequential order / transfer / hold decisions to minimize stockouts,
holding costs, and capacity breaches across a planning horizon.
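
Because the agent is an LLM, each decision arrives as free text and has to be validated before the environment can act on it. The sketch below shows one way to do that, falling back to a safe `hold` when the output cannot be parsed. The `Action` fields (`kind`, `sku`, `quantity`) and the JSON shape are illustrative assumptions, not the project's actual schema, which is defined with Pydantic in the core package.

```python
import json
from dataclasses import dataclass

VALID_KINDS = {"order", "transfer", "hold"}


@dataclass(frozen=True)
class Action:
    kind: str          # "order", "transfer", or "hold"
    sku: str = ""      # hypothetical field: item the action targets
    quantity: int = 0  # hypothetical field: units to order or transfer


def parse_action(llm_text: str) -> Action:
    """Parse a JSON action from raw LLM output; fall back to hold on any error."""
    try:
        data = json.loads(llm_text)
        kind = data["kind"]
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown action kind: {kind!r}")
        quantity = int(data.get("quantity", 0))
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        return Action(kind=kind, sku=str(data.get("sku", "")), quantity=quantity)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Malformed output counts as a parse error; holding is the safe default.
        return Action(kind="hold")
```

Counting how often the fallback branch fires is one plausible way to produce a parse error rate like the one the dashboard reports.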
Stack: Python 3.11 · Pydantic v2 · NumPy · Groq API · Streamlit · Plotly · SQLite
Tasks: easy / medium / hard
Reward: Dense shaped (fulfillment − holding − stockout − capacity penalties)
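
As a rough illustration of the reward shape above, the sketch below computes a per-step reward for a single item at a single location. The weights in `RewardWeights` and the `step_reward` signature are assumptions made for this example; the real coefficients and state layout live in the core environment package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients, chosen only for illustration.
    fulfillment: float = 1.0
    holding: float = 0.1
    stockout: float = 2.0
    capacity: float = 0.5


def step_reward(demand: int, on_hand: int, capacity: int,
                w: RewardWeights = RewardWeights()) -> float:
    """Dense reward: fulfillment minus holding, stockout, and capacity penalties."""
    fulfilled = min(demand, on_hand)       # units actually shipped
    unmet = demand - fulfilled             # stockout volume
    leftover = on_hand - fulfilled         # units carried to the next step
    overflow = max(0, on_hand - capacity)  # units above storage capacity
    return (w.fulfillment * fulfilled
            - w.holding * leftover
            - w.stockout * unmet
            - w.capacity * overflow)
```

Because every step returns a signal, the agent gets feedback on over-ordering and under-ordering immediately rather than only at the end of the planning horizon.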
InventOps/
├── InventOps/ # Core RL environment (env, models, reward, simulator)
├── rlvr/ # RLVR loop — GroqAgent + PromptOptimizer
│ └── prompts/ # Base & optimised prompt text files
├── metrics/ # SQLite metric logger (auto-created on first run)
│ └── logger.py # MetricLogger — thread-safe, no-op-capable
├── dashboard/ # Streamlit visibility dashboard
│ ├── app.py # 4-tab UI (Overview · Episodes · RLVR · Inference)
│ └── queries.py # SQL → pandas query helpers
├── inference.py # HF/OpenEnv submission entry-point
├── evaluate.py # Multi-agent benchmark (hold / random / LLM)
├── server.py # FastAPI action server
├── Dockerfile # Main inference image
├── Dockerfile.dashboard # Lightweight dashboard image
└── docker-compose.yml # Full stack (inference + dashboard, shared DB volume)
# Recommended: uv (fast)
uv sync
# Or pip
pip install -r requirements.txt
# Hold-only + random baselines, 10 seeds
python evaluate.py --seeds 10
Include Groq LLM agent:
GROQ_API_KEY=gsk_... python evaluate.py --seeds 10 --llm
HF_TOKEN=gsk_... python inference.py
Self-test without API key:
python inference.py --test
GROQ_API_KEY=gsk_... python rlvr/prompt_optimizer.py --task medium --rounds 4
streamlit run dashboard/app.py
All three entry-points (inference.py, evaluate.py, rlvr/prompt_optimizer.py)
automatically write structured metrics to metrics/inventops.db (SQLite).
The Streamlit dashboard reads from this file and provides four tabs:
| Tab | What you see |
|---|---|
| 🏠 Overview | KPI cards · Mean score by task & agent · Recent runs table |
| 📈 Episodes | Reward-per-step curves · Action distribution pie · Reward components |
| 🔁 RLVR Loop | Score progression per prompt round · min/max band · Failure type breakdown |
| ⚡ Inference | LLM latency histogram · Parse error rate · Raw step log |
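
The write-then-read flow behind these tabs can be sketched with the standard library alone. This is a minimal sketch, not the project's `MetricLogger`: the `episodes` table, its columns, and the method names are hypothetical, and the real dashboard loads query results into pandas rather than returning tuples.

```python
import sqlite3
import threading


class MetricLogger:
    """Sketch of a thread-safe logger that degrades to a no-op when disabled."""

    def __init__(self, db_path: str, enabled: bool = True):
        self._enabled = enabled
        self._lock = threading.Lock()
        self._conn = None
        if enabled:
            # check_same_thread=False lets worker threads share the connection;
            # the lock serializes every access to it.
            self._conn = sqlite3.connect(db_path, check_same_thread=False)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS episodes ("
                "task TEXT, agent TEXT, seed INTEGER, score REAL)"
            )
            self._conn.commit()

    def log_episode(self, task: str, agent: str, seed: int, score: float) -> None:
        if not self._enabled:
            return  # no-op mode: callers never need to check a flag
        with self._lock:
            self._conn.execute(
                "INSERT INTO episodes VALUES (?, ?, ?, ?)",
                (task, agent, seed, score),
            )
            self._conn.commit()

    def mean_scores(self) -> list:
        """Mean score per (task, agent), the kind of aggregate an overview tab shows."""
        if not self._enabled:
            return []
        with self._lock:
            return self._conn.execute(
                "SELECT task, agent, AVG(score) FROM episodes "
                "GROUP BY task, agent ORDER BY task, agent"
            ).fetchall()
```

Passing `":memory:"` as `db_path` keeps the sketch self-contained for experimentation; pointing it at a file lets a separate dashboard process read the same data.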
# Copy .env.example → .env and fill in keys
docker compose up --build
Services:
- inventops → http://localhost:8080 (FastAPI action server)
- dashboard → http://localhost:8501 (Streamlit dashboard)
The two containers share a named Docker volume (metrics_data) so the dashboard
updates live as inference runs write step data.
docker build -f Dockerfile.dashboard -t inventops-dashboard .
docker run -p 8501:8501 \
-v $(pwd)/metrics:/app/metrics \
inventops-dashboard

| Variable | Description | Default |
|---|---|---|
| HF_TOKEN | Groq / HF / OpenRouter API key | — |
| GROQ_API_KEY | Groq key (used by rlvr/ and evaluate --llm) | — |
| API_BASE_URL | LLM endpoint | https://api.groq.com/openai/v1 |
| MODEL_NAME | Model identifier | llama-3.1-8b-instant |
| INVENTOPS_DB | Path to SQLite metrics database | metrics/inventops.db |
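
One way to read these variables with the documented defaults is sketched below. The `Settings` class and `load_settings` helper are hypothetical names introduced for this example; the entry-points may read the environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]      # no default: must be supplied for inference
    groq_api_key: Optional[str]  # no default: needed by rlvr/ and evaluate --llm
    api_base_url: str
    model_name: str
    db_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default) using the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        hf_token=env.get("HF_TOKEN"),
        groq_api_key=env.get("GROQ_API_KEY"),
        api_base_url=env.get("API_BASE_URL", "https://api.groq.com/openai/v1"),
        model_name=env.get("MODEL_NAME", "llama-3.1-8b-instant"),
        db_path=env.get("INVENTOPS_DB", "metrics/inventops.db"),
    )
```

Accepting an explicit mapping makes the loader easy to exercise in tests without touching the real process environment.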
=======================================================================
InventOps — Benchmark Evaluation (20 seeds per task)
=======================================================================
Task hold-only random groq-llm
mean ± std mean ± std mean ± std
-------------------------------------------------------------------------
easy 0.412 ± 0.091 0.389 ± 0.103 0.631 ± 0.072
medium 0.388 ± 0.087 0.401 ± 0.098 0.584 ± 0.081
hard 0.341 ± 0.094 0.362 ± 0.110 0.547 ± 0.089
-------------------------------------------------------------------------
composite 0.380 0.384 0.587
=======================================================================
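
The composite row appears consistent with an unweighted mean of the three per-task means. That reading is inferred from the printed numbers rather than from a documented formula, so treat the sketch below as a sanity check, not as the definition used by `evaluate.py`.

```python
from statistics import mean

# Per-task mean scores copied from the benchmark table above.
task_means = {
    "hold-only": {"easy": 0.412, "medium": 0.388, "hard": 0.341},
    "random": {"easy": 0.389, "medium": 0.401, "hard": 0.362},
    "groq-llm": {"easy": 0.631, "medium": 0.584, "hard": 0.547},
}


def composite(scores: dict) -> float:
    """Unweighted mean over tasks, rounded to three decimals (assumed formula)."""
    return round(mean(scores.values()), 3)


for agent, scores in task_means.items():
    print(f"{agent:10s} {composite(scores):.3f}")
```

Under this assumption each task contributes equally, so a gain on `hard` moves the composite exactly as much as the same gain on `easy`.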
Apache License 2.0