SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

ArXiv Webpage

Survival Value Learning (SVL) is a probabilistic alternative to TD-based goal-conditioned reinforcement learning. SVL reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a probability distribution, and derives a closed-form identity that expresses the goal-conditioned value function as a discounted sum of survival probabilities. Values are estimated via a hazard model trained by maximum likelihood on both event and right-censored trajectories, with three practical estimators (finite-horizon truncation and two binned infinite-horizon approximations) to capture long-horizon objectives.
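
To make the identity and the likelihood concrete, here is a minimal pure-Python sketch. It is not the repository's code. It assumes a discrete time-to-goal T, a reward of -1 on every step before the goal is reached (the paper's exact convention may differ), and only covers the finite-horizon truncation. The function names and the toy hazard values are hypothetical.

```python
import math

def survival_from_hazard(hazard):
    """Return [S(0), ..., S(H)] with S(t) = P(T > t).

    hazard[k-1] is P(T = k | T >= k), the chance of reaching the goal
    at step k given it was not reached earlier.
    """
    surv = [1.0]
    for h in hazard:
        surv.append(surv[-1] * (1.0 - h))
    return surv

def value_from_hazard(hazard, discount):
    """Truncated identity V = -sum_{t=0}^{H} discount**t * S(t).

    Assumes a reward of -1 on every step before the goal is reached.
    """
    surv = survival_from_hazard(hazard)
    return -sum(discount ** t * s for t, s in enumerate(surv))

def trajectory_nll(hazard, steps, reached_goal, eps=1e-12):
    """Negative log-likelihood of one observation under the hazard model.

    steps: number of observed steps (>= 1).
    reached_goal: True if the goal was hit at step `steps` (event),
                  False if the trajectory ended first (right-censored).
    """
    n_survived = steps - 1 if reached_goal else steps
    # Log-probability of not reaching the goal on each survived step.
    log_lik = sum(math.log(max(1.0 - h, eps)) for h in hazard[:n_survived])
    if reached_goal:
        # Event term: the goal is reached exactly at step `steps`.
        log_lik += math.log(max(hazard[steps - 1], eps))
    return -log_lik

hazard = [0.1, 0.3, 0.5, 1.0]  # toy hazards; the last entry forces T <= 4
print(value_from_hazard(hazard, discount=0.9))
print(trajectory_nll(hazard, steps=2, reached_goal=True))
print(trajectory_nll(hazard, steps=2, reached_goal=False))
```

A censored trajectory contributes only survival terms, which is why trajectories that never reach the goal can still be used for training.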

This repository implements HSVL, the hierarchical variant evaluated in the paper: SVL value estimation paired with an AWR-style hierarchical actor (high-level subgoal proposer + low-level controller). HSVL matches or surpasses strong hierarchical TD and Monte Carlo baselines on the long-horizon locomotion and navigation tasks from OGBench.
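
As a rough illustration of what an AWR-style actor update looks like, the sketch below computes clipped exponentiated-advantage weights and the resulting weighted log-likelihood loss. It shows the generic advantage-weighted regression recipe, not the code in hsvl/. The advantage definitions, temperatures, clipping threshold, and value numbers are all assumptions made for the example.

```python
import math

def awr_weight(advantage, temperature=1.0, max_weight=100.0):
    """Exponentiated advantage, clipped at max_weight for stability."""
    return math.exp(min(advantage / temperature, math.log(max_weight)))

def awr_loss(log_probs, advantages, temperature=1.0):
    """Advantage-weighted negative log-likelihood over a batch."""
    weights = [awr_weight(a, temperature) for a in advantages]
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)

# Hypothetical value estimates along one dataset trajectory.
v_now, v_next, v_after_k = -40.0, -39.2, -31.5

# Assumed advantage definitions (illustrative only):
low_level_adv = v_next - v_now      # one-step progress toward the subgoal
high_level_adv = v_after_k - v_now  # k-step progress toward the final goal

print(awr_weight(low_level_adv))
print(awr_weight(high_level_adv, temperature=3.0))
```

In this scheme the high-level policy would be trained with log-probabilities of dataset subgoals and the low-level policy with log-probabilities of dataset actions, each weighted by its own advantage.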

Installation

We manage the project with uv. From the repo root:

uv sync

This creates .venv/ and installs all dependencies pinned in uv.lock (including the right JAX + CUDA wheels on Linux).

Usage

Configuration is managed with Hydra. The entry point is train.py at the repo root, and all knobs can be overridden from the command line.

# Default run (antmaze-large-navigate)
uv run train.py

# Override the environment and agent hyperparameters
uv run train.py env_name=humanoidmaze-large-navigate-v0 \
    agent.subgoal_steps=100 \
    agent.num_log_bins=500

# humanoidmaze-giant uses a higher discount
uv run train.py env_name=humanoidmaze-giant-navigate-v0 \
    agent.discount=0.999 \
    agent.subgoal_steps=100 \
    agent.num_log_bins=500

Multiple seeds / sweeps

Use Hydra multi-run (-m) to dispatch several runs:

# Two seeds
uv run train.py -m env_name=humanoidmaze-giant-navigate-v0 \
    agent.discount=0.999 \
    agent.subgoal_steps=100 \
    agent.num_log_bins=500 \
    seed=0,1

With the submitit launcher (configured via hydra/launcher), each combination is submitted as a separate SLURM job. Without it, Hydra's default launcher executes the runs sequentially on the local machine.
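
In a multi-run, every comma-separated override is a list of choices, and one job is launched per element of their Cartesian product. The helper below is a hypothetical pure-Python imitation of that expansion, meant only to show how many jobs a sweep produces. It is not Hydra's parser, which supports a richer override syntax, and the discount values are example choices.

```python
from itertools import product

def expand_sweep(overrides):
    """Expand key=v1,v2 overrides into the Cartesian product of jobs."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    return [dict(zip(keys, combo)) for combo in product(*choices)]

# Two discounts x two seeds -> four jobs.
jobs = expand_sweep(["agent.discount=0.99,0.999", "seed=0,1"])
for job in jobs:
    print(job)
```

Adding a third seed to the same sweep would raise the count to six jobs, so sweeps over several keys grow quickly.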

Citing SVL

@misc{tiofack2026svlgoalconditionedreinforcementlearning,
    title={SVL: Goal-Conditioned Reinforcement Learning as Survival Learning},
    author={Franki Nguimatsia Tiofack and Fabian Schramm and Théotime Le Hellard and Justin Carpentier},
    year={2026},
    eprint={2604.17551},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2604.17551},
}
