This repository was archived by the owner on Mar 13, 2024. It is now read-only.

JuliaReinforcementLearning/DistributedReinforcementLearning.jl-archive

Latest commit: 488b6ac by Jeremiah Lewis · Mar 13, 2024
If it works, it works everywhere!


Design

Components

  • 👷 Worker: a worker creates a task that runs an experiment in the background. It periodically sends out the transitions between the agent and the environment and fetches the latest parameters.
  • 📢 WorkerProxy: a worker proxy collects the messages flowing to and from the workers on the same node, so that large message payloads (e.g. model parameters) can be shared across workers instead of being delivered to each one separately.
  • 💿 TrajectoryManager: a trajectory manager is a wrapper around an AbstractTrajectory. It takes in transitions in bulk and samples a batch of training data in response to each request.
  • 💡 Trainer: a trainer is a wrapper around an AbstractPolicy. It does nothing but update its internal parameters whenever it receives a batch of training data, and it periodically broadcasts its latest parameters.
  • ⏱️ Orchestrator: an orchestrator is in charge of starting and stopping the components above and of pacing the communication between them.
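As a concrete illustration, a component such as the Trainer can be modeled as a mailbox (a Channel) plus a loop that reacts to incoming messages. The sketch below is hypothetical: the message types, their fields, and the "update" rule are placeholders, not the package's actual API.

```julia
# Hypothetical sketch: a Trainer-style actor as a mailbox plus a message loop.
struct BatchTrainingDataMsg
    batch::Vector{Float64}          # stand-in for a batch of training data
end

struct FetchParamsMsg
    reply_to::Channel{Any}          # where to send the current parameters
end

function trainer_actor(mailbox::Channel)
    params = zeros(3)               # the policy's internal parameters
    for msg in mailbox              # blocks until a message arrives
        if msg isa BatchTrainingDataMsg
            params .+= 0.1 .* msg.batch   # placeholder for a real update step
        elseif msg isa FetchParamsMsg
            put!(msg.reply_to, copy(params))
        elseif msg === :stop
            break
        end
    end
end

mailbox = Channel(32)
trainer = Threads.@spawn trainer_actor(mailbox)
put!(mailbox, BatchTrainingDataMsg([1.0, 2.0, 3.0]))
reply = Channel{Any}(1)
put!(mailbox, FetchParamsMsg(reply))
latest = take!(reply)               # ≈ [0.1, 0.2, 0.3]
put!(mailbox, :stop)
wait(trainer)
```

Because the actor owns its state (`params`) and reacts only to messages, no locks are needed even though the caller runs on a different task.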

Note that:

  1. We adopt the actor model here. Each instance of the components above is an actor; only messages are passed between them.
  2. A node is a Julia process. Different nodes can live on one machine or be spread across different machines.
  3. The tasks in different workers are started with Threads.@spawn. By design there is no direct communication between workers.
  4. In a single-node setup (the WorkerNode and the MainNode are the same), the WorkerProxy can be removed and the workers communicate with the Orchestrator directly.
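Points 1 and 3 can be sketched as follows, with hypothetical names: each worker runs in a task started with Threads.@spawn and communicates only by putting messages onto a channel owned by the receiving actor, never by sharing state with other workers.

```julia
# Sketch (hypothetical names): workers are Threads.@spawn tasks that never
# talk to each other directly — they only send messages downstream.
struct InsertTransitionMsg
    worker_id::Int
    transition::Tuple{Int,Float64}  # placeholder (step, reward) pair
end

function run_worker(id::Int, outbox::Channel{InsertTransitionMsg}, n_steps::Int)
    for step in 1:n_steps
        # pretend to interact with an environment, then report the transition
        put!(outbox, InsertTransitionMsg(id, (step, rand())))
    end
end

outbox = Channel{InsertTransitionMsg}(100)
worker_tasks = [Threads.@spawn run_worker(i, outbox, 5) for i in 1:3]
foreach(wait, worker_tasks)
close(outbox)
received = collect(outbox)          # 15 messages, in nondeterministic order
```

Closing the channel after the workers finish lets the receiver drain the remaining buffered messages and then stop iterating cleanly.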

Messages

  • 1️⃣ (👷 → 📢) InsertTransitionMsg, containing the local transitions between the agent and the environment in an experiment.
  • 2️⃣ (📢 → ⏱️) InsertTransitionMsg, aggregated from the different workers.
  • 3️⃣ (⏱️ → 💿) InsertTransitionMsg and SampleBatchMsg (which contains the address of the Trainer).
  • 4️⃣ (💿 → 💡) BatchTrainingDataMsg, containing a sampled batch of training data.
  • 5️⃣ (💡 → 💿) UpdatePriorityMsg, only needed by algorithms based on prioritized experience replay.
  • 6️⃣ (💡 → ⏱️) LoadParamsMsg, containing the latest parameters of the policy.
  • 7️⃣ (⏱️ → 📢) LoadParamsMsg
  • 8️⃣ (📢 → 👷) LoadParamsMsg
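Since actors share nothing but messages, the message types above can be plain immutable structs carrying pure data. The field names below are assumptions for illustration; note how SampleBatchMsg can carry the Trainer's address (its mailbox), which is how the TrajectoryManager knows where to send the sampled batch.

```julia
# Hypothetical message definitions as pure-data immutable structs.
struct SampleBatchMsg
    trainer_addr::Channel           # the Trainer's mailbox ("address")
    batch_size::Int
end

struct UpdatePriorityMsg
    indices::Vector{Int}            # which stored transitions to re-weight
    priorities::Vector{Float64}
end

struct LoadParamsMsg
    params::Vector{Float64}         # latest policy parameters
end

# e.g. the Orchestrator asking the TrajectoryManager to sample a batch:
trainer_mailbox = Channel(8)
sample_request = SampleBatchMsg(trainer_mailbox, 32)
```

Keeping messages immutable means they can be forwarded between actors (worker → proxy → orchestrator) without copying or synchronization concerns.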
