@irenedea commented on Aug 8, 2025

Dummy example showing how torch-native async RL would work without Ray. We use native torch async distributed operations to communicate between the rollout and trainer processes.
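
For reference, here is a minimal sketch (not the actual `test_no_ray.py`) of the communication pattern: the trainer broadcasts its weights and the rollout process streams experience back with non-blocking sends. The gloo backend, tensor shapes, and rank assignment below are illustrative assumptions, not the PR's implementation.

```python
# Minimal sketch of the weight-broadcast / experience-send pattern using only
# native torch.distributed ops. Backend, shapes, and ranks are assumptions.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

TRAINER, ROLLOUT = 0, 1  # hypothetical rank assignment

def run(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    weights = torch.tensor([10.0]) if rank == TRAINER else torch.zeros(1)
    experience = torch.zeros(1)

    for step in range(2):
        # Trainer broadcasts the latest weights; the rollout process receives them.
        dist.broadcast(weights, src=TRAINER)

        if rank == ROLLOUT:
            # "Generate" a rollout and send it to the trainer without blocking.
            experience = weights + 10.0
            req = dist.isend(experience, dst=TRAINER)
            req.wait()
        else:
            # Trainer asynchronously receives the experience buffer, then "trains".
            req = dist.irecv(experience, src=ROLLOUT)
            req.wait()
            weights += 1  # stand-in for a training update (10 -> 11)

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
```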

On a 4-GPU interactive node:

python test_no_ray.py

Synchronous output (MAX_ASYNC_STEPS=0): the rollout process blocks on a fresh weight broadcast before generating each batch, so rollout and training strictly alternate.

root@eede4788-aeb0-4d28-a785-5033baf7646e-0:/compose-rl# python test_no_ray.py 
[TRAIN] 2025-08-08 06:07:03,647: rank0[181235][MainThread]: INFO: __main__: Initializing model update process group
[TRAIN] 2025-08-08 06:07:05,820: rank0[181235][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,156: rank0[180919][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,474: rank0[180919][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:07:06,474: rank0[180919][MainThread]: INFO: __main__: Updating the model weights
[TRAIN] 2025-08-08 06:07:06,475: rank0[181235][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:07:06,475: rank0[181235][MainThread]: INFO: __main__: Broadcasted model weights tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Updating the weights to tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Sent experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Completed iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:07:06,476: rank0[181235][MainThread]: INFO: __main__: Got experience buffer tensor([20])
[rank0]:[W808 06:07:06.693400115 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[TRAIN] 2025-08-08 06:07:06,750: rank0[181235][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:07:26,750: rank0[181235][MainThread]: INFO: __main__: Completed iteration 1/2
[TRAIN] 2025-08-08 06:07:26,750: rank0[181235][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:07:26,751: rank0[181235][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,751: rank0[180919][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:07:26,751: rank0[180919][MainThread]: INFO: __main__: Updating the model weights
[TRAIN] 2025-08-08 06:07:26,751: rank0[181235][MainThread]: INFO: __main__: Broadcasted model weights tensor([11], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Updating the weights to tensor([11], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Sent experience buffer tensor([21])
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Completed iteration 2/2
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Waiting for the last experience buffer to be received
[TRAIN] 2025-08-08 06:07:26,753: rank0[181235][MainThread]: INFO: __main__: Got experience buffer tensor([21])
[TRAIN] 2025-08-08 06:07:26,753: rank0[181235][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:07:46,754: rank0[181235][MainThread]: INFO: __main__: Completed iteration 2/2
[rank0]:[W808 06:07:47.533397252 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
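
The TRAIN lines above correspond to a trainer loop that, each iteration, signals readiness, broadcasts weights over a dedicated model-update process group, and then waits on the experience buffer. A hedged sketch of that loop; the model-update group, readiness flag, and `train_step` helper are assumptions, and the rollout side would mirror the matching broadcast/recv calls:

```python
# Hedged sketch of the trainer-side loop implied by the TRAIN log lines.
# The model-update group, readiness flag, and train_step helper are assumptions.
import torch
import torch.distributed as dist

def trainer_loop(num_iterations, weights, experience, model_update_group,
                 rollout_rank, train_step):
    for it in range(num_iterations):
        # Tell the rollout process that fresh weights are coming, then send them
        # over the dedicated model-update process group.
        is_ready_to_update = torch.ones(1, device=weights.device)
        dist.broadcast(is_ready_to_update, src=dist.get_rank(), group=model_update_group)
        dist.broadcast(weights, src=dist.get_rank(), group=model_update_group)

        # Receive the next experience buffer; with MAX_ASYNC_STEPS=1 this buffer
        # may already have been generated with the previous weights.
        req = dist.irecv(experience, src=rollout_rank)
        req.wait()

        train_step(experience)  # "Training!"
        weights += 1            # stand-in for the updated weights (10 -> 11)
```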

Asynchronous output (MAX_ASYNC_STEPS=1): the rollout process runs up to one iteration ahead, generating and sending the experience for iteration 2 before the trainer has finished training on iteration 1.

root@eede4788-aeb0-4d28-a785-5033baf7646e-0:/compose-rl# python test_no_ray.py 
[TRAIN] 2025-08-08 06:08:37,387: rank0[182499][MainThread]: INFO: __main__: Initializing model update process group
[TRAIN] 2025-08-08 06:08:38,658: rank0[182499][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:08:38,991: rank0[182128][MainThread]: INFO: __main__: Starting iteration 1/2
[TRAIN] 2025-08-08 06:08:39,349: rank0[182499][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:08:39,349: rank0[182499][MainThread]: INFO: __main__: Broadcasted model weights tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:08:39,349: rank0[182128][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:08:39,349: rank0[182128][MainThread]: INFO: __main__: Updating the model weights
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Updating the weights to tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Sent experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Completed iteration 1/2
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Starting iteration 2/2
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Sent experience buffer tensor([21])
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Completed iteration 2/2
[TRAIN] 2025-08-08 06:08:39,352: rank0[182499][MainThread]: INFO: __main__: Got experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Waiting for the last experience buffer to be received
[rank0]:[W808 06:08:39.568937243 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[TRAIN] 2025-08-08 06:08:39,616: rank0[182499][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:08:59,617: rank0[182499][MainThread]: INFO: __main__: Completed iteration 1/2
[TRAIN] 2025-08-08 06:08:59,617: rank0[182499][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Broadcasted model weights tensor([11], device='cuda:0')
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Got experience buffer tensor([21])
[TRAIN] 2025-08-08 06:08:59,619: rank0[182499][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:09:19,619: rank0[182499][MainThread]: INFO: __main__: Completed iteration 2/2
[rank0]:[W808 06:09:20.501714194 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
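
Comparing the two runs, one gating scheme that reproduces both orderings is: the rollout process always waits for the initial weights, and afterwards only blocks on a new broadcast when it would otherwise run more than MAX_ASYNC_STEPS iterations on stale weights. A sketch of that scheme follows; the helper callables are hypothetical stand-ins for the broadcast/isend calls shown earlier, not the PR's API:

```python
# Hedged sketch of rollout-side gating consistent with both logs above.
# recv_weights, generate_rollouts, and send_experience are hypothetical helpers.
def rollout_loop(num_iterations, max_async_steps,
                 recv_weights, generate_rollouts, send_experience):
    steps_since_update = 0
    pending_sends = []
    for it in range(num_iterations):
        # Block on a weight update at the first iteration, or whenever the
        # rollout would otherwise get more than max_async_steps ahead.
        # max_async_steps=0 reproduces the synchronous ordering above;
        # max_async_steps=1 lets iteration 2 reuse iteration 1's weights.
        if it == 0 or steps_since_update > max_async_steps:
            recv_weights()
            steps_since_update = 0
        experience = generate_rollouts()
        pending_sends.append(send_experience(experience))  # non-blocking, e.g. dist.isend
        steps_since_update += 1
    # "Waiting for the last experience buffer to be received"
    for req in pending_sends:
        req.wait()
```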

irenedea changed the title to "No ray, native torch async rl prototype" on Aug 8, 2025
irenedea requested a review from rithwik-db on Aug 8, 2025