Conversation

@finbarrtimbers (Collaborator) commented Aug 18, 2025

Pulls all of the tests that would be skipped when CUDA is not available out into separate files.
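
(As context, a minimal sketch of the kind of CUDA-gated test being moved out; the marker and test name below are hypothetical, not taken from the repo.)

import pytest
import torch

# Hypothetical example of a CUDA-gated test of the kind being pulled out
# into dedicated *_gpu.py files; the actual markers in the repo may differ.
requires_gpu = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA is not available"
)

@requires_gpu
def test_generation_on_gpu():
    ...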

Changes pyproject.toml so that running pytest now only collects test files that don't end in _gpu.py.
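
(The exact pyproject.toml pattern isn't reproduced in this thread; purely as an illustration, the same effect can be sketched with pytest's collect_ignore_glob hook in a conftest.py.)

# conftest.py -- illustrative only; the PR itself configures the exclusion
# in pyproject.toml rather than here.
# Skip collection of GPU-only test files in the default pytest run.
collect_ignore_glob = ["*_gpu.py"]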

Fixes some of the tests so they pass. The suite currently takes ~7 minutes to run, and the GPU tests run on the cheap GPUs (L40/A6000).

@finbarrtimbers marked this pull request as ready for review August 18, 2025 19:01
@finbarrtimbers requested review from mnoukhov and removed the request for hamishivi August 18, 2025 19:01

@mnoukhov (Contributor) left a comment

Is this going to be a GPU job for every update to main? I can't tell, but that feels like it would be a substantial number of jobs on the cluster. :/

Comment on lines +36 to +57
# Tokenize a simple prompt with the small Pythia-14m tokenizer.
tokenizer_name = "EleutherAI/pythia-14m"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
test_prompt = "What is the capital of France?"
prompt_token_ids = tokenizer.encode(test_prompt, return_tensors="pt").tolist()[0]
# Ray queues for passing prompts to, and results back from, the vLLM engines.
param_prompt_Q = ray_queue.Queue(maxsize=1)
inference_results_Q = ray_queue.Queue(maxsize=1)
actor_manager = vllm_utils3.ActorManager.remote()
# A single eager-mode engine capped at half of GPU memory keeps the test cheap.
vllm_engines = create_vllm_engines(
    num_engines=1,
    tensor_parallel_size=1,
    enforce_eager=True,
    tokenizer_name_or_path=tokenizer_name,
    pretrain=tokenizer_name,
    revision="main",
    seed=42,
    enable_prefix_caching=False,
    max_model_len=512,
    vllm_gpu_memory_utilization=0.5,
    prompt_queue=param_prompt_Q,
    results_queue=inference_results_Q,
    actor_manager=actor_manager,
)

Contributor commented:

Not sure, but it feels like creating the tokenizer and vLLM engine could live in the test's setup.
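
(As a rough illustration of this suggestion, and not something the PR actually does, the creation code from the snippet above could be hoisted into a pytest fixture; the import paths below are assumptions.)

import pytest
from ray.util import queue as ray_queue
from transformers import AutoTokenizer

# Assumed import paths; adjust to the repo layout.
from open_instruct import vllm_utils3
from open_instruct.vllm_utils3 import create_vllm_engines

# Hypothetical sketch: hoist tokenizer/engine creation into a module-scoped
# fixture so each test only asks for what it needs. Teardown of the Ray
# actors is omitted here.
@pytest.fixture(scope="module")
def vllm_setup():
    tokenizer_name = "EleutherAI/pythia-14m"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    param_prompt_Q = ray_queue.Queue(maxsize=1)
    inference_results_Q = ray_queue.Queue(maxsize=1)
    actor_manager = vllm_utils3.ActorManager.remote()
    vllm_engines = create_vllm_engines(
        num_engines=1,
        tensor_parallel_size=1,
        enforce_eager=True,
        tokenizer_name_or_path=tokenizer_name,
        pretrain=tokenizer_name,
        revision="main",
        seed=42,
        enable_prefix_caching=False,
        max_model_len=512,
        vllm_gpu_memory_utilization=0.5,
        prompt_queue=param_prompt_Q,
        results_queue=inference_results_Q,
        actor_manager=actor_manager,
    )
    yield tokenizer, vllm_engines, param_prompt_Q, inference_results_Q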

    actor_manager=actor_manager,
)

generation_config = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)

Contributor commented:

Do you think it's possible to get a deterministic output from Pythia-14m and specifically test for the expected completion for this prompt? Not sure how important that is.
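
(One option, sketched below as a hypothetical test using the plain vllm.LLM API rather than the repo's create_vllm_engines wrapper: since temperature=0.0 means greedy decoding, the test can assert that two runs agree instead of hard-coding an expected Pythia-14m completion, which isn't pinned down here. Greedy decoding is usually, though not strictly guaranteed to be, run-to-run deterministic on the same hardware.)

from vllm import LLM, SamplingParams

# Hypothetical determinism check: greedy decoding should give the same tokens
# on repeated runs, so no expected string has to be hard-coded.
def test_greedy_decoding_is_self_consistent():
    llm = LLM(model="EleutherAI/pythia-14m", enforce_eager=True, max_model_len=512)
    params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)
    prompt = "What is the capital of France?"
    first = llm.generate([prompt], params)[0].outputs[0].token_ids
    second = llm.generate([prompt], params)[0].outputs[0].token_ids
    assert list(first) == list(second)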

@finbarrtimbers (Collaborator, Author) replied:

Is this going to be a GPU job for every update to main? I can't tell, but that feels like it would be a substantial number of jobs on the cluster. :/

My attitude is to enable it and see if we get complaints.

I could also make the trigger much more restrictive and have it only fire on changes to grpo_fast or vllm_utils3.py.

@mnoukhov (Contributor) left a comment

Triggering on changes to grpo_fast or vllm_utils3 makes sense to me.
It still feels like it could be overkill, but I'm inclined to say let's try it and roll back if necessary.

@finbarrtimbers added this pull request to the merge queue Aug 20, 2025
@mnoukhov removed this pull request from the merge queue due to a manual request Aug 20, 2025

@mnoukhov (Contributor) commented:

I accidentally clicked approve instead of request changes, but I'll leave it as is. Swap to triggering on changes to those files only and merge!
