Conversation

@finbarrtimbers (Collaborator) commented Aug 18, 2025

Pulls all of the tests that would be skipped when CUDA is not available out into separate files.
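
(As context, a minimal sketch of the kind of CUDA-gated test being moved out; the marker and test name below are hypothetical, not taken from the repo.)

import pytest
import torch

# Hypothetical example of a CUDA-gated test of the kind being pulled out
# into dedicated *_gpu.py files; the actual markers in the repo may differ.
requires_gpu = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA is not available"
)

@requires_gpu
def test_generation_on_gpu():
    ...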

Changes pyproject.toml so that running pytest now only collects test files that don't end in _gpu.py.
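
(The exact pyproject.toml pattern isn't reproduced in this thread; purely as an illustration, the same effect can be sketched with pytest's collect_ignore_glob hook in a conftest.py.)

# conftest.py -- illustrative only; the PR itself configures the exclusion
# in pyproject.toml rather than here.
# Skip collection of GPU-only test files in the default pytest run.
collect_ignore_glob = ["*_gpu.py"]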

Fixes some of the tests so they pass. The suite currently takes ~7 minutes to run, and the GPU tests run on the cheap GPUs (L40/A6000).

@finbarrtimbers marked this pull request as ready for review August 18, 2025 19:01
@finbarrtimbers requested review from mnoukhov and removed the request for hamishivi August 18, 2025 19:01

@mnoukhov (Contributor) left a comment

Is this going to be a GPU job for every update to main? I can't tell, but that feels like it would be a substantial number of jobs on the cluster. :/

Comment on lines +36 to +57
# Tokenize a simple prompt with the small Pythia-14m tokenizer.
tokenizer_name = "EleutherAI/pythia-14m"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
test_prompt = "What is the capital of France?"
prompt_token_ids = tokenizer.encode(test_prompt, return_tensors="pt").tolist()[0]
# Ray queues for passing prompts to, and results back from, the vLLM engines.
param_prompt_Q = ray_queue.Queue(maxsize=1)
inference_results_Q = ray_queue.Queue(maxsize=1)
actor_manager = vllm_utils3.ActorManager.remote()
# A single eager-mode engine capped at half of GPU memory keeps the test cheap.
vllm_engines = create_vllm_engines(
    num_engines=1,
    tensor_parallel_size=1,
    enforce_eager=True,
    tokenizer_name_or_path=tokenizer_name,
    pretrain=tokenizer_name,
    revision="main",
    seed=42,
    enable_prefix_caching=False,
    max_model_len=512,
    vllm_gpu_memory_utilization=0.5,
    prompt_queue=param_prompt_Q,
    results_queue=inference_results_Q,
    actor_manager=actor_manager,
)

Contributor commented:

Not sure, but it feels like creating the tokenizer and vLLM engine could live in the test's setup.
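
(As a rough illustration of this suggestion, and not something the PR actually does, the creation code from the snippet above could be hoisted into a pytest fixture; the import paths below are assumptions.)

import pytest
from ray.util import queue as ray_queue
from transformers import AutoTokenizer

# Assumed import paths; adjust to the repo layout.
from open_instruct import vllm_utils3
from open_instruct.vllm_utils3 import create_vllm_engines

# Hypothetical sketch: hoist tokenizer/engine creation into a module-scoped
# fixture so each test only asks for what it needs. Teardown of the Ray
# actors is omitted here.
@pytest.fixture(scope="module")
def vllm_setup():
    tokenizer_name = "EleutherAI/pythia-14m"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    param_prompt_Q = ray_queue.Queue(maxsize=1)
    inference_results_Q = ray_queue.Queue(maxsize=1)
    actor_manager = vllm_utils3.ActorManager.remote()
    vllm_engines = create_vllm_engines(
        num_engines=1,
        tensor_parallel_size=1,
        enforce_eager=True,
        tokenizer_name_or_path=tokenizer_name,
        pretrain=tokenizer_name,
        revision="main",
        seed=42,
        enable_prefix_caching=False,
        max_model_len=512,
        vllm_gpu_memory_utilization=0.5,
        prompt_queue=param_prompt_Q,
        results_queue=inference_results_Q,
        actor_manager=actor_manager,
    )
    yield tokenizer, vllm_engines, param_prompt_Q, inference_results_Q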

    actor_manager=actor_manager,
)

generation_config = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)

Contributor commented:

Do you think it's possible to get a deterministic output from Pythia-14m and specifically test for the expected completion for this prompt? Not sure how important that is.
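
(One option, sketched below as a hypothetical test using the plain vllm.LLM API rather than the repo's create_vllm_engines wrapper: since temperature=0.0 means greedy decoding, the test can assert that two runs agree instead of hard-coding an expected Pythia-14m completion, which isn't pinned down here. Greedy decoding is usually, though not strictly guaranteed to be, run-to-run deterministic on the same hardware.)

from vllm import LLM, SamplingParams

# Hypothetical determinism check: greedy decoding should give the same tokens
# on repeated runs, so no expected string has to be hard-coded.
def test_greedy_decoding_is_self_consistent():
    llm = LLM(model="EleutherAI/pythia-14m", enforce_eager=True, max_model_len=512)
    params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=5, n=1)
    prompt = "What is the capital of France?"
    first = llm.generate([prompt], params)[0].outputs[0].token_ids
    second = llm.generate([prompt], params)[0].outputs[0].token_ids
    assert list(first) == list(second)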

@finbarrtimbers (Collaborator, Author) replied:

Is this going to be a GPU job for every update to main? I can't tell, but that feels like it would be a substantial number of jobs on the cluster. :/

My attitude is to enable it and see if we get complaints.

I could also make the trigger much more restrictive and have it only fire on changes to grpo_fast or vllm_utils3.py.

@mnoukhov (Contributor) left a comment

Triggering on changes to grpo_fast or vllm_utils3 makes sense to me.
It still feels like it could be overkill, but I'm inclined to say let's try it and roll back if necessary.

@finbarrtimbers added this pull request to the merge queue Aug 20, 2025
@mnoukhov removed this pull request from the merge queue due to a manual request Aug 20, 2025

@mnoukhov (Contributor) commented:

I accidentally clicked approve instead of request changes, but I'll leave it as is. Swap to triggering on changes to those files only and merge!
