update qwen3-235b-a22b-fp8 #50
base: main
Conversation
Summary of Changes

Hello @yyzxw, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request updates the configuration for the qwen3-235b-a22b-fp8 model: it introduces a maxTokens setting to cap the number of tokens the model will handle, and refines the GPU resource requirements for its deployments.
Model Metadata Validation Results

✅ All validations passed!

Validation output: please make sure all metadata files conform to the schema defined in
Code Review
The pull request updates the qwen3-235b-a22b-fp8 model's metadata: it adds a maxTokens setting to the model config and reduces the required gpuCount from 8 to 6, which may affect the model's performance and resource utilization.
```diff
 resourceRequirements:
   cpu: 16
-  gpuCount: 8
-  gpuType: nvidia-vgpu
+  gpuCount: 6
+  gpuType: vgpu
```
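As a rough sanity check on the reduced GPU count, one can compare the FP8 weight footprint against the aggregate memory of the GPU pool. The sketch below is illustrative only; the per-GPU memory (80 GB) and the fraction reserved for KV cache and activations (30%) are assumptions, not values from this repository:

```python
# Back-of-the-envelope memory check for serving a 235B-parameter model in FP8.
# All constants below are illustrative assumptions, not values from this PR.

def weights_fit(n_params_b: float, bytes_per_param: float,
                gpu_count: int, gpu_mem_gb: float,
                reserve_frac: float = 0.3) -> bool:
    """Return True if the model weights fit in the GPU pool while leaving
    `reserve_frac` of memory free for KV cache and activations."""
    weight_gb = n_params_b * bytes_per_param          # FP8 -> ~1 byte/param
    usable_gb = gpu_count * gpu_mem_gb * (1 - reserve_frac)
    return weight_gb <= usable_gb

# 235B parameters in FP8 (1 byte each), assuming 80 GB cards:
print(weights_fit(235, 1.0, 6, 80))   # 6-GPU deployment
print(weights_fit(235, 1.0, 8, 80))   # 8-GPU deployment
```

By this coarse estimate both configurations hold the weights, which suggests the reviewer's real concern is throughput and parallelism layout rather than raw capacity.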
The reduction in gpuCount from 8 to 6 could impact the model's performance. It's important to ensure that this change doesn't negatively affect the model's inference speed or ability to handle large batches. Verify the performance after this change.
Consider adding a comment explaining why the gpuCount was reduced. Was it due to resource constraints, cost optimization, or other reasons?
```yaml
config:
  maxTokens: 40960
```
Adding maxTokens is good for configuring the model's context window. However, ensure that this value aligns with the model's capabilities and doesn't exceed its maximum supported context length, which could lead to unexpected behavior or errors. It's important to validate that the model functions correctly with this specific maxTokens value.
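The validation the reviewer suggests could be automated by comparing the configured maxTokens against the maximum context length the model itself reports. The sketch below checks it against the `max_position_embeddings` field of a Hugging Face-style `config.json`; the excerpt shown is a hypothetical example, not the actual model config:

```python
import json

def check_max_tokens(metadata_max_tokens: int, model_config: dict) -> bool:
    """Validate that the deployment's maxTokens does not exceed the
    model's maximum supported context length, as reported in a
    Hugging Face-style config.json."""
    supported = model_config.get("max_position_embeddings")
    if supported is None:
        raise ValueError("config.json lacks max_position_embeddings")
    return metadata_max_tokens <= supported

# Hypothetical config.json excerpt for illustration:
cfg = json.loads('{"max_position_embeddings": 40960}')
print(check_max_tokens(40960, cfg))  # at the supported limit
print(check_max_tokens(65536, cfg))  # would exceed it
```

A check like this could run in the same CI job that produced the "Model Metadata Validation Results" comments above.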
Force-pushed from b683ef5 to cec899e
Model Metadata Validation Results: ✅ All validations passed!
```diff
   cpu: 16
-  gpuCount: 8
-  gpuType: nvidia-vgpu
+  gpuCount: 6
```
Does the 235B model support tp=6 (tensor parallelism across 6 GPUs)?
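The reviewer's question is well taken: serving frameworks such as vLLM typically require the model's attention head count to divide evenly by the tensor-parallel size, so whether tp=6 works depends on the model's architecture, not just on GPU availability. The head count below (64) is an illustrative assumption; verify it against the actual Qwen3 `config.json`:

```python
def tp_divides_heads(num_attention_heads: int, tp: int) -> bool:
    """Common tensor-parallel constraint in serving frameworks:
    query heads must shard evenly across the tensor-parallel ranks."""
    return num_attention_heads % tp == 0

# Illustrative head count (verify against the real model config):
heads = 64
for tp in (2, 4, 6, 8):
    print(tp, tp_divides_heads(heads, tp))
```

If the head count really is 64, tp=6 would not shard evenly, which may be exactly the problem this comment is pointing at; gpuCount: 6 could still work if the deployment combines tensor and pipeline parallelism (e.g. tp=2 with pp=3), but that should be confirmed before merging.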
Force-pushed from cec899e to 0223e01
Model Metadata Validation Results: ✅ All validations passed!
Part of #44.