[Model] Support smollm2 #858
base: main
Conversation
Signed-off-by: ldwang <[email protected]>
Summary of Changes

Hello @ftgreat, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces full support for the smollm2 model, covering tokenizer integration, training configuration, and checkpoint conversion.
Code Review
This pull request adds support for the smollm2 model, including configuration files, tokenizer integration, and checkpoint conversion scripts. The overall structure is consistent with how other models are supported in this repository. However, I've found several issues, including hardcoded paths in configuration files which harm reproducibility, and what appear to be copy-paste errors in the checkpoint conversion scripts that could lead to bugs. I've also noted some opportunities for code cleanup and improvement regarding style and efficiency. Please see my detailed comments below.
```python
args.consumed_train_samples = 0
args.consumed_valid_samples = 0
args.norm_has_bias = False
args.tokenizer_type = "Llama3TokenizerFS"
```
The tokenizer_type is hardcoded to Llama3TokenizerFS. Since this script is for smollm2, it should be set to SmolLM2TokenizerFS. This is a critical bug that will cause the checkpoint conversion to fail or produce an incorrect result.
```diff
- args.tokenizer_type = "Llama3TokenizerFS"
+ args.tokenizer_type = "SmolLM2TokenizerFS"
```
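To illustrate why a wrong `tokenizer_type` string is fatal, here is a minimal, hypothetical sketch of a name-based tokenizer registry; the registry, classes, and `build_tokenizer` function below are illustrative only, not FlagScale's actual code:

```python
# Hypothetical sketch -- names are illustrative, not FlagScale's API.
class Llama3TokenizerFS: ...
class SmolLM2TokenizerFS: ...

_TOKENIZER_REGISTRY = {
    "Llama3TokenizerFS": Llama3TokenizerFS,
    "SmolLM2TokenizerFS": SmolLM2TokenizerFS,
}

def build_tokenizer(tokenizer_type: str):
    # A wrong string either fails the lookup outright or builds the wrong
    # tokenizer, corrupting the converted checkpoint's vocabulary mapping.
    try:
        return _TOKENIZER_REGISTRY[tokenizer_type]()
    except KeyError:
        raise ValueError(f"Unknown tokenizer_type: {tokenizer_type!r}")
```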
```python
args.rotary_seq_len_interpolation_factor = (
    None if llama_args["rope_scaling"] == "null" else llama_args["rope_scaling"]
)
```
The check llama_args["rope_scaling"] == "null" is likely incorrect. In JSON, null is a keyword, not a string. When loaded by Python's json library, it becomes None. The check should be llama_args["rope_scaling"] is None.
```diff
- args.rotary_seq_len_interpolation_factor = (
-     None if llama_args["rope_scaling"] == "null" else llama_args["rope_scaling"]
- )
+ args.rotary_seq_len_interpolation_factor = (
+     None if llama_args.get("rope_scaling") is None else llama_args["rope_scaling"]
+ )
```
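For reference, a minimal sketch of how Python's `json` module handles `null` (the key name is taken from the snippet above; the literal JSON string is illustrative):

```python
import json

# JSON null is deserialized to Python's None, not to the string "null".
cfg = json.loads('{"rope_scaling": null}')

assert cfg["rope_scaling"] is None      # the correct check
assert cfg["rope_scaling"] != "null"    # the original comparison never matches
```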
```yaml
no_mmap_bin_files: true
tokenizer:
  tokenizer_type: SmolLM2TokenizerFS
  tokenizer_path: /share/projset/ldwang/models/HuggingFaceTB/SmolLM2-135M
```
The tokenizer_path is hardcoded to a user-specific directory (/share/projset/ldwang/...), which harms reproducibility for anyone else using this configuration. Please replace it with a placeholder or a configurable path.
```python
args.bf16 = llama_args["dtype"] == "bfloat16"
args.fp16 = llama_args["dtype"] == "float16"
```
The Hugging Face config.json for Llama-like models typically uses the key torch_dtype, not dtype. Using dtype here will likely result in a KeyError. This seems to be a copy-paste error. Please verify the key in the smollm2 model's config.json and use the correct one, which is probably torch_dtype.
```diff
- args.bf16 = llama_args["dtype"] == "bfloat16"
- args.fp16 = llama_args["dtype"] == "float16"
+ args.bf16 = llama_args["torch_dtype"] == "bfloat16"
+ args.fp16 = llama_args["torch_dtype"] == "float16"
```
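As a quick check, here is a small sketch of reading the dtype from a Hugging Face-style config.json; the file path is illustrative, and `.get()` is used so a missing key yields `False` flags instead of a `KeyError`:

```python
import json

# Hugging Face Llama-style configs store the weight dtype under "torch_dtype".
with open("config.json") as f:  # illustrative path
    llama_args = json.load(f)

torch_dtype = llama_args.get("torch_dtype")  # None if the key is absent
bf16 = torch_dtype == "bfloat16"
fp16 = torch_dtype == "float16"
```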
```python
def save_args_mg2hf(args):
    from transformers import LlamaConfig
```
```yaml
# nnodes: 1
nproc_per_node: 1
cmds:
  before_start: ulimit -n 1048576 && source /root/miniconda3/bin/activate flagscale-train
```
The before_start command hardcodes an environment-specific conda installation path (/root/miniconda3/...) and environment name, which will not work on other machines. Consider making this configurable or documenting it as an example.
```yaml
NCCL_P2P_LL_THRESHOLD: 0
IXCCL_MIX_NV: 1
IXCCL_FUSED_ENABLE: 0
NCCL_IB_DISABLE: 0
```
```python
import torch

sys.path.append("..")
```
```diff
@@ -0,0 +1,4 @@
+import sys
+
+sys.path.append("..")
```
```python
import sys

sys.path.append("..")
from mixtral.model import *
```
Support smollm2 tokenizer, training, and checkpoint conversion.