Converters refactor [eagle v1]: EAGLE v1, EAGLE v2, and HASS converter architecture #118
base: features/converters/base-entrypoints
Conversation
📦 Build Artifacts Available
Hey Mark, thanks for this diff. I have a few remarks. Could we not do the weight name remapping? To support this name change in vLLM, we have to introduce extra logic to re-map these weights back to their original names.
Secondly, we're still imposing a restriction that only one layer can be used for Eagle, but there are already numerous checkpoints on Hugging Face that have been trained with multiple layers and give better acceptance rates. Do we not want to support those?
If those two things won't be supported for now, this diff looks okay.
has_layers_non_0 = any(
    name.startswith("layers.") and not name.startswith("layers.0.")
    for name in state_dict
)
Will 0.2.0 not support multiple layers? I think we should remove this check, since going forward we will need support for multiple layers.
There are also multiple Eagle checkpoints on Hugging Face that already use more than one decoder layer in the Eagle head.
weight_mappings: Annotated[
    dict[str, str],
    "Parameter name mappings from Eagle checkpoint format to Speculators format",
] = {"fc.": "fusion_fc.", "layers.0.": "transformer."}
I also think we should not be doing this renaming of weights, since it adds extra complexity on the vLLM side.
Agree.
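For context on the renaming being discussed, a minimal sketch (illustrative only, not the converter's actual implementation; prefixes taken from the quoted weight_mappings) of how a prefix remap gets applied, and why consuming such a checkpoint in vLLM would then need the inverse mapping:

# Hedged sketch: prefix-based remapping as implied by the quoted weight_mappings.
weight_mappings = {"fc.": "fusion_fc.", "layers.0.": "transformer."}

def remap_state_dict(state_dict: dict, mappings: dict) -> dict:
    remapped = {}
    for name, tensor in state_dict.items():
        new_name = name
        for old_prefix, new_prefix in mappings.items():
            if name.startswith(old_prefix):
                new_name = new_prefix + name[len(old_prefix):]
                break
        remapped[new_name] = tensor
    return remapped

# Loading the converted checkpoint where the original names are expected (the
# vLLM concern raised above) means carrying the inverse map as extra logic:
inverse_mappings = {new: old for old, new in weight_mappings.items()}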
"embed_layernorm.weight": "embedding_layernorm.weight", | ||
"hidden_layernorm.weight": "transformer.input_layernorm.weight", | ||
"lm_head_layernorm.weight": "pre_lm_head_layernorm.weight", |
same comment as above
vocab_size=eagle_config.get("vocab_size", 32000),
hidden_size=eagle_config.get("hidden_size", 4096),
intermediate_size=eagle_config.get("intermediate_size", 11008),
num_hidden_layers=1,  # Eagle always uses a single decoder layer
we should rely on the checkpoint/config to determine this
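One possible way to read this from the checkpoint instead of hard-coding it: a hedged sketch (function name and default are assumptions, not project code) that infers the decoder layer count from the "layers.<i>." prefixes in the state dict:

import re

# Illustrative only: derive num_hidden_layers from the "layers.<i>." keys in
# the Eagle checkpoint's state_dict rather than assuming a single layer.
def infer_num_hidden_layers(state_dict: dict, default: int = 1) -> int:
    indices = {
        int(match.group(1))
        for name in state_dict
        if (match := re.match(r"layers\.(\d+)\.", name)) is not None
    }
    return max(indices) + 1 if indices else default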
    self, weight_name: str, fusion_bias: bool, layernorms: bool
) -> Literal["keep", "ignore", "extra"]:
    if weight_name == "embed_tokens.weight":
        return "ignore"
nit: Not a fan of relying on Literal strings for classification, could we convert this to an enum?
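A minimal sketch of what the suggested enum could look like (class, member, and function names here are illustrative, not part of the codebase):

from enum import Enum

# Illustrative replacement for Literal["keep", "ignore", "extra"].
class WeightClassification(str, Enum):
    KEEP = "keep"
    IGNORE = "ignore"
    EXTRA = "extra"

def classify_weight(weight_name: str) -> WeightClassification:
    if weight_name == "embed_tokens.weight":
        return WeightClassification.IGNORE
    return WeightClassification.KEEP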
Hi Mark, thanks for the PR. I've added a few comments below. Overall it looks good, but I think it could benefit from simplifying / removing a few of the helper fns.
num_attention_heads=eagle_config.get("num_attention_heads", 32),
num_key_value_heads=eagle_config.get("num_key_value_heads"),
hidden_act=eagle_config.get("hidden_act", "silu"),
max_position_embeddings=eagle_config.get("max_position_embeddings", 4096),
Is there a reason this one differs from the standard llama config (2048)? https://github.com/huggingface/transformers/blob/a871f6f58d49f3a05ae9dae519caa8aa9d919a07/src/transformers/models/llama/configuration_llama.py#L170
def _pretrained_config_from_eagle(self, eagle_config: dict) -> LlamaConfig:
    return LlamaConfig(
        vocab_size=eagle_config.get("vocab_size", 32000),
        hidden_size=eagle_config.get("hidden_size", 4096),
        intermediate_size=eagle_config.get("intermediate_size", 11008),
        num_hidden_layers=1,  # Eagle always uses a single decoder layer
        num_attention_heads=eagle_config.get("num_attention_heads", 32),
        num_key_value_heads=eagle_config.get("num_key_value_heads"),
        hidden_act=eagle_config.get("hidden_act", "silu"),
        max_position_embeddings=eagle_config.get("max_position_embeddings", 4096),
        initializer_range=eagle_config.get("initializer_range", 0.02),
        rms_norm_eps=eagle_config.get("rms_norm_eps", 1e-6),
        use_cache=eagle_config.get("use_cache", True),
        pad_token_id=eagle_config.get("pad_token_id"),
        bos_token_id=eagle_config.get("bos_token_id", 1),
        eos_token_id=eagle_config.get("eos_token_id", 2),
        tie_word_embeddings=False,  # Eagle uses separate embed_tokens from verifier
        rope_theta=eagle_config.get("rope_theta", 10000.0),
        rope_scaling=eagle_config.get("rope_scaling"),
        attention_bias=eagle_config.get("attention_bias", False),
        attention_dropout=eagle_config.get("attention_dropout", 0.0),
        mlp_bias=eagle_config.get("mlp_bias", False),
    )
This could be done with something like
config_dict = LlamaConfig().to_dict()
config_dict.update(eagle_config)
return LlamaConfig(**config_dict)
And then we could even drop this helper function.
def _eagle_speculator_config(
    self,
    orig_config: dict,
    fusion_bias: bool,
    layernorms: bool,
) -> EagleSpeculatorConfig:
Can we inline this function? Seems like we could just call EagleSpeculatorConfig directly in convert_config_state_dict.
@pytest.fixture
def temp_directory():
    """Temporary directory for testing file operations."""
    with tempfile.TemporaryDirectory() as temp_dir:
        yield temp_dir
nit: Use the built-in pytest tmp_path fixture
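For reference, a sketch of what a call site could look like with the built-in fixture instead (the test name is hypothetical):

# pytest injects tmp_path (a pathlib.Path) per test and handles cleanup,
# so the tempfile-based fixture above becomes unnecessary.
def test_convert_writes_output(tmp_path):
    output_dir = tmp_path / "converted"
    output_dir.mkdir()
    assert output_dir.is_dir()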
Other than the open comments, this diff looks good, especially the unit tests. Great job on that!
Summary
Converter architecture for EAGLE v1, EAGLE v2, and HASS research checkpoints to the standardized Speculators format. This PR implements the EagleSpeculatorConverter class with automatic feature detection, weight remapping, configuration translation, and validation capabilities, enabling seamless integration of Eagle-style speculative decoding models into the Speculators ecosystem.
Details
EagleSpeculatorConverter class in eagle.py:
Test Plan
pytest test_eagle.py -v
python -c "from speculators.convert.converters import EagleSpeculatorConverter; print('SUCCESS')"
speculators convert --help
Related Issues