
Conversation

Collaborator

@markurtz commented Sep 4, 2025

Summary

Adds converter architecture for EAGLE v1, EAGLE v2, and HASS research checkpoints to the standardized Speculators format. This PR implements the EagleSpeculatorConverter class with automatic feature detection, weight remapping, configuration translation, and validation capabilities, enabling seamless integration of Eagle-style speculative decoding models into the Speculators ecosystem.

Details

  • Added EagleSpeculatorConverter class in eagle.py:
    • Supports EAGLE v1, EAGLE v2, and HASS checkpoint formats through registry registration
    • Implements automatic feature detection for fusion bias and layernorms based on checkpoint structure (see the sketch after this list)
    • Ported existing functionality into the new system
  • Fixed CLI documentation in main.py:
    • Corrected malformed example command for Eagle v3 conversion with proper syntax
  • Added comprehensive test suite in test_eagle.py
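
As an illustration of the detection step, here is a minimal sketch (not the exact code in eagle.py); the key names are assumptions based on the weight mappings discussed in the review below:

def _detect_features_sketch(state_dict: dict) -> tuple[bool, bool]:
    # Fusion bias: the fusion layer ships a bias tensor alongside its weight.
    fusion_bias = "fc.bias" in state_dict
    # Extra layernorms (HASS-style checkpoints): detected via their weight keys.
    layernorms = any(
        key in state_dict
        for key in (
            "embed_layernorm.weight",
            "hidden_layernorm.weight",
            "lm_head_layernorm.weight",
        )
    )
    return fusion_bias, layernorms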

Test Plan

  • Execute the new test suite: pytest test_eagle.py -v
  • Verify converter registration: python -c "from speculators.convert.converters import EagleSpeculatorConverter; print('SUCCESS')"
  • Test CLI help documentation: speculators convert --help
  • Validate conversion workflow with mock Eagle checkpoints across different configurations

Related Issues

  • N/A

@markurtz self-assigned this Sep 4, 2025

github-actions bot commented Sep 4, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/17595615165/artifacts/3969150692.
They will be retained for up to 30 days.
Commit: c97a83a

Collaborator

@rahul-tuli left a comment


Hey Mark, thanks for this diff; I have a few remarks. Could we not do the weight name remapping? To support this name change in vLLM, we have to introduce extra logic to re-map these weights back to their original names.

Secondly, we're still imposing a restriction that only one decoder layer can be used for Eagle; however, there are already numerous checkpoints on Hugging Face that have been trained with multiple layers and give better acceptance rates. Do we not want to support those?

If those two things won't be supported for now, this diff looks okay

Comment on lines +113 to +116
has_layers_non_0 = any(
    name.startswith("layers.") and not name.startswith("layers.0.")
    for name in state_dict
)
Collaborator

Will 0.2.0 not support multiple layers? I think we should remove this check, since going forward we will need support for multiple layers.

Collaborator

There are also multiple Eagle checkpoints on Hugging Face that already use more than one decoder layer in the Eagle head.
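
If multi-layer heads are to be supported, one option (a sketch, not part of this PR) would be to infer the layer count from the same state_dict instead of rejecting anything beyond layers.0.:

import re

# Sketch: derive the decoder layer count from "layers.<i>." key prefixes.
layer_indices = {
    int(match.group(1))
    for name in state_dict
    if (match := re.match(r"layers\.(\d+)\.", name))
}
num_hidden_layers = max(layer_indices) + 1 if layer_indices else 1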

weight_mappings: Annotated[
    dict[str, str],
    "Parameter name mappings from Eagle checkpoint format to Speculators format",
] = {"fc.": "fusion_fc.", "layers.0.": "transformer."}
Collaborator

I also think we should not be doing this renaming of weights, since it adds extra complexity on the vLLM side.

Collaborator

Agree.
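
For context, the prefix-based remapping that the mapping above implies amounts to roughly the following (illustrative sketch; the converter's actual code may differ). Reversing it on the vLLM side requires applying the inverse mapping, which is the extra logic being objected to here.

# Sketch: rename state-dict keys by prefix according to weight_mappings.
def remap_state_dict(state_dict: dict, mappings: dict[str, str]) -> dict:
    remapped = {}
    for name, tensor in state_dict.items():
        new_name = name
        for old_prefix, new_prefix in mappings.items():
            if name.startswith(old_prefix):
                new_name = new_prefix + name[len(old_prefix):]
                break
        remapped[new_name] = tensor
    return remapped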

Comment on lines +80 to +82
"embed_layernorm.weight": "embedding_layernorm.weight",
"hidden_layernorm.weight": "transformer.input_layernorm.weight",
"lm_head_layernorm.weight": "pre_lm_head_layernorm.weight",
Collaborator

same comment as above

vocab_size=eagle_config.get("vocab_size", 32000),
hidden_size=eagle_config.get("hidden_size", 4096),
intermediate_size=eagle_config.get("intermediate_size", 11008),
num_hidden_layers=1, # Eagle always uses a single decoder layer
Collaborator

we should rely on the checkpoint/config to determine this
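
For example (a sketch, assuming the Eagle config exposes a num_hidden_layers field):

# Sketch: take the layer count from the checkpoint config instead of hardcoding 1.
num_hidden_layers=eagle_config.get("num_hidden_layers", 1),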

    self, weight_name: str, fusion_bias: bool, layernorms: bool
) -> Literal["keep", "ignore", "extra"]:
    if weight_name == "embed_tokens.weight":
        return "ignore"
Collaborator

nit: Not a fan of relying on Literal strings for classification, could we convert this to an enum?
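
For example, a sketch of an enum that could replace the Literal return values (the enum name is hypothetical):

from enum import Enum

# Sketch: enum alternative to Literal["keep", "ignore", "extra"].
class WeightClassification(Enum):
    KEEP = "keep"
    IGNORE = "ignore"
    EXTRA = "extra"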

@dsikka requested a review from fynnsu September 8, 2025 19:59
Collaborator

@fynnsu left a comment


Hi Mark, thanks for the PR. I've added a few comments below. Overall it looks good, but I think it could benefit from simplifying or removing a few of the helper functions.


Comment on lines +244 to +266
def _pretrained_config_from_eagle(self, eagle_config: dict) -> LlamaConfig:
    return LlamaConfig(
        vocab_size=eagle_config.get("vocab_size", 32000),
        hidden_size=eagle_config.get("hidden_size", 4096),
        intermediate_size=eagle_config.get("intermediate_size", 11008),
        num_hidden_layers=1,  # Eagle always uses a single decoder layer
        num_attention_heads=eagle_config.get("num_attention_heads", 32),
        num_key_value_heads=eagle_config.get("num_key_value_heads"),
        hidden_act=eagle_config.get("hidden_act", "silu"),
        max_position_embeddings=eagle_config.get("max_position_embeddings", 4096),
        initializer_range=eagle_config.get("initializer_range", 0.02),
        rms_norm_eps=eagle_config.get("rms_norm_eps", 1e-6),
        use_cache=eagle_config.get("use_cache", True),
        pad_token_id=eagle_config.get("pad_token_id"),
        bos_token_id=eagle_config.get("bos_token_id", 1),
        eos_token_id=eagle_config.get("eos_token_id", 2),
        tie_word_embeddings=False,  # Eagle uses separate embed_tokens from verifier
        rope_theta=eagle_config.get("rope_theta", 10000.0),
        rope_scaling=eagle_config.get("rope_scaling"),
        attention_bias=eagle_config.get("attention_bias", False),
        attention_dropout=eagle_config.get("attention_dropout", 0.0),
        mlp_bias=eagle_config.get("mlp_bias", False),
    )
Collaborator

This could be done with something like

config_dict = LlamaConfig().to_dict()
config_dict.update(eagle_config)
return LlamaConfig(**config_dict)

And then we could even drop this helper function.

Comment on lines +268 to +273
def _eagle_speculator_config(
    self,
    orig_config: dict,
    fusion_bias: bool,
    layernorms: bool,
) -> EagleSpeculatorConfig:
Collaborator

Can we inline this function? Seems like we could just call EagleSpeculatorConfig directly in convert_config_state_dict

Comment on lines +136 to +140
@pytest.fixture
def temp_directory():
    """Temporary directory for testing file operations."""
    with tempfile.TemporaryDirectory() as temp_dir:
        yield temp_dir
Collaborator

nit: Use the built-in pytest tmp_path fixture
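
For example (a sketch; the test name is hypothetical):

# Sketch: pytest injects tmp_path (a pathlib.Path) per test, no custom fixture needed.
def test_saves_converted_config(tmp_path):
    output_file = tmp_path / "config.json"
    output_file.write_text("{}")
    assert output_file.exists()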

@markurtz force-pushed the features/converters/base-entrypoints branch 2 times, most recently from 193edda to d7918aa on September 9, 2025 18:57
@markurtz force-pushed the features/converters/eagle branch from 7338ae7 to b2bd409 on September 9, 2025 21:04
@markurtz force-pushed the features/converters/eagle branch from b2bd409 to c97a83a on September 9, 2025 21:04
Collaborator

@rahul-tuli left a comment


Other than the open comments, this diff looks good, especially the unit tests; great job on those!

Comment on lines +113 to +116
has_layers_non_0 = any(
    name.startswith("layers.") and not name.startswith("layers.0.")
    for name in state_dict
)
Collaborator

There are also multiple Eagle checkpoints on Hugging Face that already use more than one decoder layer in the Eagle head.
