@abukhoy abukhoy commented Aug 4, 2025

This pull request introduces support for compile-time options via keyword arguments (`kwargs`), including the `aic-hw-version` parameter, which now accepts values `"ai100"` or `"ai200"`. If no value is provided, the default is `"ai100"`, representing the AI100 hardware.

These enhancements allow users to tailor the `compile` API to better suit their specific requirements.

Example Usage:

```python
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

model_name = "gpt2"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)

model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16, num_devices=1, **{'aic-hw-version': 'ai100'})

tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["Hi there!!"], tokenizer=tokenizer)
```

> **Note:** Previously, the default value for `aic-hw-version` was `"2.0"`, which implicitly referred to AI100. This value is now deprecated and replaced with the explicit `"ai100"` identifier.
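
The behaviour described above (a default of `"ai100"`, two accepted values, and extra compile-time options passed as keyword arguments) can be sketched roughly as follows. This is only an illustration and not the code from this PR: the helper name `build_compiler_flags`, the constants, and the `-key=value` flag layout are all assumptions made for the example.

```python
# Illustrative sketch only; names and flag format are assumptions,
# not the actual QEfficient implementation.
SUPPORTED_HW_VERSIONS = ("ai100", "ai200")
DEFAULT_HW_VERSION = "ai100"


def build_compiler_flags(**compiler_options):
    """Hypothetical helper: turn compile-time kwargs into CLI-style flags."""
    options = dict(compiler_options)

    # Fall back to the default hardware when the caller does not pick one.
    hw_version = options.pop("aic-hw-version", DEFAULT_HW_VERSION)
    if hw_version not in SUPPORTED_HW_VERSIONS:
        raise ValueError(
            f"Unsupported aic-hw-version {hw_version!r}; "
            f"expected one of {SUPPORTED_HW_VERSIONS}"
        )

    flags = [f"-aic-hw-version={hw_version}"]
    for key, value in options.items():
        # Assumed convention: underscores in kwarg names become hyphens.
        option = "-" + key.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options become bare switches, emitted only when True.
            if value:
                flags.append(option)
            continue
        flags.append(f"{option}={value}")
    return flags


print(build_compiler_flags())
print(build_compiler_flags(**{"aic-hw-version": "ai200"}))
```

Because `aic-hw-version` contains hyphens, it is not a valid Python identifier and cannot be written as a plain keyword argument, which is why the usage example passes it through dictionary unpacking (`**{'aic-hw-version': 'ai100'}`).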

abukhoy commented Aug 6, 2025

I made a small change to the `_compile` function of the base class by adding a helper method. If that's not okay, I will revert it.

@quic-hemagnih

Is anything pending on this? I think we are good to merge this change.

@quic-rishinr

> Is anything pending on this? I think we are good to merge this change.

Yes, the compiler changes need to be merged before we add this change to Qeff.

@quic-amitraj quic-amitraj left a comment

LGTM

@quic-hemagnih quic-hemagnih merged commit faab245 into quic:main Sep 5, 2025
4 checks passed
ochougul pushed a commit that referenced this pull request Nov 3, 2025
Signed-off-by: Abukhoyer Shaik <[email protected]>