-
Notifications
You must be signed in to change notification settings - Fork 366
[Jinja] Everything is an identifier #1418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🤯 |
- Add support for tilde operator - Handle custom transformers-specific `generation` tag. - Be aware of curly bracket depth when lexing
I've made a bunch of improvements, and now There were some strange cases to consider (people are very creative), but I think this is really impactful for the new chat template formatting feature (cc @mishig25). Also cc @Rocketknight1 for viz, just highlighting the kinds of features that people are using in the wild. |
this is insane 🤯 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm !
So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.
do you have examples of other implementations?
I've now tested rendering chat templates on the top 100,000 templates, and I think I've got all the cases covered now! 🥳 Last thing: improve formatting to not require excessive bracketing (not the most important thing; might leave for a separate PR).
I'm mainly basing my implementation on the official library, and double-checking with minijinja. |
Turns out, it was pretty simple ✅ Before:
After:
Also, I've tested that formatting doesn't affect any output for the top 100,000 templates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, amazing!
I think this is a good checkpoint for now ✅ Will merge and put out a new release (0.5.0). Summary:
This now means we support (at least) parsing, formatting, and rendering of the top 100,000 transformers-compatible chat templates on the Hugging Face Hub 🥳 |
Turns out, ✨ anything ✨ can be an identifier, even supposed keywords. For example, the following jinja template is valid:
So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.
This is necessary to support the
call
functionality (see docs), which had conflicts with existing templates on the HF hub that happened to usecall
as a variable instead.