-
Notifications
You must be signed in to change notification settings - Fork 477
[Jinja] Everything is an identifier #1418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🤯 |
- Add support for tilde operator - Handle custom transformers-specific `generation` tag. - Be aware of curly bracket depth when lexing
I've made a bunch of improvements, and now There were some strange cases to consider (people are very creative), but I think this is really impactful for the new chat template formatting feature (cc @mishig25). Also cc @Rocketknight1 for viz, just highlighting the kinds of features that people are using in the wild. |
this is insane 🤯 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm !
So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.
do you have examples of other implementations?
I've now tested rendering chat templates on the top 100,000 templates, and I think I've got all the cases covered now! 🥳 Last thing: improve formatting to not require excessive bracketing (not the most important thing; might leave for a separate PR).
I'm mainly basing my implementation on the official library, and double-checking with minijinja. |
Turns out, it was pretty simple ✅ Before:
After:
Also, I've tested that formatting doesn't affect any output for the top 100,000 templates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, amazing!
I think this is a good checkpoint for now ✅ Will merge and put out a new release (0.5.0). Summary:
This now means we support (at least) parsing, formatting, and rendering of the top 100,000 transformers-compatible chat templates on the Hugging Face Hub 🥳 |
- Aligned usage of identifiers w/ official jinja implementation (anything can now be an identifier) - We now support comment parsing and formatting (meaning this won't be preprocessed away when formatting). Rendering remains unchanged of course (ignore comments). - Differentiate between Integer and Float types, which is something Python does, but JavaScript doesn't, so we needed to take special care here. - Add support for new statement types: - filter: `{% filter %}...{% endfilter %}` - call: `{% call %}...{% endcall %}` - (custom) generation: `{% generation %}{% endgeneration %}` - Add support for new expression types: - spread: `{{ fn(*args) }}` - ternary (previously we just used the if expression): `{{ 1 if true else 2 }}` - Add support for new functions: - `replace` - `strftime_now` (get current time according to a narrow set of time templates, found in the wild) - and many others - Reduce number of redundant brackets when formatting membership and property accesses. - Improved binary operator precedence rules and general formatting rules. - Fixed edge-case lexing issues This now means we support (at least) parsing, formatting, and rendering of the top 100,000 transformers-compatible chat templates on the Hugging Face Hub 🥳
Turns out, ✨ anything ✨ can be an identifier, even supposed keywords. For example, the following jinja template is valid:
So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.
This is necessary to support the
call
functionality (see docs), which had conflicts with existing templates on the HF hub that happened to usecall
as a variable instead.