[Jinja] Everything is an identifier #1418

xenova · 2025-05-02T04:03:03Z

Turns out, ✨ anything ✨ can be an identifier, even supposed keywords. For example, the following jinja template is valid:

{% if if in in %}a{% endif %}{% set if = "a" %}{% set in = "abc" %}{% if if in in %}b{% endif %}

So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.

This is necessary to support the `call` functionality (see docs), which had conflicts with existing templates on the HF hub that happened to use `call` as a variable instead.
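To illustrate the idea, here is a minimal sketch of "identifiers during lexing, semantics during parsing". It is not the actual @huggingface/jinja code: the `LexToken`, `ParsedExpr`, and `ParsedStmt` shapes and the `lexStatement` / `parseStatement` helpers are hypothetical, and it only handles the body of a `{% if ... %}` or `{% set ... %}` tag with names, double-quoted strings, and `in`.

```typescript
// Hypothetical token and AST shapes, for illustration only.
type LexToken = { type: "Identifier" | "StringLiteral" | "Equals"; value: string };
type ParsedExpr =
  | { kind: "Name"; name: string }
  | { kind: "String"; value: string }
  | { kind: "In"; left: ParsedExpr; right: ParsedExpr };
type ParsedStmt =
  | { kind: "If"; test: ParsedExpr }
  | { kind: "Set"; target: string; value: ParsedExpr };

// Lex the body of a `{% ... %}` tag. Words such as `if`, `in`, and `set`
// get no special token type: every word is a plain Identifier.
function lexStatement(source: string): LexToken[] {
  const pieces: string[] = source.match(/[A-Za-z_]\w*|"[^"]*"|=|\S/g) || [];
  return pieces.map((piece): LexToken => {
    if (/^[A-Za-z_]/.test(piece)) return { type: "Identifier", value: piece };
    if (piece.length >= 2 && piece[0] === '"') {
      return { type: "StringLiteral", value: piece.slice(1, -1) };
    }
    if (piece === "=") return { type: "Equals", value: piece };
    throw new SyntaxError(`Unexpected character: ${piece}`);
  });
}

// The parser decides what a word means from its position.
function parseStatement(tokens: LexToken[]): ParsedStmt {
  let i = 0;
  const next = (): LexToken => {
    const t: LexToken | undefined = tokens[i++];
    if (!t) throw new SyntaxError("Unexpected end of statement");
    return t;
  };
  const isWord = (t: LexToken | undefined, word: string): boolean =>
    t !== undefined && t.type === "Identifier" && t.value === word;

  const parsePrimary = (): ParsedExpr => {
    const t = next();
    // In operand position, any word (even `if` or `in`) is a variable name.
    if (t.type === "Identifier") return { kind: "Name", name: t.value };
    if (t.type === "StringLiteral") return { kind: "String", value: t.value };
    throw new SyntaxError(`Unexpected token: ${t.value}`);
  };
  const parseExpression = (): ParsedExpr => {
    let left = parsePrimary();
    // `in` is treated as an operator only here, in operator position.
    while (isWord(tokens[i], "in")) {
      i++;
      left = { kind: "In", left, right: parsePrimary() };
    }
    return left;
  };

  // The first word of the tag selects the statement type.
  const head = next();
  if (isWord(head, "if")) return { kind: "If", test: parseExpression() };
  if (isWord(head, "set")) {
    const target = next();
    if (target.type !== "Identifier") throw new SyntaxError("Expected a name after set");
    if (next().type !== "Equals") throw new SyntaxError("Expected = in set statement");
    return { kind: "Set", target: target.value, value: parseExpression() };
  }
  throw new SyntaxError(`Unknown statement: ${head.value}`);
}
```

With this split, `parseStatement(lexStatement("if if in in"))` yields an `If` whose test is the membership check of the variable `if` in the variable `in`, and `set if = "a"` parses as an assignment to a variable named `if`.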

julien-c · 2025-05-02T08:02:00Z

🤯

xenova · 2025-05-03T02:25:10Z

I've made a bunch of improvements, and now @huggingface/jinja supports at least the top 100,000 valid templates on the HF hub (ordered by downloads). 🥳

There were some strange cases to consider (people are very creative), but I think this is really impactful for the new chat template formatting feature (cc @mishig25).

Also cc @Rocketknight1 for viz, just highlighting the kinds of features that people are using in the wild.

julien-c · 2025-05-03T11:16:56Z

> supports at least the top 100,000 valid templates on the HF hub

this is insane 🤯

mishig25

lgtm !

> So, like other jinja implementations, we should rather treat all "keywords" as identifiers during lexing, and only assert semantics during parsing.

do you have examples of other implementations?

xenova · 2025-05-05T02:35:33Z

I've now tested rendering on the top 100,000 chat templates, and I think I've got all the cases covered now! 🥳

Last thing: improve formatting to not require excessive bracketing (not the most important thing; might leave for a separate PR).

> do you have examples of other implementations?

I'm mainly basing my implementation on the official library, and double-checking with minijinja.

xenova · 2025-05-05T03:00:28Z

Turns out, it was pretty simple ✅

Before:

{%- set reasoning_content = ((((((message.content).split("</think>"))[0]).rstrip("\n")).split("<think>"))[-1]).lstrip("\n") -%}

After:

{%- set reasoning_content = message.content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n") %}

Also, I've tested that formatting doesn't affect any output for the top 100,000 templates.
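For reference, the bracket reduction can be sketched roughly as follows. This is a simplified illustration, not the formatter in this PR: the `AstNode` shape and the `formatNode` / `formatObject` helpers are hypothetical, and binary operator precedence between nested binary expressions is ignored. The point is that the object of a member access, index, or call only needs brackets when it is not itself a name, literal, or postfix chain.

```typescript
// Hypothetical AST shape, for illustration only.
type AstNode =
  | { kind: "Name"; name: string }
  | { kind: "Str"; value: string }
  | { kind: "Int"; value: number }
  | { kind: "Member"; object: AstNode; property: string }
  | { kind: "Index"; object: AstNode; index: AstNode }
  | { kind: "Call"; callee: AstNode; args: AstNode[] }
  | { kind: "Binary"; op: string; left: AstNode; right: AstNode };

function formatNode(node: AstNode): string {
  switch (node.kind) {
    case "Name":
      return node.name;
    case "Str":
      // JSON.stringify gives a double-quoted literal with escapes such as \n.
      return JSON.stringify(node.value);
    case "Int":
      return String(node.value);
    case "Member":
      return `${formatObject(node.object)}.${node.property}`;
    case "Index":
      return `${formatObject(node.object)}[${formatNode(node.index)}]`;
    case "Call":
      return `${formatObject(node.callee)}(${node.args.map(formatNode).join(", ")})`;
    case "Binary":
      return `${formatNode(node.left)} ${node.op} ${formatNode(node.right)}`;
  }
}

// Bracket the left-hand side of a postfix operation only when it binds
// more loosely than the postfix operation itself (here: binary expressions).
function formatObject(node: AstNode): string {
  const text = formatNode(node);
  return node.kind === "Binary" ? `(${text})` : text;
}
```

A chain such as `message.content.split("</think>")[0]` then formats without any brackets, while something like `(a + b).strip()` keeps the pair it actually needs.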

mishig25

lgtm, amazing!

xenova · 2025-05-05T20:53:16Z

I think this is a good checkpoint for now ✅ Will merge and put out a new release (0.5.0).

Summary:

- Aligned usage of identifiers with the official jinja implementation (anything can now be an identifier).
- We now support comment parsing and formatting, meaning comments are no longer preprocessed away when formatting. Rendering remains unchanged, of course: comments are still ignored.
- Differentiate between Integer and Float types, which is something Python does but JavaScript doesn't, so we needed to take special care here.
- Add support for new statement types:
  - filter: `{% filter %}...{% endfilter %}`
  - call: `{% call %}...{% endcall %}`
  - (custom) generation: `{% generation %}{% endgeneration %}`
- Add support for new expression types:
  - spread: `{{ fn(*args) }}`
  - ternary (previously we just used the if expression): `{{ 1 if true else 2 }}`
- Add support for new functions:
  - `replace`
  - `strftime_now` (get current time according to a narrow set of time templates, found in the wild)
  - and many others
- Reduce the number of redundant brackets when formatting membership and property accesses.
- Improved binary operator precedence rules and general formatting rules.
- Fixed edge-case lexing issues.

This now means we support (at least) parsing, formatting, and rendering of the top 100,000 transformers-compatible chat templates on the Hugging Face Hub 🥳
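As a rough illustration of the Integer vs. Float point in the summary: JavaScript has a single `number` type, so one way to mirror Python is to carry an explicit tag next to each value. The sketch below is an assumption about how that could look, not the runtime in this PR; `NumericValue`, `addNumbers`, `divideNumbers`, and `renderNumber` are hypothetical names, and division by zero is not handled.

```typescript
// Hypothetical tagged numeric value, for illustration only.
type NumericValue = { type: "Integer" | "Float"; value: number };

// Python semantics: int + int stays int, anything involving a float is a float.
function addNumbers(a: NumericValue, b: NumericValue): NumericValue {
  const type = a.type === "Integer" && b.type === "Integer" ? "Integer" : "Float";
  return { type, value: a.value + b.value };
}

// Python's `/` is true division and always produces a float, even for 4 / 2.
function divideNumbers(a: NumericValue, b: NumericValue): NumericValue {
  return { type: "Float", value: a.value / b.value };
}

// Python prints whole floats with a trailing ".0" (str(2.0) == "2.0"),
// whereas String(2.0) in JavaScript gives "2".
function renderNumber(n: NumericValue): string {
  if (n.type === "Integer") return String(n.value);
  return Number.isInteger(n.value) ? n.value.toFixed(1) : String(n.value);
}
```

Without the tag, `{{ 4 / 2 }}` would render as `2` in JavaScript but `2.0` in Python, which is the kind of mismatch that shows up when comparing rendered templates.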

xenova added 6 commits May 1, 2025 23:38

Everything is an identifier

bfb8c61

Code improvements

8c6aefe

Re-order tests

65680ea

Add context-specific keyword tests

a5abdb5

Handle special case for membership of undefined

a29f2e1

Remove unused literals

f93f308

xenova added 10 commits May 2, 2025 11:36

Add MEMBERSHIP_UNDEFINED and TERNARY_CONSECUTIVE unit tests

df588ab

Add new tests for filter + call statements

c973a66

Add tilde operator

df0eb97

Add minicpm e2e test

b91b7bb

Implement basic call/filter statements

7342226

Improved lexing

f3b3117

- Add support for tilde operator
- Handle custom transformers-specific `generation` tag.
- Be aware of curly bracket depth when lexing

Implement call & macro statements fully

7542b4f

Support consecutive string parsing

e28ab57

Add support for iterable unpacking

288d641

Add support for variable unpacking in set

8bbc205

xenova marked this pull request as ready for review May 3, 2025 02:12

Merge branch 'main' into identifiers

b82dc9c

xenova requested review from julien-c, mishig25 and Rocketknight1 May 3, 2025 02:25

Lint & formatting

cadeed1

xenova added 5 commits May 3, 2025 19:56

Support fractional numeric literals

5cb4ae9

Add support for int and float filters

a5870be

Differentiate between integer and float types

bacfbe9

Python vs. JS

Add jamba e2e test

d06c782

Allow comments to be added to the AST instead of stripped

b094bc4

xenova added 3 commits May 4, 2025 01:17

Assert comments end with #}

1101173

Add e2e llama vision test

2b1aa04

Differentiate between if statement and ternary expression

f4c9051

mishig25 approved these changes May 4, 2025

xenova and others added 3 commits May 4, 2025 22:27

New functionality

5b6d280

Add new unit tests

abeb7a4

Merge branch 'main' into identifiers

50fc899

xenova added 2 commits May 4, 2025 22:49

Fix 'or' precedence

4e34ab4

Improve formatting of chained property accesses

e056139

mishig25 approved these changes May 5, 2025

xenova and others added 7 commits May 5, 2025 14:00

Add formatting unit tests

87681d9

Improve formatting

0ab1ae6

Improve object & membership formatting

a425563

Improve formatting + add new tests

9804619

Lint & format

e3325d4

Merge branch 'main' into identifiers

1e87ae6

nit

1bb84ac

xenova merged commit b5deb41 into main May 5, 2025
5 checks passed

xenova deleted the identifiers branch May 5, 2025 20:56
