Description
Hi,
first of all, thank you for the awesome project — really cool!
Describe the bug
I'm currently experimenting with the new `@blocknote/xl-ai` package, using the Vercel AI SDK's OpenAI-compatible provider to connect to a local `llama.cpp` server. I'm running smaller models from Hugging Face, e.g. to summarize personal notes. These HF models typically include a `chat_template` in `tokenizer_config.json`, which is used when converting to `.gguf` format during quantization. The default `chat_template` often expects strictly alternating user/assistant messages (see examples below). With default settings, this causes the `llama.cpp` server to raise an HTTP 500 error when the request contains multiple `system` role messages.
Examples:
- Tokenizer from Mistral-7B
- GGUF from Gemma-4B
The error seems to originate from this section of the code:
https://github.com/TypeCellOS/BlockNote/blob/main/packages/xl-ai/src/api/LLMRequest.ts#L168-L180
The comments around that code suggest the issue is already known.
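For reference, the kind of chat completion payload that trips the template looks roughly like this (an illustrative sketch, not a captured request; the content strings are made up):

```ts
// Illustrative shape only: more than one "system" entry is what the strict
// chat_template rejects, surfaced by the llama.cpp server as an HTTP 500.
const requestBody = {
  model: "gemma-3b",
  messages: [
    { role: "system", content: "You are a writing assistant inside BlockNote." },
    { role: "system", content: "The current document is: ..." },
    { role: "user", content: "Summarize the selected blocks." },
  ],
};
```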
As a workaround, I'm currently fiddling with a custom fetch function in the OpenAI-compatible provider. It rewrites the request by merging all `system` messages into a single `system` message before sending it to the `llama-server`.
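The wrapper looks roughly like this (a minimal sketch with names of my own; it assumes the outgoing body is a JSON string with an OpenAI-style `messages` array of plain string contents):

```ts
// Sketch: wraps fetch and merges all "system" messages in the outgoing chat
// request into a single one, since the strict chat_template on the llama.cpp
// side rejects more than one.
const mergeSystemMessagesFetch = async (
  input: RequestInfo | URL,
  init?: RequestInit
): Promise<Response> => {
  if (init?.body && typeof init.body === "string") {
    const payload = JSON.parse(init.body);
    if (Array.isArray(payload.messages)) {
      const systemMessages = payload.messages.filter(
        (m: any) => m.role === "system"
      );
      if (systemMessages.length > 1) {
        // Assumes plain string contents; multimodal content parts would need
        // different handling.
        const merged = systemMessages.map((m: any) => m.content).join("\n\n");
        payload.messages = [
          { role: "system", content: merged },
          ...payload.messages.filter((m: any) => m.role !== "system"),
        ];
        init = { ...init, body: JSON.stringify(payload) };
      }
    }
  }
  return fetch(input, init);
};
```

If I read the provider settings correctly, this can be plugged in via the `fetch` option of the OpenAI-compatible provider, so only requests going to the local server are rewritten.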
Another workaround would be to start the `llama-server` with the flags `--jinja --chat-template chatml`, which seems to work, but I noticed that this breaks compatibility with the default `llama.cpp` webUI due to BOS/EOS token issues.
To Reproduce
- Download a GGUF model that includes a strict `chat_template`, such as Gemma 3B.
- Start the `llama.cpp` server with: `./llama-server.exe --model gemma-3b.gguf --port 8000 --jinja --cache-reuse 256 --ctx-size 8192`
- Use the `@blocknote/xl-ai` package with the OpenAI-compatible provider `@ai-sdk/openai-compatible` (see the sketch after this list).
- Observe the 500 error from the server due to `chat_template` validation.
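For the third step, the provider wiring looks roughly like this (a sketch of my local setup; the provider name, base URL, and model id are just what my server exposes, and how the model is handed to `@blocknote/xl-ai` may differ from the exact extension API):

```ts
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";

// Points the OpenAI-compatible provider at the local llama.cpp server
// started in the previous step. Values below match my local setup.
const llamaCpp = createOpenAICompatible({
  name: "llama-cpp",
  baseURL: "http://localhost:8000/v1",
  apiKey: "sk-local", // llama.cpp doesn't check it, but a value keeps the header well-formed
  // fetch: mergeSystemMessagesFetch, // the workaround from above; leave it out to reproduce the 500
});

// This chat model instance is what gets passed into the @blocknote/xl-ai setup.
const model = llamaCpp.chatModel("gemma-3b");
```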
Misc
I'm not sure whether other LLM engines also run into this issue when using the provided default `tokenizer_config.json` or `chat_template`.
- Node version:
- Package manager:
- Browser:
- I'm a sponsor and would appreciate it if you could look into this sooner rather than later 💖