process input as str by adding it to a list by fleurvanl · Pull Request #43 · NetherlandsForensicInstitute/asmtransformers

fleurvanl · 2026-05-01T14:16:26Z

No description provided.

ranieri · 2026-05-06T07:30:38Z


    def tokenize(self, texts, split_special_tokens=False, **kwargs):
        encoded_inputs = []
+        texts = [texts] if isinstance(texts, str) else texts


This works for me, but looking at transformers I see two distinct ways of doing this: the one above, and checking whether the first element of the sequence is itself a sequence.

I have no idea why they sometimes use one and sometimes the other.

https://github.com/huggingface/transformers/blob/7f6419e67de355ee173344c1bfd68cb60288e121/src/transformers/tokenization_python.py#L719
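
For comparison, here is a minimal sketch of the two checks side by side. The helper names `wrap_if_str` and `wrap_if_not_nested` are hypothetical and appear in neither this repository nor transformers. One plausible reason for the second form, offered as an assumption rather than a statement about the linked line, is input that is already split into words: there a flat list of strings is a single sample, so only a list of lists counts as a batch.

```python
def wrap_if_str(texts):
    """Check used in this PR: a lone string becomes a one-element batch."""
    return [texts] if isinstance(texts, str) else texts


def wrap_if_not_nested(words_or_batch):
    """Alternative check: look at the first element (hypothetical helper).

    Assumes the input may already be split into words, so a flat list of
    strings is one sample and only a list of lists is a batch.
    """
    is_batched = (
        isinstance(words_or_batch, (list, tuple))
        and len(words_or_batch) > 0
        and isinstance(words_or_batch[0], (list, tuple))
    )
    return words_or_batch if is_batched else [words_or_batch]


# A single string is wrapped; a list of strings is left alone.
single = wrap_if_str("mov eax, 1")
batch = wrap_if_str(["mov eax, 1", "ret"])

# A flat word list is wrapped as one sample; a nested list is left alone.
one_sample = wrap_if_not_nested(["mov", "eax"])
two_samples = wrap_if_not_nested([["mov", "eax"], ["ret"]])
```

If `tokenize` only ever receives raw strings or lists of raw strings, the `isinstance(texts, str)` check in this PR should be sufficient; the first-element check only matters once pre-split input is accepted.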

process input as str by adding it to a list

050fc29

fleurvanl linked an issue May 1, 2026 that may be closed by this pull request

Tokenizer breaks on single-string input #23

Open

in one line

04747a8

ranieri reviewed May 6, 2026


ranieri approved these changes May 6, 2026


akaIDIOT approved these changes May 6, 2026


fleurvanl wants to merge 2 commits into main from 23-tokenizer-breaks-on-single-string-input


3 participants
