Pull requests: huggingface/tokenizers

Pull requests list

Mark PyEncoder as frozen and use interior mutability
#1860 opened Sep 2, 2025 by ngoldbaum
Fix unsigned integer underflow issue with truncation
#1859 opened Sep 1, 2025 by maxdebayser
feat: add cli for tokenizer and training
#1842 opened Aug 6, 2025 by b00f
feat: whitespace optimize
#1841 opened Aug 6, 2025 by b00f
Unused Unicode Character Filter
#1832 opened Jul 23, 2025 by sanderland
Add enforce_utf8_boundaries option to BpeTrainer
#1830 opened Jul 22, 2025 by sanderland
Faster Whitespace PreTokenizer (Drop-in Replacement)
#1822 opened Jul 7, 2025 by 8ria
Add 3.13t CI using pytest-run-parallel
#1809 opened Jun 23, 2025 by ngoldbaum
Track lockfile
#1806 opened Jun 22, 2025 by sftse
Adding multiprocessing for sentencepiece_extractor
#1804 opened Jun 19, 2025 by AamodThakur
add group capture to replace
#1788 opened Jun 3, 2025 by cboseak
Add Truncate pre-tokenizer
#1783 opened May 27, 2025 by ArthurZucker Draft
Make unigram cache optional
#1763 opened Apr 18, 2025 by wangrunji0408
Implement Append normalizer
#1755 opened Mar 24, 2025 by austinleedavis
Add FxHash and ShortStringOptimization.
#1733 opened Feb 10, 2025 by MeetThePatel
3 of 4 tasks
Does windows aarch work ?
#1719 opened Jan 10, 2025 by Narsil
Draft backtrack
#1712 opened Jan 3, 2025 by ArthurZucker Draft
Fast regex
#1605 opened Aug 8, 2024 by ArthurZucker Draft