Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
nlp machine-learning deep-learning tokenizer python3 text-processing amharic african-languages language-processing ethiopic low-resource-languages bpe llm bpe-tokenizer amharic-tokenizer amharictokenizer amhtokenizer
-
Updated
Nov 17, 2025 - Python