2 files changed: +1404 -0 lines changed
candle-examples/examples/marian-mt
@@ -17,3 +17,22 @@ cargo run --example marian-mt --release -- \
I know you are waiting for me. I will go through the forest, I will go through the
mountain. I cannot stay far from you any longer.</s>
```
+
+## Generating the tokenizer.json files
+
+You can use the following script to generate the `tokenizer.json` config files
+from the hf-hub repos. This requires the `tokenizers` and `sentencepiece`
+packages to be installed and uses the `convert_slow_tokenizer.py` script from this
+directory.
+
+```python
+from convert_slow_tokenizer import MarianConverter
+from transformers import AutoTokenizer
+
+
+tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en", use_fast=False)
+fast_tokenizer = MarianConverter(tokenizer, index=0).converted()
+fast_tokenizer.save(f"tokenizer-marian-base-fr.json")
+fast_tokenizer = MarianConverter(tokenizer, index=1).converted()
+fast_tokenizer.save(f"tokenizer-marian-base-en.json")
+```
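
After running the script in the hunk above, it can help to check that the two output files look like tokenizer configs before passing them to the example. The following is a minimal sketch and not part of this change. It uses only the standard library, and the helper name `summarize_tokenizer_json` is hypothetical. It assumes the usual `tokenizer.json` layout, with a top-level `"model"` object holding a `"vocab"` entry and usually a `"type"` field.

```python
import json
from pathlib import Path


def summarize_tokenizer_json(path):
    """Return (model_type, vocab_size) for a tokenizer.json-style file.

    Hypothetical helper for illustration; assumes a top-level "model" object.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    model = data["model"]  # raises KeyError if there is no "model" section
    vocab = model.get("vocab", [])
    # Unigram models store the vocab as a list of [piece, score] pairs, while
    # BPE/WordPiece models store a {token: id} mapping; len() covers both.
    return model.get("type"), len(vocab)


if __name__ == "__main__":
    # File names taken from the conversion script above.
    for name in ("tokenizer-marian-base-fr.json", "tokenizer-marian-base-en.json"):
        if Path(name).exists():
            print(name, summarize_tokenizer_json(name))
        else:
            print(name, "not found; run the conversion script first")
```

An empty vocabulary or a `KeyError` here suggests that the conversion did not produce a usable file.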