Saving a Custom Tokenizer to Disk #9341
Hello all, I could use a bit of advice. I am trying to add a pipe that does preprocessing on texts so that the tokenization separates parentheses and brackets appropriately when they are attached to a token incorrectly. Here is my solution after a bit of searching the web:
When I run this, everything works great. When I try and save it to disk, I receive the following error:
Is there an easy solution here that I am just missing? Thanks for any help you all can provide,
William
Replies: 1 comment 1 reply
I'm confused as to what you're doing here. Components take and return Docs, but you seem to be taking a string. That's normal for the tokenizer, but the tokenizer isn't decorated with `Language.component` and isn't put in the pipeline. How are you actually using this?

As far as serializing your component, you should look at the section on serializing components in the docs. Since your component is actually stateless, I think if you decorate a function that instantiates it rather than a class it should work.
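To illustrate the factory-function suggestion, here is a minimal sketch assuming spaCy v3. The `BracketCounter` class, the `bracket_counter` factory name, and the `n_brackets` key are hypothetical stand-ins, since the original component is not shown in the thread. The decorated function creates the component, and because the component holds no state there is nothing extra to write to disk.

```python
import tempfile

import spacy
from spacy.language import Language

BRACKETS = {"(", ")", "[", "]"}


class BracketCounter:
    """Hypothetical stateless component: counts bracket tokens in a Doc."""

    def __call__(self, doc):
        # Components receive a Doc and must return a Doc.
        doc.user_data["n_brackets"] = sum(t.text in BRACKETS for t in doc)
        return doc


# Decorate a function that instantiates the component, not the class itself.
# Factory functions take the nlp object and the component name.
@Language.factory("bracket_counter")
def create_bracket_counter(nlp, name):
    return BracketCounter()


nlp = spacy.blank("en")
nlp.add_pipe("bracket_counter")

# Round-trip through disk; only the factory name is stored in the config,
# so the factory must be registered again before loading in a new process.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)

doc = reloaded("see ( this ) example")
print(reloaded.pipe_names, doc.user_data["n_brackets"])
```

When loading the saved pipeline elsewhere, the module that defines the factory has to be imported first so that spaCy can resolve the name.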
Also, backing up a little, I don't think you need a component here at all. You should just be able to use tokenizer exceptions to handle brackets the way you want.
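As a rough sketch of the tokenizer-only route, assuming spaCy v3 and a blank English pipeline: the first part adds a tokenizer exception (special case) for one exact string, and the second part is a related but different technique, extending the infix rules so brackets are split wherever they occur inside a token. The example strings and the bracket regex are illustrative assumptions, not taken from the original post.

```python
import spacy
from spacy.symbols import ORTH
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Tokenizer exception: an exact string mapped to a fixed split.
# The ORTH values must concatenate back to the original string.
nlp.tokenizer.add_special_case(
    "word(s)",
    [{ORTH: "word"}, {ORTH: "("}, {ORTH: "s"}, {ORTH: ")"}],
)
special = [t.text for t in nlp("word(s)")]
print(special)  # ['word', '(', 's', ')']

# Infix rule: split on brackets anywhere inside a token.
# The pattern below is an assumption for illustration; adjust it as needed.
infixes = list(nlp.Defaults.infixes) + [r"[\(\)\[\]]"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer
split = [t.text for t in nlp("foo(bar)baz")]
print(split)  # ['foo', '(', 'bar', ')', 'baz']
```

Special cases suit a small, known set of strings, while an infix rule generalizes to any token with embedded brackets. Both live on the tokenizer itself, so no extra pipeline component is involved.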