
Saving a Custom Tokenizer to Disk #9341

I'm confused as to what you're doing here. Components take and return Docs, but you seem to be taking a string. That's normal for the tokenizer, but the tokenizer isn't decorated with Language.component and isn't put in the pipeline. How are you actually using this?

As for serializing your component, take a look at the section on serializing components in the docs. Since your component is actually stateless, I think that if you decorate a function that instantiates it, rather than decorating the class itself, it should work.
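To make the serialization suggestion concrete, here is a minimal sketch, assuming spaCy v3. The component `BracketCounter`, the factory name `"bracket_counter"`, and the `bracket_count` key are all hypothetical stand-ins, not the code from the original question. The point is only that the decorator goes on a function that builds the component:

```python
import tempfile

import spacy
from spacy.language import Language
from spacy.tokens import Doc


class BracketCounter:
    """Hypothetical stateless component: takes a Doc and returns a Doc."""

    def __call__(self, doc: Doc) -> Doc:
        # Store the number of square-bracket tokens on the Doc.
        doc.user_data["bracket_count"] = sum(t.text in ("[", "]") for t in doc)
        return doc


# Decorate a function that instantiates the component, not the class itself.
@Language.factory("bracket_counter")
def create_bracket_counter(nlp: Language, name: str) -> BracketCounter:
    return BracketCounter()


nlp = spacy.blank("en")
nlp.add_pipe("bracket_counter")

# Round-trip through disk. The component has no state to write, so only its
# factory name is recorded in the saved config.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)

original_count = nlp("see [note] here").user_data["bracket_count"]
reloaded_count = reloaded("see [note] here").user_data["bracket_count"]
```

Note that the factory has to be registered (i.e. this code has to be imported or run) in whatever process later calls `spacy.load`, otherwise spaCy cannot resolve the name `"bracket_counter"`.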

Also, backing up a little, I don't think you need a component here at all. You should just be able to use tokenizer exceptions to handle brackets the way you want.
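As a rough sketch of the tokenizer-exception route, assuming spaCy v3 and assuming the goal is to keep a bracketed marker together as a single token (the thread doesn't spell out the exact bracket behaviour wanted, so the `"[sic]"` marker here is purely illustrative):

```python
import tempfile

import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")

# Register a special case so the bracketed marker is never split.
# "[sic]" is an illustrative example, not taken from the original question.
nlp.tokenizer.add_special_case("[sic]", [{ORTH: "[sic]"}])

text = "He said so [sic] yesterday."
tokens = [t.text for t in nlp(text)]

# Special cases are part of the tokenizer's own state, so they are saved
# and restored with the pipeline without any custom component.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)

reloaded_tokens = [t.text for t in reloaded(text)]
```

Because the exception lives in the built-in tokenizer, nothing extra needs to be registered before loading the saved pipeline.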

@wjbmattingly
Answer selected by svlandeg
Labels: feat / serialize (Feature: Serialization, saving and loading); feat / tokenizer (Feature: Tokenizer)