
Reverse retokenize() #9299

Sep 27, 2021 · 1 comment · 1 reply

The context manager is just there to batch the retokenizations, so they're all processed together, efficiently, when you exit the context, rather than one retokenization at a time.
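To illustrate the batching, here is a minimal sketch assuming spaCy v3 and a tokenizer-only `spacy.blank("en")` pipeline (the sentence and span indices are just for illustration). Merges are only queued inside the `with` block, so every span index refers to the original tokenization; the doc changes in one pass when the block exits.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline, no trained model needed
doc = nlp("I live in New York City and work in San Francisco")

with doc.retokenize() as retokenizer:
    # Both merges are only queued here. The span indices refer to the
    # original tokenization, because nothing changes until the block exits.
    retokenizer.merge(doc[3:6])   # "New York City"
    retokenizer.merge(doc[9:11])  # "San Francisco"
    n_inside = len(doc)           # still the original token count

# On exit, both queued merges are applied together.
print(n_inside)                   # 11
print([t.text for t in doc])
# ['I', 'live', 'in', 'New York City', 'and', 'work', 'in', 'San Francisco']
```

Because nothing is applied until exit, you don't have to adjust later indices to account for earlier merges in the same batch.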

You can save the state of the doc before the retokenization and restore it afterwards (if you're only using built-in attributes, doc.to_array and doc.from_array are the efficient way to do this), but there's no built-in way to revert a retokenization. The previous tokenization is not saved or stored anywhere, either in the doc or by the retokenizer.
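A sketch of the save-and-restore approach, assuming spaCy v3. `from_array` expects one row per token, so it can't undo a merge on the already-retokenized doc. Instead, this example also saves the words and trailing-space flags, builds a fresh `Doc` with the old tokenization, and reloads the saved attributes into it. The hand-set lemmas and the `ATTRS` list are hypothetical stand-ins for whatever built-in annotations you need to keep.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = nlp("I live in New York City")
for token in doc:
    token.lemma_ = token.text.lower()  # hypothetical annotation to preserve

# Snapshot before retokenizing: the tokenization itself (words plus
# trailing-whitespace flags) and the built-in attributes we care about.
ATTRS = ["LEMMA"]
words = [t.text for t in doc]
spaces = [bool(t.whitespace_) for t in doc]
saved = doc.to_array(ATTRS)  # new uint64 array, shape (len(doc), len(ATTRS))

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:6])  # doc is modified in place on exit

# Rebuild a Doc with the old tokens, then reload the saved attributes.
restored = Doc(doc.vocab, words=words, spaces=spaces)
restored.from_array(ATTRS, saved)

print([t.text for t in doc])         # ['I', 'live', 'in', 'New York City']
print([t.lemma_ for t in restored])  # ['i', 'live', 'in', 'new', 'york', 'city']
```

Note that `restored` is a new object, so any code holding references to the old `doc` or its tokens still sees the merged version. If you'd rather snapshot everything at once, serializing with `doc.to_bytes()` and loading into `Doc(nlp.vocab).from_bytes(...)` is a coarser alternative.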

Answer selected by polm
Labels: feat / doc (Feature: Doc, Span and Token objects)
2 participants