Reverse retokenize() #9299
Hi, I looked for similar questions but couldn't find an answer. Is there a way to go back to the doc's original tokenization after applying `retokenize()`? To me this is a natural use case, e.g. doing something specific on a custom tokenization and then going back to the proper parse of the original tokenization to do other things. Initially, I even thought the context manager was there exactly for this purpose and that exiting it would restore the tokenization, but that's not the case. Thanks in advance for any pointers!
The context manager is just there to batch the retokenizations, so they're all processed efficiently when you exit the context rather than one retokenization at a time.
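A minimal sketch of what the batching means in practice, assuming spaCy v3 and a blank English pipeline (no trained model needed). Merges are only queued inside the `with` block, so the doc keeps its original tokens until the block exits, and span indices for every merge refer to the original tokenization:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Los Angeles and New York")  # tokens: Los, Angeles, and, New, York

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])  # "Los Angeles"
    retokenizer.merge(doc[3:5])  # "New York": indices still refer to the original tokens
    len_inside = len(doc)        # nothing applied yet, so this is still 5

# Both merges are applied together when the context manager exits.
print(len_inside)                # 5
print([t.text for t in doc])     # ['Los Angeles', 'and', 'New York']
```

Exiting the block applies the queued changes. It does not undo them.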
You can save the state of the doc before the retokenization and then restore it afterwards (if you're just using built-in attributes, `doc.to_array` and `doc.from_array` are the efficient way to do this), but there's no built-in way to revert a retokenization. The previous tokenization is not saved or stored in any way in the doc or by the retokenizer.
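A rough sketch of the save-and-restore approach, assuming spaCy v3. `snapshot`, `restore` and the `ATTRS` list are hypothetical names chosen for illustration, and the annotations are set by hand so it runs on a blank pipeline. Because `from_array` needs an array whose length matches the doc, the state is restored into a new `Doc` rebuilt from the saved words and spaces, not into the retokenized doc. Custom `._` extension attributes are not covered by `to_array`.

```python
import spacy
from spacy.attrs import LEMMA, POS, TAG
from spacy.tokens import Doc

# Built-in token attributes to snapshot (assumption: extend for your pipeline).
ATTRS = [POS, TAG, LEMMA]


def snapshot(doc):
    """Hypothetical helper: record tokenization and attributes before retokenizing."""
    words = [t.text for t in doc]
    spaces = [bool(t.whitespace_) for t in doc]
    return words, spaces, doc.to_array(ATTRS)


def restore(vocab, state):
    """Hypothetical helper: rebuild a new Doc with the original tokens and attributes."""
    words, spaces, array = state
    restored = Doc(vocab, words=words, spaces=spaces)
    restored.from_array(ATTRS, array)
    return restored


nlp = spacy.blank("en")
doc = nlp("New York is big")
# Hand-set annotations so the example runs without a trained pipeline.
for token, pos, tag in zip(doc, ["PROPN", "PROPN", "AUX", "ADJ"], ["NNP", "NNP", "VBZ", "JJ"]):
    token.pos_ = pos
    token.tag_ = tag
    token.lemma_ = token.lower_

state = snapshot(doc)

# Custom tokenization: merge "New York" into one token (modifies doc in place).
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])
print([t.text for t in doc])  # ['New York', 'is', 'big']

original = restore(doc.vocab, state)
print([(t.text, t.pos_, t.tag_) for t in original])
```

The restored doc shares the vocab, so the saved string hashes resolve again. It is a separate object, though: spans or tokens taken from the retokenized doc still point at that doc.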