Reverse retokenize() #9299

xsway · 2021-09-27T10:21:30Z

Hi,

I looked for similar questions but couldn't find an answer. Is there a way to go back to the doc's original tokenization after applying retokenize()?

To me this is a natural use case: do something specific with a custom tokenization, then go back to the proper parse of the original tokenization to do other things. Initially, I even thought the context manager was there exactly for this purpose and that exiting it would restore the tokenization, but that's not the case.

Thanks in advance for any pointers!

Answered by adrianeboyd

adrianeboyd · 2021-09-27T10:34:54Z

The context manager is just there to batch the retokenizations, so they're all processed efficiently when you exit the context rather than one retokenization at a time.
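To illustrate the batching, here is a minimal sketch, assuming spaCy v3 and a blank English pipeline (tokenizer only, so no trained model is needed). The example sentence and the variable names are made up for illustration. The merge is only queued inside the `with` block and takes effect on exit:

```python
import spacy

# Blank pipeline: tokenizer only, no model download required.
nlp = spacy.blank("en")
doc = nlp("I live in New York")

n_before = len(doc)

with doc.retokenize() as retokenizer:
    # The merge is queued here, not applied yet.
    retokenizer.merge(doc[3:5])
    n_inside = len(doc)

# On exit, all queued retokenizations are applied in one go.
n_after = len(doc)

print(n_before, n_inside, n_after)
```

Exiting the context applies the changes; it does not undo them, and nothing about the earlier tokenization is kept around.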

You can save the state of the doc before the retokenization and then restore it (if you're just using built-in attributes, doc.to_array and doc.from_array are the efficient way to do this), but there's no built-in way to revert a retokenization. The previous tokenization is not saved or stored in any way in the doc or by the retokenizer.
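A rough sketch of the save-and-restore approach, assuming spaCy v3 and only built-in attributes. The helper names `snapshot` and `restore`, the `ATTRS` list, and the hand-set tags are hypothetical and only there for illustration. Since the saved array has one row per original token, the sketch loads it into a fresh `Doc` built from the saved words and spaces rather than into the retokenized doc:

```python
import spacy
from spacy.attrs import LEMMA, TAG
from spacy.tokens import Doc

# Built-in attributes to preserve (an assumption; pick the ones you need).
ATTRS = [TAG, LEMMA]


def snapshot(doc):
    """Save the tokenization and the selected attributes of a doc."""
    return {
        "words": [t.text for t in doc],
        "spaces": [bool(t.whitespace_) for t in doc],
        "array": doc.to_array(ATTRS),
    }


def restore(vocab, snap):
    """Build a new Doc with the saved tokenization and attributes."""
    restored = Doc(vocab, words=snap["words"], spaces=snap["spaces"])
    restored.from_array(ATTRS, snap["array"])
    return restored


nlp = spacy.blank("en")  # tokenizer only, no trained model needed
doc = nlp("I live in New York")

# Set some annotation by hand so there is something to restore.
for token in doc:
    token.tag_ = "PROPN" if token.text in ("New", "York") else "X"
    token.lemma_ = token.text.lower()

saved = snapshot(doc)

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:5])

# `doc` now has the merged token; `original` has the earlier tokenization.
original = restore(doc.vocab, saved)
```

Custom extension attributes are not covered by `to_array`, so they would need to be saved separately.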

1 reply

xsway Sep 27, 2021
Author

Thanks, this answers my question! I'll see whether it makes sense to use doc.to_array and doc.from_array in my scenario.
