Fix error on large documents #4

edwintorok · 2023-11-12T15:42:36Z

Running QAnom on a large document results in an error because it attempts to process the entire document at once:

```
RuntimeError: The expanded size of the tensor (12646) must match the existing size (512) at non-singleton dimension 1.  Target sizes: [1, 12646].  Tensor sizes: [1, 512]
```

A workaround is to first split the document into sentences with a separate pipeline that does not include QAnom (assuming each sentence is shorter than 512 tokens), but that is cumbersome.

Instead, do what NominalizationDetector already does: pass a list of sentences to the tokenizer.

The limit is now 512 tokens per sentence, which is much more reasonable.
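
To illustrate the idea, here is a minimal, self-contained sketch of per-sentence tokenization. It is not the code from this PR: `split_sentences`, `whitespace_tokenize`, and `tokenize_per_sentence` are hypothetical helpers, the regex splitter is deliberately naive, and the whitespace tokenizer is only a stand-in for the real subword tokenizer (which would produce more tokens per sentence). The 512 limit is taken from the error message above.

```python
import re
from typing import Callable, List

MAX_TOKENS = 512  # per-input limit, taken from the error message above


def split_sentences(document: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [part for part in parts if part]


def whitespace_tokenize(sentence: str) -> List[str]:
    """Stand-in tokenizer (assumption); a subword tokenizer yields more tokens."""
    return sentence.split()


def tokenize_per_sentence(
    document: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    max_tokens: int = MAX_TOKENS,
) -> List[List[str]]:
    """Tokenize each sentence separately so the limit applies per sentence."""
    batch = []
    for index, sentence in enumerate(split_sentences(document)):
        tokens = tokenize(sentence)
        if len(tokens) > max_tokens:
            # A single over-long sentence still exceeds the limit.
            raise ValueError(
                f"sentence {index} has {len(tokens)} tokens (limit {max_tokens})"
            )
        batch.append(tokens)
    return batch


# A document far longer than 512 tokens, made of short sentences.
document = " ".join(f"Sentence number {i} is short." for i in range(3000))
batch = tokenize_per_sentence(document)

print("tokens in whole document:", len(whitespace_tokenize(document)))
print("sentences in batch:", len(batch))
print("longest sentence (tokens):", max(len(tokens) for tokens in batch))
```

The whole document is well over the limit, but every entry in the batch is small, so only an individual sentence longer than 512 tokens would still fail.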

Signed-off-by: Edwin Török <[email protected]>
