Conversation

@edwintorok
Running QAnom on a large document results in an error because it
attempts to process the entire document at once:
```
RuntimeError: The expanded size of the tensor (12646) must match the existing size (512) at non-singleton dimension 1.  Target sizes: [1, 12646].  Tensor sizes: [1, 512]
```

A workaround is to use a separate pipeline without QAnom to first split
the document into sentences (assuming each sentence is under 512
tokens), but that is cumbersome.

Instead, do what NominalizationDetector already does: pass a list of
sentences to the tokenizer.

The limit then becomes 512 tokens per sentence, which is a much more
reasonable limit.

Signed-off-by: Edwin Török <[email protected]>
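The per-sentence approach can be sketched roughly as follows. This is an illustration only, not QAnom's actual code: the helper names, the naive period-based splitter, and the whitespace tokenizer are all stand-ins (a real pipeline would use a proper sentence splitter and the model's own subword tokenizer, which, given a list of sentences, likewise encodes each item as its own sequence).

```python
# Hypothetical sketch: split first, then tokenize each sentence, so the
# 512-token model limit applies per sentence rather than per document.

MAX_TOKENS = 512  # typical BERT-style sequence limit


def split_sentences(document: str) -> list[str]:
    # Naive period-based splitter, for illustration only.
    return [s.strip() for s in document.split(".") if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Stand-in for a real subword tokenizer.
    return sentence.split()


document = "QAnom fails on long documents. Tokenizing per sentence avoids that."
sentences = split_sentences(document)

# The length check now holds per sentence, not for the whole document.
assert all(len(tokenize(s)) <= MAX_TOKENS for s in sentences)
print(sentences)
```

With a whole-document input, a single sequence of 12646 tokens blows past the 512 limit (the error above); with a list of sentences, each sequence only has to fit the limit on its own.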