Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there a problem with overlap verification in line2pairs ? #18

Open
Ramskii opened this issue Aug 1, 2019 · 1 comment
Open

Is there a problem with overlap verification in line2pairs ? #18

Ramskii opened this issue Aug 1, 2019 · 1 comment

Comments

@Ramskii
Copy link

Ramskii commented Aug 1, 2019

In line2pairs, line 53 and 56, it checks whether the input ngram and the output ngram overlap (completely or partially).
For complete overlap, I reckon you have to check if the first token of each ngram is the same and if both ngrams are the same length. However, this last check is performed with the input_order and output_order variables, that don't represent those particular ngrams' length but the maximum ngram's length to search for in the line. For example, if you have input_order = 1, output_order = 2 and overlap = True, you will never pass the input_order = output_order check and therefore you will eventually get ngrams paired with themselves.
The same thing happens with the partial overlap check.

Shouldn't line 53 be
if i == l and j == k:
instead of
if i == l and input_order == output_order:

And line 56
if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:
instead of
if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:

@zhezhaoa
Copy link
Owner

zhezhaoa commented Aug 1, 2019

Thanks a lot. You are right.
I have fixed the problem.
I also analyze the impact of this bug on the final results.
It has minor impact on overlap setting.
But it can influence the results when non-overlap setting is used and window size is small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants