Use word stems for key term matching #1

Open
Mr0grog opened this issue Dec 3, 2019 · 0 comments
Labels
enhancement New feature or request

Mr0grog (Member) commented Dec 3, 2019

When we check whether any key terms have changed, we should try using word stemming instead of exact matches:

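```python
# Keep only the key terms with a nonzero count in all_terms
key_terms = {term: all_terms[term]
             for term in KEY_TERMS
             if abs(all_terms.get(term, 0)) > 0}
key_terms_changed = len(key_terms) > 0
# Total magnitude of change across all matched key terms
key_terms_change_count = sum(abs(count)
                             for term, count in key_terms.items())
```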

This should be something we can turn on/off, since I’m not sure how well it will work and whether we’ll get a lot of false positives.

To keep things comprehensible, we need to keep a map of "stemmed terms" → "actual terms" so that we can present them as the actual terms, even though we are matching by stem.
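A minimal sketch of the "stemmed terms" → "actual terms" map, assuming `all_terms` maps each term to its change in count (as the existing snippet implies) and that any `str -> str` stemming function can be plugged in. `match_key_terms`, `summarize_key_term_changes`, and `toy_stem` are hypothetical names, not existing project code. Passing `stem=None` is one possible on/off switch: it falls back to exact matching.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

# HYPOTHETICAL helpers sketching stem-based key term matching.


def match_key_terms(
    all_terms: Dict[str, int],
    key_terms: Iterable[str],
    stem: Optional[Callable[[str], str]] = None,
) -> Dict[str, Dict[str, int]]:
    """Map each matched key term to the actual terms (and counts) behind it.

    Assumes `all_terms` maps term -> change in count between two versions.
    `stem=None` turns stemming off, so matching is exact, like today.
    """
    normalize = stem if stem is not None else (lambda term: term)

    # "stemmed term" -> {actual term: count}, only for terms that changed,
    # so results can be shown using the words that actually appeared.
    actual_by_stem: Dict[str, Dict[str, int]] = defaultdict(dict)
    for term, count in all_terms.items():
        if count != 0:
            actual_by_stem[normalize(term)][term] = count

    matches: Dict[str, Dict[str, int]] = {}
    for key_term in key_terms:
        actual = actual_by_stem.get(normalize(key_term))
        if actual:
            matches[key_term] = dict(actual)
    return matches


def summarize_key_term_changes(
    matches: Dict[str, Dict[str, int]],
) -> Tuple[bool, int]:
    """Return (key_terms_changed, key_terms_change_count)."""
    # Count each actual term once, even if several key terms share a stem.
    actual = {term: count
              for found in matches.values()
              for term, count in found.items()}
    return len(actual) > 0, sum(abs(count) for count in actual.values())


# Toy stemmer for illustration only: strips a trailing "s".
def toy_stem(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


terms = {"emissions": 2, "emission": -1, "climate": 0, "policy": 3}
matches = match_key_terms(terms, ["emission", "climate"], stem=toy_stem)
print(matches)  # {'emission': {'emissions': 2, 'emission': -1}}
print(summarize_key_term_changes(matches))  # (True, 3)
```

With NLTK installed, `stem=PorterStemmer().stem` (from `nltk.stem`) could replace the toy stemmer. Counting each actual term once in the summary keeps two key terms that share a stem from double counting the same change.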

NLTK supports several different stemming implementations, so I need to do some reading and testing as to what makes the most sense. API docs: https://www.nltk.org/api/nltk.stem.html
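As a possible starting point for that reading and testing, the snippet below puts NLTK's Porter, Snowball (English), and Lancaster stemmers side by side and lists words that collapse to the same stem, which is where false positives would come from. It assumes `nltk` is installed; `compare_stems`, `stem_collisions`, and the sample words are illustrative placeholders, not project code.

```python
# Requires `pip install nltk`; these three stemmers need no NLTK data downloads.
from collections import defaultdict

from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

STEMMERS = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}


def compare_stems(words):
    """Return {word: {stemmer name: stem}} for side-by-side inspection."""
    return {word: {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}
            for word in words}


def stem_collisions(words, stemmer):
    """Return {stem: words} for stems shared by more than one word.

    These groups are where stem matching could produce false positives.
    """
    groups = defaultdict(set)
    for word in words:
        groups[stemmer.stem(word)].add(word)
    return {stem: group for stem, group in groups.items() if len(group) > 1}


# Placeholder word list; real testing would use the project's KEY_TERMS.
sample = ["run", "running", "runs", "policy", "policies", "university", "universe"]
for word, stems in compare_stems(sample).items():
    print(word, stems)
print(stem_collisions(sample, STEMMERS["porter"]))
```

Running the collision check for each stemmer against the real key term list would show how aggressively each one merges distinct words before deciding which to use.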

Mr0grog added the enhancement (New feature or request) label Dec 3, 2019
Mr0grog self-assigned this Dec 3, 2019
Mr0grog moved this to Inbox in Web Monitoring Feb 17, 2025
Projects
Status: Inbox
Development

No branches or pull requests

1 participant