Use word stems for key term matching #1

Mr0grog · 2019-12-03T23:02:12Z

When we check whether any key terms have changed, we should try out using word stemming instead of exact matches:

web-monitoring-task-sheets/analyst_sheets/analyze.py

Lines 190 to 195 in 074cc01

    
    key_terms = {term: all_terms[term]
                 for term in KEY_TERMS
                 if abs(all_terms.get(term, 0)) > 0}
    key_terms_changed = len(key_terms) > 0
    key_terms_change_count = sum((abs(count)
                                  for term, count in key_terms.items()))

This should be something we can turn on/off, since I’m not sure how well it will work and whether we’ll get a lot of false positives.

To keep things comprehensible, we need to keep a map of "stemmed terms" → "actual terms" so that we can present them as the actual terms, even though we are matching by stem.
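A rough sketch of how the toggle and the stem → actual terms map could fit together. This is not the code in `analyze.py`: `find_key_term_changes`, `naive_stem`, and the `use_stems` flag are hypothetical names, and `naive_stem` is only a toy suffix stripper standing in for a real stemmer (for example, the `stem` method of one of the NLTK stemmers). It also assumes every key term is a single word.

```python
from collections import defaultdict


def naive_stem(word):
    """Toy stand-in for a real stemmer; strips a few common suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def find_key_term_changes(all_terms, key_terms, use_stems=False, stem=naive_stem):
    """Group changed terms under the key term (or key term stem) they match.

    all_terms maps each term to its change in count. The result maps each
    matched key to a dict of {actual term: change in count}, so the actual
    terms can still be shown even though matching happened by stem.
    """
    # With stemming off, terms are compared exactly, as they are today.
    normalize = stem if use_stems else (lambda term: term)
    wanted = {normalize(term) for term in key_terms}

    matches = defaultdict(dict)
    for term, count in all_terms.items():
        key = normalize(term)
        if key in wanted and abs(count) > 0:
            matches[key][term] = count
    return dict(matches)


# Made-up sample data, for illustration only.
all_terms = {"emissions": -2, "emission": 1, "climate": 0, "water": 4}
key_terms = ["emission", "climate"]

exact = find_key_term_changes(all_terms, key_terms)
stemmed = find_key_term_changes(all_terms, key_terms, use_stems=True)

key_terms_changed = len(stemmed) > 0
key_terms_change_count = sum(abs(count)
                             for terms in stemmed.values()
                             for count in terms.values())
```

With this sample data, exact matching only picks up `emission`, while stem matching also picks up `emissions` and reports both under the same stem. Keeping the per-stem dict of actual terms is what makes it possible to present the real words in the output.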

NLTK supports several different stemming implementations, so I need to do some reading and testing to figure out which one makes the most sense. API docs: https://www.nltk.org/api/nltk.stem.html

Mr0grog added the enhancement New feature or request label Dec 3, 2019

Mr0grog self-assigned this Dec 3, 2019

Mr0grog added the [priority-★★★] label Dec 4, 2019

Mr0grog removed the [priority-★★★] label Oct 21, 2020

Mr0grog moved this to Inbox in Web Monitoring Feb 17, 2025

Mr0grog added this to Web Monitoring Feb 17, 2025
