Suggestion to add BM25 Score #57
This seems super useful! I'd suggest using substitution for lazy evaluation (so it matches the rest of the code) and experimenting with S3 methods in case this falls over for data.tables. I'm happy to do that work and fully integrate it if @juliasilge and/or @dgrtwo give the general scope of this ticket a thumbs up.
Thanks!
Oh, it's just that index-based selection can sometimes get gnarly with data.tables, since it behaves somewhat differently there. It'll probably be fine, but I'll check to make sure once David/Julia sign off (hint, hint).
We are working on fixing broken things, cleaning up, etc. for our 0.1.3 release, but let's come back and implement this for tidytext 0.1.4!
If that's the plan, I'll add it to my to-do list! Is there anything I can do to help with the fixes, cleanup, etc.?
Is there any update on this? It would be good to have a TF-IDF alternative.
No recent work on this, but if you are looking for an alternative to tf-idf that may fit your needs better, check out weighted log odds with the tidylo package.
That's very helpful, thank you.
I suggest adding a function to bind the BM25 score (which is based on a probabilistic term-weighting model). It is useful in some cases because it gives control over:
It is commonly used as a ranking function by search engines.
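For reference, one common (Okapi) formulation of the BM25 weight of term $t$ in document $d$ is shown below. The thread does not say which variant the fork implements, so treat this as a reference point rather than the proposed function's exact definition. Here $f_{t,d}$ is the count of $t$ in $d$, $|d|$ is the length of $d$ in tokens, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, and $n_t$ is the number of documents containing $t$. The parameter $k_1$ (commonly 1.2 to 2.0) controls how quickly repeated occurrences of a term saturate, and $b$ (commonly 0.75) controls how strongly scores are normalized by document length. A document's score for a query is the sum of $w(t, d)$ over the query terms.

```latex
w(t, d) = \mathrm{IDF}(t) \cdot
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n_t + 0.5}{n_t + 0.5} + 1\right)
```

The "+1" inside the logarithm (used, for example, by Lucene) keeps the IDF positive; the classic form without it goes negative for terms that appear in more than half of the documents.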
I implemented a function, bind_bm25, in the forked repo HERE.
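To make the idea concrete, here is a minimal, language-neutral sketch in Python of what a bind_bm25-style function could compute on tidy input, analogous to tidytext's bind_tf_idf(tbl, term, document, n): one row per (document, term) pair with a count n. This is not the fork's R implementation; the function name, its arguments, the defaults k1 = 1.2 and b = 0.75, and the "+1" IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bind_bm25(rows, k1=1.2, b=0.75):
    """Hypothetical sketch: attach a BM25 weight to each (document, term, n) row.

    Uses the Okapi BM25 term weight with IDF = ln((N - df + 0.5) / (df + 0.5) + 1).
    Returns a list of (document, term, n, idf, bm25) tuples.
    """
    doc_len = defaultdict(int)   # total token count per document
    doc_freq = defaultdict(int)  # number of documents containing each term
    for doc, term, n in rows:
        doc_len[doc] += n
        doc_freq[term] += 1  # assumes each (document, term) pair appears only once

    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs

    out = []
    for doc, term, n in rows:
        df = doc_freq[term]
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        # Length normalization: longer-than-average documents get a larger denominator.
        norm = k1 * (1 - b + b * doc_len[doc] / avg_len)
        # Saturating term-frequency component; bounded above by k1 + 1.
        tf_part = n * (k1 + 1) / (n + norm)
        out.append((doc, term, n, idf, tf_part * idf))
    return out


if __name__ == "__main__":
    counts = [("d1", "tidy", 3), ("d1", "text", 1), ("d2", "text", 2), ("d2", "bm25", 2)]
    for row in bind_bm25(counts):
        print(row)
```

In this toy example "text" appears in both documents, so it gets a much lower weight than "tidy" or "bm25". Both documents also have the same length, so b has no effect here; changing k1 or b only matters once counts and document lengths vary, which is the extra control BM25 offers over plain tf-idf.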