That's a great suggestion, kyr0, and it's easy for the developer to implement: split the document into its extracted h1 titles, run wikibm25 on those alone and give them a 5× weight, then run wikibm25 on the normal text. This is just the base code, and scoring will vary from site to site. There are lots more features to come, which I hope to outline on YouTube, on Reddit, and in an arXiv paper.
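A minimal sketch of the title-weighting approach described in the reply, in Python. It assumes a plain Okapi BM25 scorer (Lucene-style idf, common defaults k1 = 1.5 and b = 0.75) standing in for wikibm25, whose actual code and signature aren't shown here, and it pulls h1 titles out with a crude regex rather than a real HTML parser. The names BM25, split_title_body, and rank are hypothetical; only the 5× title weight comes from the reply.

```python
import math
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")
H1_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


class BM25:
    """Plain Okapi BM25 over pre-tokenized documents (an assumption, not wikibm25)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        self.tfs = [Counter(d) for d in docs]
        df = Counter(t for d in docs for t in set(d))  # document frequency
        n = len(docs)
        # Lucene-style idf, which never goes negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_terms, i):
        total = 0.0
        for t in query_terms:
            f = self.tfs[i].get(t, 0)
            if f == 0:
                continue  # absent term contributes nothing (also avoids avgdl == 0)
            norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total


def split_title_body(html):
    """Crude regex split into (h1 tokens, remaining-text tokens)."""
    titles = TAG_RE.sub(" ", " ".join(H1_RE.findall(html)))
    body = TAG_RE.sub(" ", H1_RE.sub(" ", html))
    return tokenize(titles), tokenize(body)


def rank(docs, query, title_weight=5.0):
    """Return document indices, best first: title_weight * BM25(h1) + BM25(body)."""
    parts = [split_title_body(d) for d in docs]
    title_index = BM25([t for t, _ in parts])  # separate index over titles only
    body_index = BM25([b for _, b in parts])   # separate index over the rest
    q = tokenize(query)
    scores = [
        title_weight * title_index.score(q, i) + body_index.score(q, i)
        for i in range(len(docs))
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "<h1>Cats</h1><p>dogs dogs dogs</p>",
        "<h1>Dogs</h1><p>cats are mentioned once here</p>",
    ]
    print(rank(docs, "dogs"))                    # [1, 0]: the h1 match wins
    print(rank(docs, "dogs", title_weight=1.0))  # [0, 1]: body repetition wins
```

With title_weight=1.0 the first document wins on body repetition alone; the 5× weight is what lets the h1 match outrank it.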
-
TL;DR: Forget it, I've just stumbled upon your code that does exactly that. Cool algos!
Just an idea: in documents, it's not only the text that carries relevance; formatting plays a role too. A simple example is that <h1> ranks higher than <h2> in SEO. In Markdown text, it's simple to run the search algorithm in partials over the different headings of a text while indexing the document, and to assign each a different priority score. This can lead to more accurate search results by ranking not only on matches but also on a "priority boost".