Fix irrelevant search results for short queries like GSoC #744

Open
MuhammadAashirAslam wants to merge 1 commit into precice:master from MuhammadAashirAslam:fix/search-typo-tolerance
@MuhammadAashirAslam
Contributor

Fixes #733

Summary

Improve search precision for short single-word queries (e.g. gsoc) by preventing irrelevant matches from XML code tokens. The fix is a client-side, query-time adjustment; indexing and documentation content are not modified.


Problem

While testing search locally, I noticed that short queries like gsoc were returning unrelated XML reference results.

This was happening due to Algolia's default typo tolerance behavior.

By default:

  • Words with 4+ characters allow 1 typo (words with 8+ characters allow 2)
  • Prefix matching is also applied

So the query:

gsoc

was matching:

<m2n:sockets>

because:

gsoc → matched "soc" (prefix of "sockets")
nbTypos: 2
nbExactWords: 0

This was confirmed via Algolia ranking info:

"_rankingInfo": {
    "nbTypos": 2,
    "nbExactWords": 0,
    "firstMatchedWord": 2000
}

This means the result was purely a fuzzy match from code content and did not reflect real page relevance.
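
For anyone reproducing this check, here is a minimal sketch of how such matches could be flagged from the ranking info. It assumes the hits were fetched with Algolia's getRankingInfo query parameter enabled, so that each hit carries a _rankingInfo object. The function name isFuzzyOnlyMatch and the sample hits are hypothetical and not part of this PR.

```javascript
// Hypothetical helper: true when a hit matched only through typos,
// i.e. no query word matched exactly.
function isFuzzyOnlyMatch(hit) {
  const info = hit._rankingInfo;
  if (!info) return false; // ranking info was not requested
  return info.nbTypos > 0 && info.nbExactWords === 0;
}

// Sample hits: the first mirrors the ranking info quoted above,
// the second is an invented exact match for comparison.
const sampleHits = [
  { title: 'XML reference', _rankingInfo: { nbTypos: 2, nbExactWords: 0, firstMatchedWord: 2000 } },
  { title: 'Exact match', _rankingInfo: { nbTypos: 0, nbExactWords: 1, firstMatchedWord: 0 } },
];

const fuzzyTitles = sampleHits.filter(isFuzzyOnlyMatch).map((hit) => hit.title);
console.log(fuzzyTitles);
```

Only the first sample hit is flagged, since it has typos and no exactly matched words.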


Change

A client-side, query-time fix was implemented to make short queries stricter without touching indexing or documentation.

In both Algolia client entry points:

  • _includes/algolia.html
  • js/algolia-search.js

the following parameters were added inside the searchFunction:

helper.setQueryParameter('minWordSizefor1Typo', 5);
helper.setQueryParameter('minWordSizefor2Typos', 9);
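
For context, here is a sketch of where these calls would sit. It assumes the site uses the InstantSearch.js searchFunction hook, which receives an algoliasearch-helper instance. The names applyStricterTypoLimits and fakeHelper are hypothetical and exist only so the snippet runs on its own; the actual function bodies in the two files are not shown in this description.

```javascript
// Hypothetical wrapper around the two calls added in this PR.
function applyStricterTypoLimits(helper) {
  helper.setQueryParameter('minWordSizefor1Typo', 5);
  helper.setQueryParameter('minWordSizefor2Typos', 9);
  return helper;
}

// Assumed shape of the hook: adjust parameters, then trigger the search.
function searchFunction(helper) {
  applyStricterTypoLimits(helper);
  helper.search();
}

// Stub standing in for the real helper; it records what it is told.
const fakeHelper = {
  params: {},
  searched: false,
  setQueryParameter(name, value) {
    this.params[name] = value;
    return this;
  },
  search() {
    this.searched = true;
    return this;
  },
};

searchFunction(fakeHelper);
console.log(fakeHelper.params);
```

Setting the parameters before helper.search() matters in this sketch: the overrides have to be in place when the query is sent.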

Effect

  • Words of length ≤4 (like gsoc) now require an exact match
  • Words of 5 to 8 characters allow at most 1 typo; 2 typos now require 9+ characters (previously 8+)

This prevents short queries from matching unrelated code tokens such as:

gsoc → sockets
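
The thresholds can be read as a simple rule. The sketch below models it, assuming Algolia's defaults are 4 and 8 for the two parameters. The function maxTyposFor is a hypothetical illustration and does not call Algolia or reproduce its matching logic.

```javascript
// Hypothetical model of the typo thresholds; not Algolia's implementation.
function maxTyposFor(word, minWordSizeFor1Typo, minWordSizeFor2Typos) {
  if (word.length >= minWordSizeFor2Typos) return 2;
  if (word.length >= minWordSizeFor1Typo) return 1;
  return 0;
}

const DEFAULTS = [4, 8]; // assumed Algolia defaults
const THIS_PR = [5, 9]; // values set in this PR

// Print the allowed typos per word, before and after the change.
for (const word of ['gsoc', 'sockets', 'tutorial', 'configuration']) {
  console.log(word, maxTyposFor(word, ...DEFAULTS), '->', maxTyposFor(word, ...THIS_PR));
}
```

Under this model, only words of exactly 4 or exactly 8 characters behave differently after the change.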

Why this approach

  • No changes to _config.yml
  • No changes to indexing
  • No documentation/reference files modified
  • Typo tolerance is not disabled globally

This is purely a query-time adjustment in client rendering, ensuring:

  • Short queries stop returning noisy XML matches
  • Normal documentation search remains unaffected

Files Changed

File                      Change
_includes/algolia.html    Added query parameter override
js/algolia-search.js      Same change

Both updates were applied inside the searchFunction.


Validation

After this change:

  • Searching gsoc no longer returns irrelevant XML reference matches
  • Other queries like configuration continue returning correct results

Note

This change only improves relevance for short queries at the client level. Making pages containing terms like "GSoC" fully discoverable would additionally require adding keywords to searchableAttributes in _config.yml and rebuilding the Algolia index with bundle exec jekyll algolia. This is outside the scope of this PR since it involves indexing configuration.


Screenshot Before

image

Screenshot After

image

@MakisH
Member

MakisH commented Feb 23, 2026

Thank you for your contribution. However, I am confused. Testing locally, it does not seem to have any effect, and the screenshots don't demonstrate the expected result.

General note: the PR description is quite long for what the change is (I guess suggested by some tool). This makes it tricky to see what is going on.

I see that there is a similar discussion in #741. I am not going to interact with these PRs for now, and let the discussion converge.

@MakisH added the GSoC and technical labels Feb 23, 2026
Labels

GSoC: Contributed in the context of the Google Summer of Code
technical: Technical issues on the website

Development

Successfully merging this pull request may close these issues.

Search for ‘GSoC’ Returns Irrelevant XML Documentation Pages

2 participants