
Demo implementation for migration of ES to SOLR #1270

Open
RayyanSeliya wants to merge 2 commits into metabrainz:master from RayyanSeliya:feat/solr-demo


@RayyanSeliya
Contributor

Problem

I built a small Solr demo for the BookBrainz “ES -> Solr” migration so we can validate the end-to-end approach (schema/analyzers, indexing, and website search queries) before doing the full migration. It should also show where migration errors are likely, so we can tackle them in the upcoming full migration.

This PR is a draft demo for review and feedback, not for production.

the demo focuses on:

  • multilingual search behavior (icu + char mapping)
  • work–author relationship lookup + type-filtered results

Solution

this PR adds a runnable solr configset + a minimal solr search implementation for the website.

key parts:

  • solr-config/conf/:
    • schema.xml (field types + analyzers)
    • solrconfig.xml (request handlers, including /autocomplete)
    • mapping-chars.txt (pre-token normalization)
  • solr setup + indexing:
    • docker-compose.solr.yml
    • setup-solr-with-icu.sh
    • scripts/index-solr-test-data.js
    • demo dataset loader used by the website auto-index path: src/common/helpers/search-test-data.ts
  • website integration:
    • src/common/helpers/search-solr.ts (solr-based search logic)
    • src/common/helpers/search-switch.ts (toggles solr vs elastic via USE_SOLR)
    • src/server/app.js + src/server/routes/search.tsx wired to use the switch
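
To illustrate the USE_SOLR toggle described above, here is a minimal sketch of how a backend switch could look. It is not the code in src/common/helpers/search-switch.ts: the `SearchBackend` interface, the stub backends, and `selectBackend` are hypothetical names, and the rule that only the exact string "true" enables Solr is an assumption.

```typescript
// Hypothetical sketch; names below are illustrative, not the PR's actual exports.
interface SearchBackend {
  name: string;
  search(query: string, type?: string): Promise<unknown[]>;
}

// Stub backends standing in for the Solr and Elasticsearch search helpers.
const solrBackend: SearchBackend = {
  name: "solr",
  search: async () => [],
};
const elasticBackend: SearchBackend = {
  name: "elasticsearch",
  search: async () => [],
};

// Pick the backend from an environment-like map.
// Assumption: only the exact string "true" enables Solr; anything else keeps Elasticsearch.
function selectBackend(env: Record<string, string | undefined>): SearchBackend {
  return env.USE_SOLR === "true" ? solrBackend : elasticBackend;
}
```

In a Node process this would be called as `selectBackend(process.env)`; defaulting to Elasticsearch keeps the existing behavior when the variable is unset.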

notes / constraints:

  • solr data for this demo is hardcoded (17 documents).
  • autocomplete and fuzzy/typo behavior are not fully “product-ready” yet:
    • autocomplete logic exists (solr /autocomplete handler) but is not integrated into the main website search ui yet
    • fuzzy/typo behavior still needs refinement to be reliable and safe
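
Since the /autocomplete handler is not wired into the UI yet, the following sketch shows one way a client could build a request URL for it. The helper name `buildAutocompleteUrl`, the core name `bookbrainz`, and the row limit are assumptions for illustration; `q`, `rows`, and `wt` are standard Solr query parameters, and 8983 is Solr's default port.

```typescript
// Hypothetical helper: builds a request URL for the /autocomplete handler.
// The base URL, core name, and row limit are assumptions, not values from this PR.
function buildAutocompleteUrl(
  prefix: string,
  baseUrl = "http://localhost:8983/solr",
  core = "bookbrainz",
  rows = 5
): string {
  const params: Array<[string, string]> = [
    ["q", prefix.trim()], // user-typed prefix, whitespace trimmed
    ["rows", String(rows)], // cap the number of suggestions
    ["wt", "json"], // ask Solr for a JSON response
  ];
  // Percent-encode keys and values so non-ASCII input (e.g. 村上) is sent safely.
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${baseUrl}/${core}/autocomplete?${query}`;
}
```

The resulting URL can be passed to any HTTP client; how the response is shaped depends on how the handler is configured in solrconfig.xml.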

manual verification

I tested the website search with the demo dataset and recorded a walkthrough video.

tested (high level):

  • author searches: lovecraft, tolkien, murakami, austen, asimov
  • work searches: cthulhu, lord rings, kafka, pride prejudice, foundation
  • edition / identifiers: cthulhu 1928, anniversary, kafka vintage, isbn 978-0-486-27204-8, ids like nm0522454, Q892
  • publishers / series / groups: penguin, vintage, foundation series, trilogy, LOTR
  • multilingual searches through the website: ラヴクラフト, トールキン, 村上, Остин, 指輪物語
  • work–author relationship + type filter: searching lovecraft shows the author and “the call of cthulhu”, and filtering Type: Work narrows results
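
The type filter in the last item can be sketched as a Solr filter query (`fq`) added on top of the main query. This is an illustration under assumptions, not the logic in search-solr.ts: the `type` field name, the helper names, and the default row count are hypothetical, while `q`, `fq`, `rows`, and `defType=edismax` are standard Solr parameters.

```typescript
// Hypothetical sketch: builds Solr select parameters for a type-filtered search.
// The "type" field name is an assumption about the demo schema.
interface SolrSelectParams {
  q: string;
  defType: "edismax";
  fq?: string;
  rows: number;
}

// Escape backslashes and double quotes so the value is safe inside a quoted phrase.
function quoteTerm(value: string): string {
  return `"${value.replace(/(["\\])/g, "\\$1")}"`;
}

function buildSearchParams(query: string, type?: string, rows = 20): SolrSelectParams {
  const params: SolrSelectParams = { q: query, defType: "edismax", rows };
  if (type) {
    // A filter query narrows the result set without affecting relevance scoring.
    params.fq = `type:${quoteTerm(type)}`;
  }
  return params;
}
```

With this sketch, `buildSearchParams("lovecraft")` would search across all entity types, while `buildSearchParams("lovecraft", "Work")` adds the filter that narrows results to works.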

video:

demo-solr-results.online-video-cutter.com.1.mp4

AI usage

  • i did not use any ai
  • i have used ai in this pr (drafting parts of the indexing scripts and some code)
    • I used AI for faster drafting, but the query/indexing logic mirrors the existing Solr docs and the current Elasticsearch + mb* implementations
    • i used ai tools for coding
    • i used ai tools for communication
    • i understand all the changes made in this pr

Action

  1. i have run the code and manually tested the changes
  2. if needed, reviewers can run the demo using:
    1. bash setup-solr-with-icu.sh
    2. USE_SOLR=true ./develop.sh
    3. open http://localhost:9099 and try the search queries listed above

Signed-off-by: RayyanSeliya <rayyanseliya786@gmail.com>
@RayyanSeliya
Contributor Author

ignore the lint errors :)

@faizanakhtar123
Contributor

@RayyanSeliya nice
