Break indexing documentation into subfolder with dedicated pages per index type #286

Copilot · 2025-11-19T11:27:39Z

Breaking the Indexing Page into a Folder with Sub-Pages ✅

This PR restructures the indexing documentation by splitting the single indexing.md file into a folder with a separate page per index type, and adds best practices, performance guidance, and validation examples to each page.

Changes Summary:

New Structure Created:

cypher/indexing/
├── index.md (parent page with overview + URL redirects)
├── range-index.md (Range Index documentation with best practices)
├── fulltext-index.md (Full-text Index documentation with advanced query syntax)
└── vector-index.md (Vector Index documentation with parameter guidance)

URL Redirects (Backward Compatibility):

Added redirect_from configuration to handle legacy URLs:
- /cypher/indexing.html → /cypher/indexing/
- /cypher/indexing → /cypher/indexing/
The redirects are handled by the jekyll-redirect-from plugin.

Navigation Hierarchy:

Parent: index.md with has_children: true and parent: "Cypher Language"
Children: All three index type pages with parent: "Indexing" and grand_parent: "Cypher Language"
Each child has proper nav_order for consistent ordering
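
For reviewers who want to picture the resulting front matter, here is a minimal Python sketch that renders the fields described above. The keys (`parent`, `grand_parent`, `has_children`, `nav_order`, `redirect_from`) come from this description; the `front_matter` helper, the "Range Index" title, and the `nav_order` value of 1 are assumptions for illustration and may not match the committed files.

```python
def front_matter(fields):
    """Render a flat dict as a YAML front matter block (scalars and lists only)."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, bool):
            # Check bool before int: bool is a subclass of int in Python.
            lines.append(f"{key}: {str(value).lower()}")
        elif isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        elif isinstance(value, int):
            lines.append(f"{key}: {value}")
        else:
            lines.append(f'{key}: "{value}"')
    lines.append("---")
    return "\n".join(lines)


# Parent landing page (cypher/indexing/index.md), including the legacy URL redirects.
parent_page = front_matter({
    "title": "Indexing",
    "parent": "Cypher Language",
    "has_children": True,
    "redirect_from": ["/cypher/indexing.html", "/cypher/indexing"],
})

# One child page; the title and nav_order value are hypothetical.
child_page = front_matter({
    "title": "Range Index",
    "parent": "Indexing",
    "grand_parent": "Cypher Language",
    "nav_order": 1,
})

print(parent_page)
print(child_page)
```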

Updated References:

/index.md - Main homepage now links directly to specific index types
/cypher/procedures.md - Procedure documentation links updated
/cypher/index.md - Parent Cypher page link updated

Extended Full-Text Search Documentation (Fixes #287):
Added comprehensive "Query Syntax and Features" section covering:

Tokenization: How text is split into searchable words
Prefix Matching: Using * wildcard for autocomplete-style searches
Fuzzy Matching: Using %term%distance syntax for typo-tolerant searches
Combining Features: Boolean operators (AND, OR, NOT) with examples
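
As a rough illustration of the prefix and fuzzy features listed above, the sketch below builds query terms and computes the Levenshtein (edit) distance that fuzzy matching relies on. It assumes the RediSearch convention in which a trailing `*` marks a prefix and the number of `%` signs on each side of a term encodes the maximum edit distance (1 to 3); the notation used on the new page may differ. The helper names `prefix_term` and `fuzzy_term` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def prefix_term(prefix: str) -> str:
    """Prefix match term, e.g. for autocomplete-style searches."""
    return f"{prefix}*"


def fuzzy_term(term: str, distance: int = 1) -> str:
    """Fuzzy match term; assumes one '%' per unit of allowed edit distance."""
    if not 1 <= distance <= 3:
        raise ValueError("distance must be between 1 and 3")
    marks = "%" * distance
    return f"{marks}{term}{marks}"


print(prefix_term("hel"))              # hel*
print(fuzzy_term("hello", 2))          # %%hello%%
print(levenshtein("kitten", "sitting"))  # 3
```

A typo such as "helo" is one edit away from "hello", so under these assumptions a distance of 1 would be enough to match it.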

Each feature includes:

Detailed explanations and use cases
Code examples in all supported languages (Shell, Python, JavaScript, Java, Rust)
Performance notes and best practices
Links to RediSearch query syntax documentation

Comprehensive Documentation Enhancements:

Each index type page now includes:

Supported Data Types & Limitations: Clear explanation of what can and cannot be indexed
Validation Examples: How to use GRAPH.EXPLAIN and GRAPH.PROFILE to verify index usage before and after creation
Index Management: How to list existing indexes using db.indexes() procedure
Performance Tradeoffs: Detailed analysis of benefits, costs, write overhead, storage, and maintenance
Best Practices: When to use each index type and when NOT to use them
Real-world Examples: Practical code examples in all supported languages
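
The before/after validation described above amounts to checking whether the execution plan contains an index scan. A minimal sketch of that check follows, assuming the plan is available as a list of text lines (for example, as returned by GRAPH.EXPLAIN) and that index usage appears as an operation whose name contains "Index Scan". The two sample plans are illustrative, not captured output, and the `uses_index` helper is hypothetical.

```python
def uses_index(plan_lines):
    """Return True if any operation in the plan looks like an index scan."""
    return any("Index Scan" in line for line in plan_lines)


# Illustrative plan before the index exists: a label scan followed by a filter.
plan_before = [
    "Results",
    "    Project",
    "        Filter",
    "            Node By Label Scan | (p:Person)",
]

# Illustrative plan after the index is created: the filter is served by the index.
plan_after = [
    "Results",
    "    Project",
    "        Node By Index Scan | (p:Person)",
]

print(uses_index(plan_before))  # False
print(uses_index(plan_after))   # True
```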

Range Index Enhancements:

Supported data types section (String, Numeric, Geospatial, Arrays)
Verification examples showing before/after index creation with GRAPH.EXPLAIN
Performance tradeoffs (write overhead, storage, maintenance costs)
Best practices for cardinality and query patterns
GRAPH.PROFILE usage for performance validation

Full-text Index Enhancements:

When to use vs when NOT to use (compared to range indexes)
Configuration best practices (language selection, stopwords, phonetic search)
Performance considerations (tokenization costs, storage overhead)
Verification examples with GRAPH.EXPLAIN
Language and stemming warnings with recommendations

Vector Index Enhancements:

Detailed parameter explanations (dimension, M, efConstruction, efRuntime) with recommended values
Similarity function tradeoffs (cosine vs euclidean) with use cases
Memory usage calculations with formula
Real-world vector search examples showing embedding workflows
Verification examples with GRAPH.EXPLAIN
Troubleshooting section for common issues (dimension mismatch, poor recall, high memory)
Performance tuning recommendations
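
To make the cosine versus euclidean tradeoff concrete, here is a small self-contained sketch in plain Python. It uses the common convention that cosine distance is one minus cosine similarity; the engine's exact scoring may differ, and the sample vectors are made up. The point is that the two functions can rank the same candidates differently when vectors are unnormalized.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [10.0, 0.0]   # same direction as the query, much larger magnitude
other_direction = [1.0, 1.0]   # close in space, but rotated 45 degrees

for name, candidate in [("same_direction", same_direction),
                        ("other_direction", other_direction)]:
    print(name,
          round(euclidean_distance(query, candidate), 4),
          round(cosine_distance(query, candidate), 4))
```

With these vectors, euclidean distance prefers `other_direction` while cosine distance prefers `same_direction`, which is why normalizing embeddings or choosing the function that matches the embedding model matters.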

Wording Improvements:

Range index: Improved geospatial index limitation description
Vector index: Changed "At the moment" to "Currently" for better clarity

Spellcheck Updates:

Added all technical terms to .wordlist.txt:
- Initial: Levenshtein, autocomplete, tokenization, tokenized
- Additional: HNSW, ANN, tradeoff, tradeoffs, unnormalized

Files Changed:

Deleted: cypher/indexing.md (863 lines)
Created: 4 new files (1,577 lines total - comprehensive documentation)
Modified: 6 existing files (including redirect configuration)
Enhanced: All index pages with 455+ lines of best practices, validation examples, and performance guidance

Benefits:

✅ Better organization - each index type has its own dedicated page
✅ Easier navigation - users can directly access the index type they need
✅ Improved maintainability - changes to one index type don't affect others
✅ Consistent with existing documentation structure (follows patterns from /algorithms/ and /commands/)
✅ All internal links updated - no broken references
✅ Backward compatibility - old URLs redirect to new structure seamlessly
✅ Comprehensive full-text search examples for common use cases (fuzzy, prefix, tokenization)
✅ Clearer, more professional wording throughout
✅ Spellcheck passes successfully - all technical terms properly whitelisted
✅ Complete performance guidance - helps users choose the right index type
✅ Validation examples - users can verify their indexes are working correctly
✅ Real-world examples - practical code in all supported languages
✅ Best practices - clear guidance on when and how to use each index type
✅ Parameter guidance - detailed explanations for vector index tuning

Original prompt

This section details the original issue you should resolve

<issue_title>Break the indexing page to a folder and sub pages per index type</issue_title>
<issue_description>Break the indexing page to a folder and sub pages per index type.

Comments on the Issue (you are @copilot in this section)

Fixes #285 (Break the indexing page to a folder and sub pages per index type)

Summary by CodeRabbit

Documentation

Reorganized indexing documentation for improved navigation and clarity
Split comprehensive indexing guide into dedicated sections for Range Index, Full-Text Index, and Vector Index types
Updated cross-references throughout documentation to direct to appropriate index-specific pages

coderabbitai · 2025-11-19T11:27:47Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The indexing documentation is restructured from a single file (cypher/indexing.md) into a folder hierarchy with three index-type-specific pages (range, full-text, vector) and a landing page. Documentation links across the site are updated to reference the new structure.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Documentation structure reorganization: `cypher/indexing.md` | Deleted monolithic indexing documentation file |
| New indexing folder with dedicated pages: `cypher/indexing/index.md`, `cypher/indexing/range-index.md`, `cypher/indexing/fulltext-index.md`, `cypher/indexing/vector-index.md` | Added new landing page and three index-type-specific documentation pages with multi-language code examples and comprehensive coverage of range, full-text, and vector indexing |
| Cross-reference link updates: `cypher/index.md`, `cypher/procedures.md`, `index.md` | Updated internal documentation links to point to new indexing sub-pages; added two new vector index procedure entries to procedures.md |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Verify all links across documentation correctly point to new index type pages (range-index, fulltext-index, vector-index)
Confirm complete content migration from deleted cypher/indexing.md to three new specialized pages
Check that the new cypher/indexing/index.md landing page properly serves as a navigation hub
Validate no broken anchor references or orphaned documentation sections
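
One way to approach the link checks above is to scan the Markdown sources for links that still target the old single page. The sketch below is a minimal example, assuming links are written as absolute paths such as `/cypher/indexing.html`; the sample text and its `#vector-index` anchor are hypothetical, and relative links would need a different pattern.

```python
import re

# Matches Markdown link targets pointing at the old page, with an optional anchor.
# Paths under the new folder (for example /cypher/indexing/range-index) do not match.
LEGACY_LINK = re.compile(r"\]\((/cypher/indexing(?:\.html|\.md)?)(#[^)]*)?\)")


def find_legacy_links(markdown_text):
    """Return (line_number, target) pairs for links to the old indexing page."""
    hits = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for match in LEGACY_LINK.finditer(line):
            # Strip the leading "](" and trailing ")" to keep only the target.
            hits.append((line_number, match.group(0)[2:-1]))
    return hits


sample = """See [indexing](/cypher/indexing.html#vector-index) for details.
See [range index](/cypher/indexing/range-index) instead.
Old [link](/cypher/indexing)."""

for line_number, target in find_legacy_links(sample):
    print(f"line {line_number}: {target}")
```

Because the redirects keep old URLs working, such links are not broken, but anchors on the old page no longer exist, so they are worth updating to the specific index page.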

Poem

🐰 The indexing forest grows so wide,
With range, fulltext, and vectors guide,
Each page now shines, distinct and bright,
One big file split just right,
A rabbit's dream of organized light! 🌳✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main structural change: breaking the monolithic indexing documentation into a subfolder with separate pages for each index type. |
| Linked Issues check | ✅ Passed | The PR fully satisfies issue #285 by splitting indexing.md into cypher/indexing/ folder with dedicated pages (index.md, range-index.md, fulltext-index.md, vector-index.md) and updating navigation/cross-references accordingly. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the stated objectives: folder restructuring, page reorganization, and cross-reference updates. No extraneous modifications are present. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Co-authored-by: gkorland <[email protected]>

gkorland · 2025-11-19T12:06:24Z

@copilot regarding the full text indexing please review this issue and extend this page accordingly #287

gkorland · 2025-11-19T12:06:33Z

@CodeRabbit review

coderabbitai · 2025-11-19T12:06:39Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

cypher/indexing/range-index.md (1)
161-161: Minor: Consider simplifying redundant phrase.

Line 161 reads "matching nodes outside of the given radius" — for conciseness, "outside" alone suffices. This is a low-priority style improvement.

Apply this diff to simplify:
-Geospatial indexes can currently only be leveraged with `<` and `<=` filters; matching nodes outside of the given radius is performed using conventional matching.
+Geospatial indexes can currently only be leveraged with `<` and `<=` filters; matching nodes outside the given radius are matched using conventional traversal.
cypher/indexing/vector-index.md (1)
82-84: Minor: Verbose phrase can be simplified.

Line 83 uses "At the moment" — consider using an adverb form like "Currently" for a more concise phrasing.

Apply this diff:
-Please note, when creating a vector index, both the vector dimension and similarity function
-must be provided. At the moment the only supported similarity functions are 'euclidean' or 'cosine'.
+Please note, when creating a vector index, both the vector dimension and similarity function
+must be provided. Currently, the only supported similarity functions are 'euclidean' or 'cosine'.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f2c2bb and ad2630f.

📒 Files selected for processing (8)

cypher/index.md (1 hunks)
cypher/indexing.md (0 hunks)
cypher/indexing/fulltext-index.md (1 hunks)
cypher/indexing/index.md (1 hunks)
cypher/indexing/range-index.md (1 hunks)
cypher/indexing/vector-index.md (1 hunks)
cypher/procedures.md (1 hunks)
index.md (1 hunks)

💤 Files with no reviewable changes (1)

cypher/indexing.md

🧰 Additional context used

🪛 LanguageTool

cypher/indexing/vector-index.md

[style] ~83-~83: For conciseness, consider replacing this expression with an adverb.
Context: ...d similarity function must be provided. At the moment the only supported similarity functions...

(AT_THE_MOMENT)

cypher/indexing/range-index.md

[style] ~161-~161: This phrase is redundant. Consider using “outside”.
Context: ...th < and <= filters; matching nodes outside of the given radius is performed using con...

(OUTSIDE_OF)

🪛 markdownlint-cli2 (0.18.1)

cypher/indexing/range-index.md

78-78: Multiple spaces after hash on atx style heading