Copilot AI commented Nov 19, 2025

Breaking the Indexing Page into a Folder with Sub-Pages ✅

This PR successfully restructures the documentation for indexing by breaking the single indexing.md file into a folder structure with separate pages for each type, and adds comprehensive best practices, performance guidance, and validation examples.

Completed Tasks:

  • Create indexing folder in /home/runner/work/docs/docs/cypher/
  • Create index.md in the indexing folder as the parent page with overview content
  • Create range-index.md for Range Index documentation
  • Create fulltext-index.md for Full-text Index documentation
  • Create vector-index.md for Vector Index documentation
  • Update the cypher/index.md to link to the new indexing folder
  • Remove the old indexing.md file
  • Update all references to point to new sub-pages
  • Verify the documentation structure is correct
  • Extend full-text search documentation with query syntax features (issue #287, "Extend the full-text search docs with examples")
  • Apply CodeRabbit suggestions for improved wording
  • Fix spellchecker wordlist (all technical terms added)
  • Add comprehensive tradeoffs, validation examples, and best practices for all index types
  • Add URL redirects for backward compatibility

Changes Summary:

New Structure Created:

cypher/indexing/
├── index.md (parent page with overview + URL redirects)
├── range-index.md (Range Index documentation with best practices)
├── fulltext-index.md (Full-text Index documentation with advanced query syntax)
└── vector-index.md (Vector Index documentation with parameter guidance)

URL Redirects (Backward Compatibility):

  • Added redirect_from configuration to handle legacy URLs:
    • /cypher/indexing.html → /cypher/indexing/
    • /cypher/indexing → /cypher/indexing/
  • Uses jekyll-redirect-from plugin for seamless redirection
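
As a rough illustration of the redirect setup described above, the following Python sketch pulls the legacy URLs out of a page's front matter. The front matter string is an assumed shape, not a copy of the merged file: only `redirect_from`, `has_children`, and `parent` are named in this PR, and the `title` value is a guess. The parser is a minimal stand-in that handles only this simple layout, not general YAML.

```python
# Assumed front matter for cypher/indexing/index.md (illustrative only).
FRONT_MATTER = """\
---
title: "Indexing"
parent: "Cypher Language"
has_children: true
redirect_from:
  - /cypher/indexing.html
  - /cypher/indexing
---
"""


def redirect_sources(text: str) -> list[str]:
    """Return the legacy URLs listed under `redirect_from` in simple front matter."""
    sources = []
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "redirect_from:":
            in_block = True
        elif in_block and stripped.startswith("- "):
            sources.append(stripped[2:].strip())
        elif in_block:
            in_block = False  # any other line ends the list
    return sources


print(redirect_sources(FRONT_MATTER))
```

A check like this could run in CI to confirm that the old URLs stay listed if the page is edited later.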

Navigation Hierarchy:

  • Parent: index.md with has_children: true and parent: "Cypher Language"
  • Children: All three index type pages with parent: "Indexing" and grand_parent: "Cypher Language"
  • Each child has proper nav_order for consistent ordering

Updated References:

  1. /index.md - Main homepage now links directly to specific index types
  2. /cypher/procedures.md - Procedure documentation links updated
  3. /cypher/index.md - Parent Cypher page link updated

Extended Full-Text Search Documentation (Fixes #287):
Added comprehensive "Query Syntax and Features" section covering:

  • Tokenization: How text is split into searchable words
  • Prefix Matching: Using * wildcard for autocomplete-style searches
  • Fuzzy Matching: Using %term%distance syntax for typo-tolerant searches
  • Combining Features: Boolean operators (AND, OR, NOT) with examples

Each feature includes:

  • Detailed explanations and use cases
  • Code examples in all supported languages (Shell, Python, JavaScript, Java, Rust)
  • Performance notes and best practices
  • Links to RediSearch query syntax documentation
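
To make the prefix and fuzzy syntax above concrete, here is a small Python sketch. The helper names `prefix_query` and `fuzzy_query` are hypothetical and simply build the query strings in the form this PR describes (`Jun*` and `%term%distance`); they are not part of any client library. The `levenshtein` function is the textbook edit-distance algorithm, included only to show why a distance of 1 is enough for "Jangle" to reach "Jungle".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def prefix_query(stem: str) -> str:
    """Build a prefix query such as 'Jun*' (hypothetical helper)."""
    return f"{stem}*"


def fuzzy_query(term: str, distance: int = 1) -> str:
    """Build a fuzzy query such as '%Jangle%1' (hypothetical helper)."""
    return f"%{term}%{distance}"


print(prefix_query("Jun"))              # Jun*
print(fuzzy_query("Jangle", 1))         # %Jangle%1
print(levenshtein("Jangle", "Jungle"))  # 1
```

The comparison here is case-sensitive; how the index itself normalizes case is not covered by this sketch.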

Comprehensive Documentation Enhancements:

Each index type page now includes:

  1. Supported Data Types & Limitations: Clear explanation of what can and cannot be indexed
  2. Validation Examples: How to use GRAPH.EXPLAIN and GRAPH.PROFILE to verify index usage before and after creation
  3. Index Management: How to list existing indexes using db.indexes() procedure
  4. Performance Tradeoffs: Detailed analysis of benefits, costs, write overhead, storage, and maintenance
  5. Best Practices: When to use each index type and when NOT to use them
  6. Real-world Examples: Practical code examples in all supported languages
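
The validation step in item 2 can be sketched in Python as a check on the plan text returned by GRAPH.EXPLAIN. This is a minimal sketch under stated assumptions: the operation names "Node By Label Scan" and "Node By Index Scan" are assumed, and the two sample plans are hand-written illustrations rather than captured server output.

```python
def uses_index_scan(plan_lines) -> bool:
    """Return True if any line of an execution plan mentions an index scan.

    Accepts str or bytes lines, since clients may return either.
    """
    for line in plan_lines:
        text = line.decode() if isinstance(line, bytes) else line
        if "Index Scan" in text:
            return True
    return False


# Hypothetical plans for the same query before and after creating an index.
plan_before = [
    "Results",
    "    Project",
    "        Filter",
    "            Node By Label Scan | (p:Person)",
]
plan_after = [
    "Results",
    "    Project",
    "        Node By Index Scan | (p:Person)",
]

print(uses_index_scan(plan_before))  # False
print(uses_index_scan(plan_after))   # True
```

With a running server, the plan lines could be fetched through a Redis client's generic command interface, for example redis-py's `execute_command("GRAPH.EXPLAIN", graph_name, query)`, and passed to the same helper.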

Range Index Enhancements:

  • Supported data types section (String, Numeric, Geospatial, Arrays)
  • Verification examples showing before/after index creation with GRAPH.EXPLAIN
  • Performance tradeoffs (write overhead, storage, maintenance costs)
  • Best practices for cardinality and query patterns
  • GRAPH.PROFILE usage for performance validation

Full-text Index Enhancements:

  • When to use vs when NOT to use (compared to range indexes)
  • Configuration best practices (language selection, stopwords, phonetic search)
  • Performance considerations (tokenization costs, storage overhead)
  • Verification examples with GRAPH.EXPLAIN
  • Language and stemming warnings with recommendations

Vector Index Enhancements:

  • Detailed parameter explanations (dimension, M, efConstruction, efRuntime) with recommended values
  • Similarity function tradeoffs (cosine vs euclidean) with use cases
  • Memory usage calculations with formula
  • Real-world vector search examples showing embedding workflows
  • Verification examples with GRAPH.EXPLAIN
  • Troubleshooting section for common issues (dimension mismatch, poor recall, high memory)
  • Performance tuning recommendations
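
The memory formula itself is not quoted in this conversation. As a rough stand-in, the Python sketch below uses a commonly cited back-of-the-envelope estimate for HNSW indexes: each vector costs its raw float storage plus about 2 × M neighbor links at the base layer. The 4-byte float and 4-byte link sizes and the default `m=16` are assumptions for illustration, and the merged page may use a different formula.

```python
def hnsw_memory_bytes(num_vectors: int, dimension: int, m: int = 16,
                      bytes_per_float: int = 4, bytes_per_link: int = 4) -> int:
    """Rough HNSW memory estimate: raw vectors plus base-layer neighbor links.

    Ignores upper layers and per-node metadata, so treat it as a lower bound.
    """
    vector_bytes = dimension * bytes_per_float  # one float per dimension
    link_bytes = 2 * m * bytes_per_link         # up to 2*M links at layer 0
    return num_vectors * (vector_bytes + link_bytes)


estimate = hnsw_memory_bytes(num_vectors=1_000_000, dimension=128, m=16)
print(f"{estimate / 1_000_000:.0f} MB")  # 640 MB
```

The estimate shows why dimension usually dominates: at 128 dimensions the vector data is 512 bytes per node against 128 bytes of links, so halving the dimension saves far more than lowering M.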

Wording Improvements:

  • Range index: Improved geospatial index limitation description
  • Vector index: Changed "At the moment" to "Currently" for better clarity

Spellcheck Updates:

  • Added all technical terms to .wordlist.txt:
    • Initial: Levenshtein, autocomplete, tokenization, tokenized
    • Additional: HNSW, ANN, tradeoff, tradeoffs, unnormalized

Files Changed:

  • Deleted: cypher/indexing.md (863 lines)
  • Created: 4 new files (1,577 lines total - comprehensive documentation)
  • Modified: 6 existing files (including redirect configuration)
  • Enhanced: All index pages with 455+ lines of best practices, validation examples, and performance guidance

Benefits:

✅ Better organization - each index type has its own dedicated page
✅ Easier navigation - users can directly access the index type they need
✅ Improved maintainability - changes to one index type don't affect others
✅ Consistent with existing documentation structure (follows patterns from /algorithms/ and /commands/)
✅ All internal links updated - no broken references
✅ Backward compatibility - old URLs redirect to new structure seamlessly
✅ Comprehensive full-text search examples for common use cases (fuzzy, prefix, tokenization)
✅ Clearer, more professional wording throughout
✅ Spellcheck passes successfully - all technical terms properly whitelisted
✅ Complete performance guidance - helps users choose the right index type
✅ Validation examples - users can verify their indexes are working correctly
✅ Real-world examples - practical code in all supported languages
✅ Best practices - clear guidance on when and how to use each index type
✅ Parameter guidance - detailed explanations for vector index tuning

Original prompt

This section details the original issue you should resolve

<issue_title>Break the indexing page to a folder and sub pages per index type</issue_title>
<issue_description>Break the indexing page to a folder and sub pages per index type.


Comments on the Issue (you are @copilot in this section)


Summary by CodeRabbit

Documentation

  • Reorganized indexing documentation for improved navigation and clarity
  • Split comprehensive indexing guide into dedicated sections for Range Index, Full-Text Index, and Vector Index types
  • Updated cross-references throughout documentation to direct to appropriate index-specific pages

coderabbitai bot commented Nov 19, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The indexing documentation is restructured from a single file (cypher/indexing.md) into a folder hierarchy with three index-type-specific pages (range, full-text, vector) and a landing page. Documentation links across the site are updated to reference the new structure.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Documentation structure reorganization: cypher/indexing.md | Deleted monolithic indexing documentation file |
| New indexing folder with dedicated pages: cypher/indexing/index.md, cypher/indexing/range-index.md, cypher/indexing/fulltext-index.md, cypher/indexing/vector-index.md | Added new landing page and three index-type-specific documentation pages with multi-language code examples and comprehensive coverage of range, full-text, and vector indexing |
| Cross-reference link updates: cypher/index.md, cypher/procedures.md, index.md | Updated internal documentation links to point to new indexing sub-pages; added two new vector index procedure entries to procedures.md |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify all links across documentation correctly point to new index type pages (range-index, fulltext-index, vector-index)
  • Confirm complete content migration from deleted cypher/indexing.md to three new specialized pages
  • Check that the new cypher/indexing/index.md landing page properly serves as a navigation hub
  • Validate no broken anchor references or orphaned documentation sections
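
The link checks in this list could be partly automated. The Python sketch below scans Markdown text for links that still target the old single-page location. The regular expression and the sample anchor name are assumptions for illustration; a real check would walk the repository's `.md` files and may need to handle other link styles.

```python
import re

# Matches Markdown link targets ending in the old page: cypher/indexing,
# cypher/indexing.md or cypher/indexing.html, optionally followed by #anchor.
# Targets under the new folder (cypher/indexing/...) do not match.
STALE_LINK = re.compile(
    r"\]\(([^)\s]*cypher/indexing(?:\.md|\.html)?(?:#[^)\s]*)?)\)"
)


def stale_links(markdown: str) -> list[str]:
    """Return link targets that still point at the old indexing page."""
    return STALE_LINK.findall(markdown)


sample = (
    "See [full-text](/cypher/indexing#full-text-indexing), "
    "[range](/cypher/indexing/range-index) and "
    "[overview](/cypher/indexing/)."
)
print(stale_links(sample))  # ['/cypher/indexing#full-text-indexing']
```

Links to the bare old URL would still resolve through the redirects, but anchors into the old page would not land on the right section, which is why the sketch reports them.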

Poem

🐰 The indexing forest grows so wide,
With range, fulltext, and vectors guide,
Each page now shines, distinct and bright,
One big file split just right,
A rabbit's dream of organized light! 🌳✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main structural change: breaking the monolithic indexing documentation into a subfolder with separate pages for each index type. |
| Linked Issues check | ✅ Passed | The PR fully satisfies issue #285 by splitting indexing.md into cypher/indexing/ folder with dedicated pages (index.md, range-index.md, fulltext-index.md, vector-index.md) and updating navigation/cross-references accordingly. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the stated objectives: folder restructuring, page reorganization, and cross-reference updates. No extraneous modifications are present. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Copilot AI changed the title from "[WIP] Break the indexing page into folder and sub pages per type" to "Break indexing documentation into subfolder with dedicated pages per index type" Nov 19, 2025
Copilot AI requested a review from gkorland November 19, 2025 11:41
@gkorland

@copilot regarding the full text indexing please review this issue and extend this page accordingly #287

@gkorland

@CodeRabbit review

coderabbitai bot commented Nov 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
cypher/indexing/range-index.md (1)

161-161: Minor: Consider simplifying redundant phrase.

Line 161 reads "matching nodes outside of the given radius" — for conciseness, "outside" alone suffices. This is a low-priority style improvement.

Apply this diff to simplify:

-Geospatial indexes can currently only be leveraged with `<` and `<=` filters; matching nodes outside of the given radius is performed using conventional matching.
+Geospatial indexes can currently only be leveraged with `<` and `<=` filters; matching nodes outside the given radius are matched using conventional traversal.
cypher/indexing/vector-index.md (1)

82-84: Minor: Verbose phrase can be simplified.

Line 83 uses "At the moment" — consider using an adverb form like "Currently" for a more concise phrasing.

Apply this diff:

-Please note, when creating a vector index, both the vector dimension and similarity function
-must be provided. At the moment the only supported similarity functions are 'euclidean' or 'cosine'.
+Please note, when creating a vector index, both the vector dimension and similarity function
+must be provided. Currently, the only supported similarity functions are 'euclidean' or 'cosine'.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f2c2bb and ad2630f.

📒 Files selected for processing (8)
  • cypher/index.md (1 hunks)
  • cypher/indexing.md (0 hunks)
  • cypher/indexing/fulltext-index.md (1 hunks)
  • cypher/indexing/index.md (1 hunks)
  • cypher/indexing/range-index.md (1 hunks)
  • cypher/indexing/vector-index.md (1 hunks)
  • cypher/procedures.md (1 hunks)
  • index.md (1 hunks)
💤 Files with no reviewable changes (1)
  • cypher/indexing.md
🧰 Additional context used
🪛 LanguageTool
cypher/indexing/vector-index.md

[style] ~83-~83: For conciseness, consider replacing this expression with an adverb.
Context: ...d similarity function must be provided. At the moment the only supported similarity functions...

(AT_THE_MOMENT)

cypher/indexing/range-index.md

[style] ~161-~161: This phrase is redundant. Consider using “outside”.
Context: ...th < and <= filters; matching nodes outside of the given radius is performed using con...

(OUTSIDE_OF)

🪛 markdownlint-cli2 (0.18.1)
cypher/indexing/range-index.md

78-78: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)


79-79: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)


202-202: Multiple headings with the same content

(MD024, no-duplicate-heading)


203-203: Multiple headings with the same content

(MD024, no-duplicate-heading)


204-204: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)


204-204: Multiple headings with the same content

(MD024, no-duplicate-heading)


205-205: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)


206-206: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)


315-315: Multiple headings with the same content

(MD024, no-duplicate-heading)


318-318: Multiple headings with the same content

(MD024, no-duplicate-heading)


321-321: Multiple headings with the same content

(MD024, no-duplicate-heading)

🔇 Additional comments (6)
cypher/index.md (1)

45-45: Link structure correctly updated for folder navigation.

Adding the trailing slash properly directs users to the new indexing folder structure and the index.md overview page.

index.md (1)

33-33: Homepage links correctly route to dedicated index type documentation.

The three feature links now direct users to focused, type-specific pages rather than anchors in a monolithic page, improving discoverability and user navigation flow.

cypher/indexing/index.md (1)

1-32: Indexing overview page correctly structured as parent with children.

The navigation hierarchy is properly set up with has_children: true and relative links to child pages. The nav_order of 21 positions this appropriately after Procedures. However, verify that this nav_order doesn't introduce unexpected ordering relative to other Cypher Language sections—if the Cypher Language children list feels misaligned after this change, you may want to reorder or adjust neighboring nav_order values.

cypher/indexing/fulltext-index.md (1)

1-317: Full-text index documentation is comprehensive and well-structured.

The page clearly explains RediSearch integration, configuration options (language, stopwords, phonetic), and provides practical examples for both node and relationship indexing. The nav_order: 2 correctly places it as the second index type, and multi-language code samples are consistent with the repository's patterns.

cypher/indexing/vector-index.md (1)

1-201: Vector index documentation is clear and complete.

The page provides excellent coverage of vector index operations, configuration options, and multi-language examples. The reference to /cypher/functions#vector-functions helpfully connects vector creation to the functions reference. nav_order: 3 correctly orders it as the third index type.

cypher/procedures.md (1)

42-44: Procedure cross-references correctly updated to new documentation structure.

The full-text and vector procedure entries now properly link to their dedicated documentation pages instead of anchors in a monolithic file. New vector procedures are properly documented with arguments and yields.

gkorland linked an issue Nov 19, 2025 that may be closed by this pull request
@gkorland

@copilot regarding the full text indexing please review this issue and extend this page accordingly #287

@gkorland

@copilot please see the two comments from CodeRabbit:

cypher/indexing/range-index.md

-Geospatial indexes can currently only be leveraged with < and <= filters; matching nodes outside of the given radius is performed using conventional matching.
+Geospatial indexes can currently only be leveraged with < and <= filters; matching nodes outside the given radius are matched using conventional traversal.

And

cypher/indexing/vector-index.md

-Please note, when creating a vector index, both the vector dimension and similarity function
-must be provided. At the moment the only supported similarity functions are 'euclidean' or 'cosine'.
+Please note, when creating a vector index, both the vector dimension and similarity function
+must be provided. Currently, the only supported similarity functions are 'euclidean' or 'cosine'.

gkorland marked this pull request as ready for review November 19, 2025 12:41

Copilot AI commented Nov 19, 2025

@copilot regarding the full text indexing please review this issue and extend this page accordingly #287

I've extended the full-text search documentation with a new "Query Syntax and Features" section covering:

  • Tokenization: Explains how text is automatically split into searchable words
  • Prefix Matching: Using * wildcard (e.g., Jun* matches "Jungle", "June", "Junior")
  • Fuzzy Matching: Using %term%distance syntax for typo-tolerant searches (e.g., %Jangle%1 matches "Jungle")
  • Combining Features: Boolean operators (AND, OR, NOT) with practical examples

Each feature includes code examples in all supported languages and links to the RediSearch query syntax documentation.

Commit: 3feaabd

Copilot AI commented Nov 19, 2025

@copilot please see the two comments from CodeRabbit:

cypher/indexing/range-index.md
...

Applied both CodeRabbit suggestions:

  1. range-index.md: Changed "matching nodes outside of the given radius is performed using conventional matching" to "matching nodes outside the given radius are matched using conventional traversal"

  2. vector-index.md: Changed "At the moment the only supported similarity functions" to "Currently, the only supported similarity functions"

Commit: bf40cdf

@gkorland

@copilot please fix the spellchecker

Copilot AI commented Nov 19, 2025

@copilot please fix the spellchecker

Fixed spellchecker by adding missing technical terms to .wordlist.txt:

  • Levenshtein
  • autocomplete
  • tokenization
  • tokenized

Commit: 18df7d6

@gkorland

@copilot can you please review these comments?

  1. Tradeoffs & Best Practices (Per Index Type)

    • In each subpage, consider adding a “Tradeoffs / when to use” section. For example:

      • Range Index: Discuss write overhead, update cost, storage size.
      • Full-text: Talk about how full-text is good for text-heavy search, but maybe less for numeric filtering or exact-match.
      • Vector: Explain cost of vector index (memory, dimension), approximate vs. exact, and similarity function tradeoffs (cosine vs euclidean).
    • This helps users pick the right index type based on their use case.

  2. Validation Examples

    • On each subpage, show how to verify that the index is being used: e.g., GRAPH.EXPLAIN or GRAPH.PROFILE before and after index creation.
    • For vector indexing: show a realistic db.idx.vector.queryNodes(...) call, then explain what the plan looks like.
  3. Limitations / Supported Data Types

    • For range indexes, explicitly call out supported data types (you already note string, numeric, geospatial in main page). ([FalkorDB Docs][3])

    • For array indexing, mention whether composite arrays (nested arrays, maps) are supported. Actually, the main page does mention that:

      “Complex types like nested arrays, maps, or vectors are not supported for indexing.” ([FalkorDB Docs][3])

    • If on the sub-pages you can reiterate these constraints, it helps users avoid confusion.

  4. Naming / Index Management

    • On each page, consider standardizing how you talk about creating/dropping indexes: e.g., using IF NOT EXISTS (if supported), naming conventions, idempotency. If FalkorDB supports explicitly naming indexes, that might be very useful to document.
    • Also document how to list existing indexes (via a procedure) — maybe link to db.indexes() from the procedures page. ([FalkorDB Docs][2])
  5. Edge Cases & Warnings

    • On vector page: clearly mention that dimension and similarityFunction are required, and warn about choosing correct parameters (M, efConstruction, efRuntime) — maybe provide guidance or defaults.
    • On full-text page: talk about language, stemming, stopwords, phonetic search — if users pick the wrong language config, they might get suboptimal results.
  6. Examples Across Languages

    • Make sure each subpage has code examples in all or most of your major supported languages (Python, JavaScript, Rust, Java, shell). Consistency helps.
    • For full-text and vector calls especially, show how to call the procedures via the client APIs (not just raw Cypher).

Copilot AI commented Nov 19, 2025

@copilot can you please review these comments?

  1. Tradeoffs & Best Practices (Per Index Type)
    ...

I've implemented all suggested improvements across all three index type pages:

Range Index:

  • ✅ Supported data types section with limitations
  • ✅ Validation examples (GRAPH.EXPLAIN before/after index creation)
  • ✅ Performance tradeoffs (write overhead, storage, maintenance costs)
  • ✅ Best practices for when to use range indexes
  • ✅ Index management via db.indexes()
  • ✅ GRAPH.PROFILE examples

Full-text Index:

  • ✅ When to use vs when NOT to use (compared to range indexes)
  • ✅ Configuration best practices (language, stopwords, phonetic)
  • ✅ Performance tradeoffs and costs
  • ✅ Validation examples with GRAPH.EXPLAIN
  • ✅ Language/stemming warnings and recommendations

Vector Index:

  • ✅ Detailed parameter explanations (dimension, M, efConstruction, efRuntime)
  • ✅ Similarity function tradeoffs (cosine vs euclidean)
  • ✅ Memory usage calculations with formula
  • ✅ Realistic vector search examples with embeddings in all languages
  • ✅ Verification examples with GRAPH.EXPLAIN
  • ✅ Troubleshooting section for common issues
  • ✅ Performance tuning guidance
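
The cosine versus euclidean tradeoff listed above can be illustrated with plain Python. This is a sketch of the two measures themselves, not of how FalkorDB computes or reports them: cosine distance is taken here as 1 minus cosine similarity, and the three vectors are made up to show the two measures ranking neighbors differently when vectors are not normalized.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [10.0, 0.0]  # same direction as query, much larger magnitude
nearby = [1.0, 1.0]           # close in space, but rotated 45 degrees

# Cosine prefers the vector pointing the same way as the query.
print(cosine_distance(query, same_direction) < cosine_distance(query, nearby))        # True
# Euclidean prefers the vector that is physically closer.
print(euclidean_distance(query, nearby) < euclidean_distance(query, same_direction))  # True
```

When embeddings are normalized to unit length the two measures produce the same ranking, so the choice matters most for unnormalized vectors.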

All Pages Include:

  • Validation examples showing index usage
  • Index management (db.indexes() procedure)
  • Comprehensive performance tradeoffs
  • Real-world code examples in all supported languages (Shell, Python, JavaScript, Java, Rust)

Commit: bdc609a

@gkorland

@copilot please make sure cypher/indexing.html redirects to cypher/indexing/index.md

Copilot AI commented Nov 19, 2025

@copilot please make sure cypher/indexing.html redirects to cypher/indexing/index.md

Added redirect configuration to cypher/indexing/index.md:

  • /cypher/indexing.html → /cypher/indexing/
  • /cypher/indexing → /cypher/indexing/

This uses the jekyll-redirect-from plugin (already configured in _config.yml) to ensure backward compatibility with the old URL structure.

Commit: 965d37e

gkorland commented Nov 19, 2025

@copilot it seems the spellcheck is still failing

Copilot AI commented Nov 19, 2025

@copilot it seems the spellcheck is still failing

Misspelled words:
...

Fixed spellcheck by adding missing technical terms to .wordlist.txt:

  • HNSW (Hierarchical Navigable Small World - vector index algorithm)
  • ANN (Approximate Nearest Neighbor)
  • tradeoff / tradeoffs
  • unnormalized

Commit: 65afa71

gkorland merged commit d9f4dbf into main Nov 19, 2025
4 checks passed
gkorland deleted the copilot/break-indexing-page-into-subfolders branch November 19, 2025 18:00

Development

Successfully merging this pull request may close these issues.

  • Extend the full-text search docs with examples (#287)
  • Break the indexing page to a folder and sub pages per index type (#285)