
Document and validate search scoring implementation using chunk embeddings #35

Open
Copilot wants to merge 3 commits into main from copilot/fix-a6acafd1-e01c-4f8b-a8bb-c4ce81807437

Conversation

Contributor

Copilot AI commented Sep 24, 2025

This PR addresses the requirement that "search scoring should use the distance between the query embedding and the returned chunk's embedding" by documenting and validating that the current implementation already behaves this way.

Analysis

Upon investigation, the search implementation in SearchWithOptions already correctly uses chunk-level embeddings for distance calculation:

```sql
vec_distance_cosine(e.embedding, ?) as distance
FROM chunks c
JOIN embeddings e ON c.chunk_id = e.chunk_id
```

The system:

  • Calculates the cosine distance between the query embedding and each chunk's embedding
  • Converts distance to a score via `score = 1.0 - distance`
  • Operates at the chunk level (not the document level) for precise matching
  • Sorts results by score in descending order

Changes Made

Enhanced Code Documentation

  • Added detailed comments explaining the chunk embedding distance calculation in the SQL query
  • Clarified the score conversion formula with examples of distance-to-score mapping

Comprehensive Testing

  • Added TestSQLiteStorage_SearchScoring_ChunkEmbeddingDistance test that validates:
    • Identical embeddings achieve score ≈ 1.0
    • Similar embeddings get high scores (0.9+)
    • Orthogonal embeddings get score ≈ 0.0
    • Opposite embeddings get negative scores
    • Results are properly sorted by semantic similarity

Documentation

  • Created SEARCH_SCORING.md explaining the implementation details, score interpretation, and performance characteristics
  • Documented the chunk-level vs document-level approach

Validation

The implementation was tested with known embedding vectors to verify mathematical correctness:

```text
Query: [1.0, 0.0, 0.0]
Results:
1. exact_match [1.0, 0.0, 0.0]:  score=1.000000 ✓
2. close_match [0.9, 0.1, 0.0]:  score=0.993884 ✓
3. orthogonal  [0.0, 1.0, 0.0]:  score=0.000000 ✓
4. opposite    [-1.0, 0.0, 0.0]: score=-1.000000 ✓
```

All existing tests continue to pass, confirming no regressions were introduced.

Conclusion

The search scoring implementation was already correct and met the specified requirement. This PR adds explicit documentation and validation to make the chunk-embedding distance calculation transparent and well-tested.



@streed streed marked this pull request as ready for review September 24, 2025 17:38
Copilot AI review requested due to automatic review settings September 24, 2025 17:38
Contributor

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



Copilot AI and others added 2 commits September 24, 2025 17:38
Co-authored-by: streed <805140+streed@users.noreply.github.com>
…dings

Co-authored-by: streed <805140+streed@users.noreply.github.com>
Copilot AI changed the title from "[WIP] The search scoring should use the distance between the query embedding and the returned chunk's embedding." to "Document and validate search scoring implementation using chunk embeddings" Sep 24, 2025
Copilot AI requested a review from streed September 24, 2025 17:45