Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MaxHeightSplay for the splay tree to improve its performance by reducing its skewness. #957

Closed
wants to merge 5 commits into from

Conversation

m4ushold
Copy link
Contributor

@m4ushold m4ushold commented Aug 9, 2024

What this PR does / why we need it:

Why this is needed:
Currently, the crdt.Text data structure uses a Splay tree. While Splay trees are efficient for performing operations in a continuous range, they have a downside where consecutively inserted elements may become linearly arranged in the tree, leading to a skew tree. When performing M operations on a tree with N nodes, the performance is generally guaranteed to be O((N+M) log N). However, if the tree becomes skewed, each operation might initially take O(N) time, though it could eventually improve. This skewness can degrade the performance of the crdt.Text data structure, making it crucial to explore ways to prevent this skewness and maintain the tree's performance and efficiency. This PR aims to address the performance and efficiency degradation in the crdt.Text data structure caused by skewness in the Splay tree.

What this PR does:
In this Pull Request, we introduce a method called max_height_splay to reduce the skewness. This method involves finding the deepest leaf node and performing a splay operation on it every √n operations, where n is the number of splay operations performed. The time complexity of max_height_splay is also amortized O(log n), similar to other operations. A POC for this approach was implemented in C++, and the code can be found here.

Which issue(s) this PR fixes:

Fixes #941

Special notes for your reviewer:
If operations are primarily occurring at the end of the document, performance degradation might still occur. Although this may require further research, it seems worthwhile to implement as it can provide good performance in most cases.
I have added test code but did not include benchmark code. If you think it's needed, please mention me! And while adding the test code, I needed to check the height of the Splay tree nodes, which led to some modifications in other test code as well.

Does this PR introduce a user-facing change?:

NONE

Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

Summary by CodeRabbit

  • New Features

    • Enhanced Splay tree functionality with improved balancing mechanisms.
    • Added height management for nodes to optimize performance on tree operations.
  • Bug Fixes

    • Updated expected output formats in tests to reflect accurate representations of arrays and tree structures.
  • Tests

    • Expanded test cases to validate splay tree behavior with new height management features and updated node representations.
    • Improved fidelity of benchmark tests for the Document structure’s array operations.

To reduce the skewness of the splay tree, perform a MaxHeightSplay every
sqrt(n) regular splays. To implement it, height attribute added to Node struct
and splayCount and operationCount is added to Tree struct.
To test max heigt splay, ToTestString is updated to include node's height.
Tests using ToTestString is updated and test for max height splay is added.
Copy link

coderabbitai bot commented Aug 9, 2024

Walkthrough

The changes optimize the Splay tree implementation by incorporating enhancements to prevent skewness, thereby improving performance in the crdt.Text structure. Key updates include the addition of a height field to nodes and optimized splay operations that adapt based on the number of operations performed, ensuring a balanced tree structure. Benchmark tests and assertions are also updated for accuracy, reflecting the new output representations for the Document structure.

Changes

Files Change Summary
pkg/document/document_test.go, test/bench/document_bench_test.go Updated assertions for ToTestString() method outputs in both tests to reflect new array formats.
pkg/splay/splay.go, pkg/splay/splay_test.go Enhanced Node and Tree structures with height management, optimized splay operations, and updated test assertions to match new output formats.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SplayTree
    participant Node

    User->>SplayTree: Insert Node
    SplayTree->>Node: Create NewNode
    Node->>Node: Initialize Height
    SplayTree->>SplayTree: Update Height
    SplayTree->>User: Return Tree Structure
Loading

Assessment against linked issues

Objective Addressed Explanation
Improve performance of Splay Tree to prevent skewness (##941)

Poem

In the forest where trees sway,
A splay tree danced today.
With heights anew and branches wide,
It skips the skew and takes in stride.
A hop, a leap, in code we trust,
Performance shines, it's a must! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 80c6ea0 and b97f574.

Files selected for processing (3)
  • pkg/document/document_test.go (1 hunks)
  • pkg/splay/splay.go (11 hunks)
  • pkg/splay/splay_test.go (6 hunks)
Additional comments not posted (14)
pkg/splay/splay_test.go (4)

159-181: Add comprehensive checks in the "max height splay test".

The new test case for max height splay is a valuable addition. Ensure that it comprehensively checks the tree's behavior and structure after each insertion.


59-65: Update assertions to reflect new node representation.

The assertions have been updated to include an additional index in the node representation. Ensure that these changes correctly reflect the updated logic in the ToTestString() method.

Verification successful

Assertions correctly reflect the updated node representation.

The ToTestString method implementation confirms that the node representation includes weight, length, height, and value, matching the updated test assertions. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the updated node representation in the ToTestString method.

# Test: Check the ToTestString method implementation. Expect: The method includes the new index logic.
ast-grep --lang go --pattern $'func (t *Tree[$_] ToTestString() string {
  $$$
}'

Length of output: 690


Line range hint 124-146: Review range deletion tests for updated node representation.

The range deletion tests have been updated to reflect the new node representation. Ensure that these changes are consistent with the updated tree logic.

Verification successful

Verification successful: Node representation in range deletion tests is consistent with updated tree logic.

The ToTestString method in pkg/splay/splay.go constructs node representations using weight, value length, height, and value string, which matches the format expected in the test assertions. The changes in the range deletion tests reflect this updated logic correctly.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the updated node representation in the range deletion tests.

# Test: Check the ToTestString method implementation. Expect: The method includes the new index logic.
ast-grep --lang go --pattern $'func (t *Tree[$_] ToTestString() string {
  $$$
}'

Length of output: 690


95-104: Ensure correctness of deletion tests with updated node representation.

The test cases for deletion now expect a new node representation. Verify that these changes align with the updated logic in the tree implementation.

Verification successful

Deletion Tests Correctly Verify Updated Node Representation

The deletion tests in pkg/splay/splay_test.go correctly use the updated node representation format in their assertions, ensuring alignment with the ToTestString method. The tests verify both the tree's string representation and node indices after deletions, confirming correctness.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the updated node representation in the deletion tests.

# Test: Check the ToTestString method implementation. Expect: The method includes the new index logic.
ast-grep --lang go --pattern $'func (t *Tree[$_] ToTestString() string {
  $$$
}'

Length of output: 690


Script:

#!/bin/bash
# Description: Check the deletion test cases to ensure they expect the correct node representation format.

# Locate the test cases for deletion in the splay_test.go file.
rg 'func Test.*Deletion' -A 20 pkg/splay/splay_test.go

Length of output: 56


Script:

#!/bin/bash
# Description: Search for all test functions in the splay_test.go file to find those related to deletion.

# List all test functions in the splay_test.go file.
rg '^func Test' -A 20 pkg/splay/splay_test.go

Length of output: 845


Script:

#!/bin/bash
# Description: Search for any usage of a Delete method or similar within the splay_test.go file to find deletion-related tests.

# Search for the Delete method or similar operations in the test file.
rg 'Delete' pkg/splay/splay_test.go

Length of output: 302


Script:

#!/bin/bash
# Description: Search for assertions following deletion operations to verify alignment with the updated node representation.

# Search for assertions in the context of Delete operations.
rg 'Delete' -A 5 pkg/splay/splay_test.go | rg 'assert\.Equal'

Length of output: 337

pkg/splay/splay.go (9)

40-40: Add height field to Node struct for balancing operations.

The addition of the height field is crucial for implementing balancing operations. Ensure that this field is correctly initialized and updated throughout the code.


50-51: Initialize height in NewNode function.

The NewNode function now initializes the height to 1. Verify that this initialization is consistent with the intended logic for new nodes.


57-74: Implement leftHeight, rightHeight, and InitHeight methods.

These methods are essential for managing node heights. Ensure they are correctly implemented and used in the tree operations.


117-119: Introduce operation counters in Tree struct.

The splayCount and operationCount fields are introduced to manage splay operations. Ensure these counters are correctly incremented and reset.


155-156: Update height after insertion.

The UpdateHeight method is called after node insertion to maintain correct height values. Ensure this update is correctly integrated into the insertion logic.


Line range hint 161-206: Separate InternalSplay from public Splay method.

The separation of InternalSplay from Splay helps manage internal and public operations distinctly. Ensure that this separation is effectively utilized in the codebase.


197-206: Implement MaxHeightSplay logic in Splay method.

The Splay method now includes logic to perform a MaxHeightSplay every √n operations. Verify that this logic is correctly implemented and aligns with the performance improvement objectives.


208-220: Implement MaxHeightSplay to reduce skewness.

The MaxHeightSplay method finds and splays the deepest node to reduce skewness. Ensure that the logic effectively identifies and splays the correct node.


322-334: Ensure UpdateHeight method correctly recalculates node height.

The UpdateHeight method recalculates node height based on children. Verify that this method is invoked appropriately throughout the tree operations.

pkg/document/document_test.go (1)

135-150: Update assertions to reflect new array representation.

The assertions for array operations have been updated to include additional indices in the representation. Ensure that these changes correctly reflect the updated logic in the ToTestString() method.

Verification successful

Ensure test assertions match the ToTestString format.

The ToTestString method in pkg/splay/splay.go outputs a string with node weight, value length, height, and value string. Verify that the test assertions in pkg/document/document_test.go reflect this format accurately. Adjust the test outputs if necessary to align with the method's logic.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the updated array representation in the ToTestString method.

# Test: Check the ToTestString method implementation. Expect: The method includes the new index logic.
ast-grep --lang go --pattern $'func (t *Tree[$_] ToTestString() string {
  $$$
}'

Length of output: 690

I had to update the test code because the ToTest String had to output including
 the height of the display tree.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b97f574 and 68ceac1.

Files selected for processing (1)
  • test/bench/document_bench_test.go (1 hunks)
Additional comments not posted (4)
test/bench/document_bench_test.go (4)

145-145: Verify the correctness of ToTestString() output.

Ensure that the expected output string accurately reflects the internal state of the array after the operations. The indices and values should match the operations performed.

Verification successful

Verification of ToTestString() output is consistent with expected behavior.

The ToTestString() function is used to output a string containing the metadata of the elements for debugging purposes. The expected output in the test assertion appears to accurately reflect the internal state of the array, including indices and values, as intended by the function. No issues found with the test assertion.

  • The ToTestString() function is defined in pkg/document/crdt/array.go and outputs metadata for debugging.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the expected output for `ToTestString()`.

# Test: Search for `ToTestString()` implementations to understand its output format.
ast-grep --lang go --pattern $'func ToTestString() string { $$$ }'

Length of output: 67


Script:

#!/bin/bash
# Description: Search for the definition or documentation of `ToTestString()` using ripgrep.

# Search for `ToTestString` in the codebase to locate its definition or related comments.
rg 'ToTestString' --context 5

Length of output: 33501


160-160: Verify ToTestString() output after second insertion.

Check that the expected output string correctly represents the array's state after the second insertion operation. The indices and values should be consistent with the operations performed.


155-155: Verify ToTestString() output after insertion.

Ensure that the expected output string accurately reflects the array's state after an integer is inserted. The indices and values should match the insertion operation.


150-150: Verify ToTestString() output after deletion.

Check that the expected output string correctly represents the array's state after an element is deleted. The indices and values should be consistent with the deletion operation.

@hackerwins hackerwins requested a review from raararaara August 9, 2024 07:28
If the number of operations is less than 10,000, performing max_height_splay
too frequently can degrade performance. To prevent this, only perform
max_height_splay when the number of operations exceeds 10,000.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 68ceac1 and 23c9f48.

Files selected for processing (1)
  • pkg/splay/splay.go (11 hunks)
Files skipped from review as they are similar to previous changes (1)
  • pkg/splay/splay.go

@krapie krapie added the enhancement 🌟 New feature or request label Aug 9, 2024
The test cases have been updated due to changes in max-height splay, which is
no longer performed when the operation count is 10,000 or less.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 23c9f48 and ab5a359.

Files selected for processing (1)
  • pkg/splay/splay_test.go (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • pkg/splay/splay_test.go

@m4ushold
Copy link
Contributor Author

I’ve decided to close this PR as further research is required. The code applying max height splay caused a stack overflow during benchmarking, and I need to analyze the cause in more detail.

In addition, I found a test case that works inefficiently because the new implementation method calls the max height display function first and then the display.

Given these issues, I see a strong need for benchmarking code that can compare the new implementation with the existing one. I will prioritize this task first, and once all concerns are fully resolved, I will submit a new PR.

@m4ushold m4ushold closed this Aug 10, 2024
@m4ushold m4ushold mentioned this pull request Sep 1, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🌟 New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Improve performance of Splay Tree in crdt.Text data structure to prevent skewness
2 participants