
Creating a getting started guide for bloom #310


Open: wants to merge 4 commits into base `main`


@zackcam (Contributor) commented Jun 9, 2025

Adding a getting started guide for bloom filters

This will include the following:
- Manual build / installation (Valkey server startup + module load)
- Docker-based startup with Valkey Extensions
- Examples of executing bloom commands
- A snippet executing bloom commands from a Python script with valkey-py
- Snippets / examples of using bloom configurations

We will need to decide where to show getting started guides, as they are currently quite hidden in the documentation. For now I have put this under tutorials and FAQ. If we want a more explicit or obvious place, let me know and I will look at making that change.

After discussion, we have decided to move the getting started information out of a separate page.
The sections we will now include are a tutorial and a better explanation of the bloom filter configs. We will save the extension and Python examples for separate pages that are yet to be created; once those pages exist, I can add the earlier sections to them.

@zackcam (Contributor, Author) commented Jun 9, 2025

I think the two spell-check errors are false positives: "reproducibility" is a real word and is spelled correctly, and "pre-built" should be hyphenated. If people agree, I will add them to the excluded words list.

@zackcam force-pushed the main branch 2 times, most recently from 280a110 to 64d7855 (June 12, 2025 18:26)
@zackcam force-pushed the main branch 2 times, most recently from fe1feb8 to 3c141be (June 18, 2025 20:03)
@@ -58,6 +58,97 @@ In this username example, we can use a Bloom filter to track every username that
* If no, the user is created and the username is added to the Bloom filter.
* If yes, the app can decide to either check the main database or reject the username.

## Tutorial
Member comment:

Is there a reason this is in the middle of the page? We could move it to the bottom-most section.

Contributor Author reply:

I tried to follow how the sets page does it, where the tutorial sits above the performance and limits sections. I also wanted it after the examples, so that the tutorial example makes more sense and has that context.

Comment on lines 255 to 270
The `bf.bloom-memory-usage-limit` configuration (default 128MB) controls the maximum memory that a single bloom filter can use:

Example usage of increasing the limit:
```bash
CONFIG SET bf.bloom-memory-usage-limit 256mb
```

This setting is particularly important for production environments for several reasons:

- **Resource protection**: Prevents a single bloom filter from consuming excessive memory
- **Denial of service prevention**: Protects against attacks that might try to create enormous filters
- **Predictable scaling**: Ensures bloom filters have a known upper bound on resource usage

If your use case requires exceptionally large bloom filters, you can increase this limit. However, be aware that very large bloom filters might impact overall system performance and memory availability for other operations.

When a bloom filter reaches this memory limit, any operation that would cause it to exceed the limit will fail with an error message indicating that the memory limit would be exceeded.
Member comment:

Suggested change
The `bf.bloom-memory-usage-limit` configuration (default 128MB) controls the maximum memory that a single bloom filter object can use.

Example usage of increasing the limit:
```bash
CONFIG SET bf.bloom-memory-usage-limit 256mb
```

Having a limit on the maximum memory per bloom object prevents filters from growing without bound, which helps maintain server performance during serialization.

If your use case requires bloom filters beyond this limit, you can increase this configuration value. However, be aware that doing so can impact overall system performance and memory availability for other operations.

When a bloom filter reaches this memory limit, any operation that would cause it to exceed the limit will fail with an error message indicating that the memory limit would be exceeded.
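For a rough sense of how large a filter can get before it hits the default 128MB limit, the sketch below applies the textbook sizing formula for a single Bloom filter, where the bit array needs m = -n · ln(p) / (ln 2)² bits for capacity n and false positive rate p. This is an estimate under stated assumptions, not valkey-bloom's own memory accounting: real per-object usage also includes struct overhead and, for scaling filters, every sub-filter, so treat the result as a lower bound. The helper names `estimate_bitmap_bytes` and `max_capacity_under_limit` are hypothetical.

```python
import math

# Default value of bf.bloom-memory-usage-limit described above (128MB).
DEFAULT_LIMIT_BYTES = 128 * 1024 * 1024


def bits_per_item(fp_rate: float) -> float:
    """Bits needed per item by the textbook formula -ln(p) / (ln 2)^2."""
    return -math.log(fp_rate) / (math.log(2) ** 2)


def estimate_bitmap_bytes(capacity: int, fp_rate: float) -> int:
    """Approximate bit-array size in bytes for one (non-scaling) Bloom filter.

    Assumption: ignores per-object overhead, so real usage will be higher.
    """
    return math.ceil(capacity * bits_per_item(fp_rate) / 8)


def max_capacity_under_limit(fp_rate: float, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> int:
    """Largest capacity whose estimated bit array fits within limit_bytes."""
    return int(limit_bytes * 8 / bits_per_item(fp_rate))


if __name__ == "__main__":
    # Same parameters as the BF.RESERVE examples: 0.001 error rate, 10000 items.
    print(estimate_bitmap_bytes(10_000, 0.001))
    # Largest capacity at the default limit vs. after raising it to 256mb.
    print(max_capacity_under_limit(0.001))
    print(max_capacity_under_limit(0.001, 256 * 1024 * 1024))
```

Under this estimate, a 10000-item filter at a 0.001 false positive rate needs about 18,000 bytes, and raising the limit to 256mb roughly doubles the largest capacity that fits.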

Comment on lines 364 to 368
The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.

```bash
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
(integer) 384
```

Member comment:

Can we move this under the "1. Memory Usage:" point?

* If you anticipate a large increase in users some time after the initial sign-ups, you can set a custom expansion rate.
```bash
127.0.0.1:6379> BF.RESERVE expanding_users 0.001 10000 EXPANSION 4
```

This comment was marked as resolved.

Contributor Author reply:

I was trying to show setting the expansion to a non-default value, but I'm happy to change this to 2 if that makes more sense.
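To illustrate why a larger expansion suits a workload whose sign-ups jump after launch, the sketch below models how a scaling filter grows. It assumes that each new sub-filter's capacity is the previous sub-filter's capacity multiplied by the expansion value, starting from the capacity given to `BF.RESERVE`. It ignores any tightening of the false positive rate for later sub-filters and per-filter memory overhead, and the function names are hypothetical.

```python
def subfilter_capacities(initial_capacity: int, expansion: int, num_filters: int) -> list[int]:
    """Capacity of each of the first num_filters sub-filters.

    Assumption: each new sub-filter holds `expansion` times as many items
    as the one before it.
    """
    capacities = []
    capacity = initial_capacity
    for _ in range(num_filters):
        capacities.append(capacity)
        capacity *= expansion
    return capacities


def filters_needed(initial_capacity: int, expansion: int, total_items: int) -> int:
    """Number of sub-filters needed before combined capacity reaches total_items."""
    if initial_capacity < 1 or expansion < 1:
        raise ValueError("initial_capacity and expansion must be at least 1")
    count, total, capacity = 0, 0, initial_capacity
    while total < total_items:
        total += capacity
        capacity *= expansion
        count += 1
    return count


if __name__ == "__main__":
    # Matches the example above: 10000 initial capacity, EXPANSION 4.
    print(subfilter_capacities(10_000, 4, 3))
    # Compare the default expansion (2) with 4 for one million users.
    print(filters_needed(10_000, 2, 1_000_000))
    print(filters_needed(10_000, 4, 1_000_000))
```

In this model, one million users with an initial capacity of 10000 need 7 sub-filters at expansion 2 but only 5 at expansion 4. Fewer sub-filters means fewer lookups per check, at the cost of larger memory jumps each time the filter scales.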

@zackcam force-pushed the main branch 4 times, most recently from f5d3cf1 to cfd27d1 (June 25, 2025 19:12)
@zackcam (Contributor, Author) commented Jun 25, 2025

Just a note: since I am linking to the extensions page, that PR will need to be finished first.

I also plan to create a follow-up PR to make the capitalization of "Bloom filter" consistent; including it here would make reviewing much harder.


### Setup

1. **Install Valkey**: Follow the [installation guides](installation.md) to set up Valkey on your system.
Member comment:

Why is this needed? It is not needed if someone is using Docker, right?

If you agree, let's remove this.

The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.

```bash
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
```

Member comment:

Should we swap this with 127.0.0.1? :D

Co-authored-by: KarthikSubbarao <[email protected]>
Signed-off-by: zackcam <[email protected]>
@KarthikSubbarao (Member) commented Jul 2, 2025

Thank you @zackcam. This PR looks good to me and I have approved it.

Labels: None yet
Projects: None yet

2 participants