Commit e0b3957

Apply suggestions from code review
Co-authored-by: KarthikSubbarao <[email protected]>
Signed-off-by: zackcam <[email protected]>
1 parent 8da6816 commit e0b3957

1 file changed

+30
-46
lines changed

topics/bloomfilters.md

Lines changed: 30 additions & 46 deletions
@@ -60,25 +60,12 @@ In this username example, we can use a Bloom filter to track every username that
6060

6161
## Tutorial
6262

63-
To use bloom filters with Valkey, you need to:
63+
**There are two ways to run valkey-bloom:**
64+
- **Docker**: Follow the steps for using the [valkey bundle](valkey-bundle.md)
65+
- **Build from source**: Follow the [build instructions](https://github.com/valkey-io/valkey-bloom/blob/unstable/README.md#build-instructions)
6466

65-
1. **Install Valkey**: Follow the [installation guides](installation.md) to install Valkey on your system.
6667

67-
2. **Get the valkey-bloom module**: You have three options:
68-
- Use the Docker image: [valkey-extensions](https://hub.docker.com/r/valkey/valkey-bundle)
69-
- Download a pre-built binary from the [releases page](https://github.com/valkey-io/valkey-bloom/releases)
70-
- Build from source by following the [build instructions](https://github.com/valkey-io/valkey-bloom/blob/unstable/README.md#build-instructions)
71-
72-
Once valkey-bloom is built, you can run the Valkey server with the module loaded in two different ways, on server start up:
73-
```bash
74-
./valkey-server --loadmodule ./target/release/libvalkey_bloom.so
75-
```
76-
You can also load the valkey bloom module on an already running server by running the following command in [valkey-cli](cli.md):
77-
```bash
78-
127.0.0.1:6379> MODULE LOAD /path/to/libvalkey_bloom.so
79-
```
80-
81-
### Usage example
68+
### Usage example with CLI
8269

8370
* Create a bloom filter of taken usernames with a 1 in 1000 chance of false positives and 10000 initial capacity.
8471
```bash
@@ -113,12 +100,12 @@ OK
113100
3) (integer) 1
114101
```
115102

116-
* Get how many users have been created
103+
* Check how many users have been added to the filter
117104
```bash
118105
127.0.0.1:6379> BF.CARD usernames
119106
(integer) 4
120107
```
121-
* Get the information surrounding your bloom filter of users
108+
* View your bloom filter object's usage:
122109
```bash
123110
127.0.0.1:6379> BF.INFO usernames
124111
1) Capacity
@@ -138,7 +125,7 @@ OK
138125
15) Max scaled capacity
139126
16) (integer) 26214300
140127
```
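
For a rough sense of what the `BF.RESERVE` parameters in this example imply, the textbook Bloom filter sizing formulas can be evaluated for a 1 in 1000 false positive rate and 10000 items. This is only a sketch: the function name is hypothetical, and valkey-bloom's actual allocation (as reported by `BF.INFO`) may differ from the textbook estimate.

```python
import math


def textbook_bloom_size(capacity: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the standard Bloom filter formulas.

    Illustrative estimate only; this is NOT valkey-bloom's allocator.
    """
    if capacity <= 0 or not 0.0 < fp_rate < 1.0:
        raise ValueError("capacity must be > 0 and fp_rate must be in (0, 1)")
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole hash function
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


bits, hashes = textbook_bloom_size(10_000, 0.001)
print(f"{bits} bits (~{bits // 8} bytes), {hashes} hash functions")
```

Lowering the false positive rate or raising the capacity increases the estimated bit count, which is the trade-off the tutorial parameters control.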
141-
* If you want to limit the number of people who can sign up you can create a non scaling filter. This will create a filter than can only have 1000 items added with a 1 in 10,000 chance of a false positive.
128+
* If you know the exact number of users who will sign up, or if you want to cap the number of items regardless, you can create a non-scaling bloom filter. The command below creates a filter with capacity for only 1000 items and a 1 in 100,000 chance of a false positive.
142129
```bash
143130
127.0.0.1:6379> BF.RESERVE limited_users 0.00001 1000 NONSCALING
144131
OK
@@ -227,27 +214,27 @@ Example of default bloom filter information:
227214

228215
### When to adjust default configurations
229216

230-
Adjusting the default configurations can be beneficial in several scenarios:
217+
The configurations below control the properties of bloom filter objects created with default settings. Adjusting them can be beneficial in the following scenarios:
231218

232-
- **Higher capacity (bf.bloom-capacity)**: Increase this value when you expect to store many items in most of your bloom filters. This reduces the need for scaling operations which can improve performance.
219+
- **Higher capacity (bf.bloom-capacity)**: Increase this value when you expect to store a larger number of items in all your new bloom filter objects. For non-scaling filters, it increases the overall capacity. For scaling filters, it increases the initial capacity, which reduces the need for scaling operations and can improve performance.
233220

234-
- **Lower false positive rate (bf.bloom-fp-rate)**: Decrease this value when accuracy is critical for your application. For example, in fraud detection or security applications where false positives are costly.
221+
- **Lower false positive rate (bf.bloom-fp-rate)**: Decrease this value when correctness is critical for your application and you want to reduce the number of false positives in all your new bloom filter objects. For example, in fraud detection or security applications where false positives are costly.
235222

236-
- **Higher expansion rate (bf.bloom-expansion)**: Increase this value when you want faster growth of bloom filters that need to scale. This reduces the number of scaling operations but uses more memory quicker.
223+
- **Higher expansion rate (bf.bloom-expansion)**: Increase this value when you want all your new scaling bloom filter objects to grow faster. This reduces the number of scaling operations but consumes memory more quickly.
237224

238-
- **Lower tightening ratio (bf.bloom-tightening-ratio)**: Adjust this when you want to maintain a more consistent false positive rate across multiple scaling operations. (Not advisable to change this default)
225+
- **Lower tightening ratio (bf.bloom-tightening-ratio)**: Lower this when you want to maintain a more consistent false positive rate across multiple scaling operations for all your new bloom filter objects. (Not advisable to change this default, as it is already set to a strict value of 0.5)
239226

240-
- **Random seed (bf.bloom-use-random-seed)**: Set to false only when you need deterministic behavior for testing or reproducibility.
227+
- **Fixed seed (bf.bloom-use-random-seed)**: Set to false only when you want newly created bloom filter objects to use a fixed seed, which makes false positives occur deterministically during item add/exists operations. This is useful for testing or reproducibility.
241228

242-
You can modify these default values using the CONFIG SET command:
229+
You can modify these default values using the CONFIG SET command. Note that a configuration change only applies to bloom objects created after the change, and only for properties that the creating command does not specify explicitly.
243230

244231
Example usage of changing all the different properties:
245232
```bash
246-
CONFIG SET bf.bloom-fp-rate 0.001
247-
CONFIG SET bf.bloom-capacity 1000
248-
CONFIG SET bf.bloom-expansion 4
249-
CONFIG SET bf.bloom-tightening-ratio 0.6
250-
CONFIG SET bf.bloom-use-random-seed false
233+
127.0.0.1:6379> CONFIG SET bf.bloom-fp-rate 0.001
234+
127.0.0.1:6379> CONFIG SET bf.bloom-capacity 1000
235+
127.0.0.1:6379> CONFIG SET bf.bloom-expansion 4
236+
127.0.0.1:6379> CONFIG SET bf.bloom-tightening-ratio 0.6
237+
127.0.0.1:6379> CONFIG SET bf.bloom-use-random-seed false
251238
```
252239

253240
### Memory usage limit
@@ -256,16 +243,13 @@ The `bf.bloom-memory-usage-limit` configuration (default 128MB) controls the max
256243

257244
Example usage of increasing the limit:
258245
```bash
259-
CONFIG SET bf.bloom-memory-usage-limit 256mb
246+
127.0.0.1:6379> CONFIG SET bf.bloom-memory-usage-limit 268435456
260247
```
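
The byte value in the updated example is 256 MiB written out in full. A small helper like the one below can do the conversion when you want a different limit. The function name is hypothetical and not part of Valkey.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    if mib < 0:
        raise ValueError("mib must be non-negative")
    return mib * 1024 * 1024


# 256 MiB, the value passed to CONFIG SET in the example above
print(mib_to_bytes(256))  # 268435456
# 128 MiB, assuming the documented 128MB default is measured in MiB
print(mib_to_bytes(128))  # 134217728
```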
261248

262-
This setting is particularly important for production environments for several reasons:
263-
264-
- **Resource protection**: Prevents a single bloom filter from consuming excessive memory
265-
- **Denial of service prevention**: Protects against attacks that might try to create enormous filters
266-
- **Predictable scaling**: Ensures bloom filters have a known upper bound on resource usage
249+
Limiting the maximum memory per bloom object prevents objects from growing unbounded, which helps maintain server performance during serialization.
250+
If your use case requires bloom filters beyond this limit, you can increase this configuration value.
267251

268-
If your use case requires exceptionally large bloom filters, you can increase this limit. However, be aware that very large bloom filters might impact overall system performance and memory availability for other operations.
252+
However, be aware that larger bloom filters can impact overall system performance and memory availability for other operations.
269253

270254
When a bloom filter reaches this memory limit, any operation that would cause it to exceed the limit will fail with an error message indicating that the memory limit would be exceeded.
271255

@@ -336,6 +320,13 @@ There are two notable validations bloom filters faces.
336320

337321
The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected. This config is modifiable and can be increased as needed.
338322

323+
The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.
324+
325+
```bash
326+
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
327+
(integer) 384
328+
```
329+
339330
2. Number of sub filters (in case of scalable bloom filters):
340331

341332
When a bloom filter scales out, a new sub filter is added. The limit on the number of sub filters depends on the false positive rate and tightening ratio. Each sub filter has a stricter false positive, and this is controlled by the tightening ratio. If a command attempting a scale out results in the sub filter reaching a false positive of 0, the command is rejected.
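
To illustrate how a tightening ratio makes each sub filter stricter, the sketch below assumes a geometric rule: the sub filter at index `i` targets `fp_rate * tightening_ratio ** i`. Both the rule and the function name are assumptions for illustration; the text above only states that the tightening ratio controls the stricter rate.

```python
def sub_filter_fp_rates(fp_rate: float, tightening_ratio: float, count: int) -> list[float]:
    """Target false positive rate per sub filter under an ASSUMED geometric rule."""
    if not 0.0 < fp_rate < 1.0 or not 0.0 < tightening_ratio < 1.0:
        raise ValueError("fp_rate and tightening_ratio must be in (0, 1)")
    # Index 0 is the first filter; each later sub filter is tightened once more.
    return [fp_rate * tightening_ratio ** i for i in range(count)]


for i, rate in enumerate(sub_filter_fp_rates(0.01, 0.5, 4)):
    print(f"sub filter {i}: target false positive rate {rate:g}")
```

Under this assumption the target rate keeps shrinking toward 0 as sub filters are added, which is consistent with a scale out eventually being rejected.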
@@ -360,10 +351,3 @@ The `BF.INFO` command's `MAXSCALEDCAPACITY` field can be used to find out the ma
360351
127.0.0.1:6379> BF.INFO validate_scale_valid MAXSCALEDCAPACITY
361352
(integer) 26214300
362353
```
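
As a cross-check on the figure above, assume each new sub filter's capacity is the previous one multiplied by the expansion rate, so the total is a geometric sum. The inputs below (initial capacity 100, expansion 2, 18 sub filters) are chosen only because they reproduce 26214300; they are assumptions for illustration, not values read from the filter, and the function name is hypothetical.

```python
def total_scaled_capacity(initial_capacity: int, expansion: int, sub_filters: int) -> int:
    """Sum of sub filter capacities, ASSUMING each one is `expansion` times the last."""
    if initial_capacity <= 0 or expansion <= 0 or sub_filters < 0:
        raise ValueError("inputs must be positive (sub_filters may be 0)")
    return sum(initial_capacity * expansion ** i for i in range(sub_filters))


# 100 * (2**18 - 1)
print(total_scaled_capacity(100, 2, 18))  # 26214300
```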
363-
364-
The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.
365-
366-
```bash
367-
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
368-
(integer) 384
369-
```
