Commit a843b73

Apply suggestions from code review
Co-authored-by: KarthikSubbarao <[email protected]>
Signed-off-by: zackcam <[email protected]>
1 parent 8da6816 commit a843b73

1 file changed

+37
-48
lines changed

topics/bloomfilters.md

Lines changed: 37 additions & 48 deletions
@@ -60,27 +60,14 @@ In this username example, we can use a Bloom filter to track every username that
6060

6161
## Tutorial
6262

63-
To use bloom filters with Valkey, you need to:
63+
**There are two ways to run valkey-bloom:**
64+
- **Docker**: Follow the steps here for using the [valkey bundle](valkey-bundle.md)
65+
- **Build from source**: Follow the [build instructions](https://github.com/valkey-io/valkey-bloom/blob/unstable/README.md#build-instructions)
6466

65-
1. **Install Valkey**: Follow the [installation guides](installation.md) to install Valkey on your system.
6667

67-
2. **Get the valkey-bloom module**: You have three options:
68-
- Use the Docker image: [valkey-extensions](https://hub.docker.com/r/valkey/valkey-bundle)
69-
- Download a pre-built binary from the [releases page](https://github.com/valkey-io/valkey-bloom/releases)
70-
- Build from source by following the [build instructions](https://github.com/valkey-io/valkey-bloom/blob/unstable/README.md#build-instructions)
68+
### Usage example with CLI
7169

72-
Once valkey-bloom is built, you can run the Valkey server with the module loaded in two different ways, on server start up:
73-
```bash
74-
./valkey-server --loadmodule ./target/release/libvalkey_bloom.so
75-
```
76-
You can also load the valkey bloom module on an already running server by running the following command in [valkey-cli](cli.md):
77-
```bash
78-
127.0.0.1:6379> MODULE LOAD /path/to/libvalkey_bloom.so
79-
```
80-
81-
### Usage example
82-
83-
* Create a bloom filter of taken usernames with a 1 in 1000 chance of false positives and 10000 initial capacity.
70+
* Create a bloom filter to track taken usernames, with a 1 in 1000 chance of a false positive and an initial capacity of 10000.
8471
```bash
8572
127.0.0.1:6379> BF.RESERVE usernames 0.001 10000
8673
OK
@@ -113,12 +100,12 @@ OK
113100
3) (integer) 1
114101
```
115102

116-
* Get how many users have been created
103+
* Check how many users have been added to the filter
117104
```bash
118105
127.0.0.1:6379> BF.CARD usernames
119106
(integer) 4
120107
```
121-
* Get the information surrounding your bloom filter of users
108+
* View your bloom filter object's usage:
122109
```bash
123110
127.0.0.1:6379> BF.INFO usernames
124111
1) Capacity
@@ -138,12 +125,12 @@ OK
138125
15) Max scaled capacity
139126
16) (integer) 26214300
140127
```
141-
* If you want to limit the number of people who can sign up you can create a non scaling filter. This will create a filter than can only have 1000 items added with a 1 in 10,000 chance of a false positive.
128+
* If you know the exact number of users who will sign up, or if you want to limit the number of items regardless, you can create a non-scaling bloom filter. This will create a filter that has capacity for only 1000 items with a 1 in 100,000 chance of a false positive.
142129
```bash
143130
127.0.0.1:6379> BF.RESERVE limited_users 0.00001 1000 NONSCALING
144131
OK
145132
```
146-
* If you anticipate once initial users have signed up after a while you will get a large increase in users you can set a custom expansion rate.
133+
* If you anticipate a large increase in users some time after the initial sign-ups, you can set a custom expansion rate.
147134
```bash
148135
127.0.0.1:6379> BF.RESERVE expanding_users 0.001 10000 EXPANSION 4
149136
OK
@@ -227,27 +214,27 @@ Example of default bloom filter information:
227214

228215
### When to adjust default configurations
229216

230-
Adjusting the default configurations can be beneficial in several scenarios:
217+
The configurations below set the default properties of newly created bloom objects. Adjusting them is beneficial in different scenarios:
231218

232-
- **Higher capacity (bf.bloom-capacity)**: Increase this value when you expect to store many items in most of your bloom filters. This reduces the need for scaling operations which can improve performance.
219+
- **Higher capacity (bf.bloom-capacity)**: Increase this value when you expect to store a larger number of items in all your new bloom filter objects. For non-scaling filters, it increases the overall capacity. For scaling filters, it increases the initial capacity, which potentially reduces the need for additional scaling operations and can improve performance.
233220

234-
- **Lower false positive rate (bf.bloom-fp-rate)**: Decrease this value when accuracy is critical for your application. For example, in fraud detection or security applications where false positives are costly.
221+
- **Lower false positive rate (bf.bloom-fp-rate)**: Decrease this value when correctness is critical for your application and you want to reduce the number of false positives in all your new bloom filter objects. For example, in fraud detection or security applications where false positives are costly.
235222

236-
- **Higher expansion rate (bf.bloom-expansion)**: Increase this value when you want faster growth of bloom filters that need to scale. This reduces the number of scaling operations but uses more memory quicker.
223+
- **Higher expansion rate (bf.bloom-expansion)**: Increase this value when you want all your new scaling bloom filter objects to grow faster. This reduces the number of scaling operations but consumes memory more quickly.
237224

238-
- **Lower tightening ratio (bf.bloom-tightening-ratio)**: Adjust this when you want to maintain a more consistent false positive rate across multiple scaling operations. (Not advisable to change this default)
225+
- **Lower tightening ratio (bf.bloom-tightening-ratio)**: Lower this when you want to maintain a more consistent false positive rate across multiple scaling operations for all your new bloom filter objects. (Not advisable to change this default, as it is already set to a strict value of 0.5)
239226

240-
- **Random seed (bf.bloom-use-random-seed)**: Set to false only when you need deterministic behavior for testing or reproducibility.
227+
- **Fixed seed (bf.bloom-use-random-seed)**: Set to false only when you want the new bloom filter object creations to use a fixed seed for deterministic occurrence of false positives during item add/exists operations. It can be used for testing or reproducibility.
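
As a rough illustration of the capacity and expansion trade-off described in the list above, here is a minimal Python sketch. It assumes that each new sub filter holds `expansion` times the capacity of the previous one. The function name `scale_outs_needed` is hypothetical and not part of valkey-bloom.

```python
def scale_outs_needed(initial_capacity: int, expansion: int, target_items: int) -> int:
    """Count the scale-outs needed before total capacity covers target_items.

    Assumption (not confirmed by this document): every new sub filter has
    `expansion` times the capacity of the previous sub filter.
    """
    total = initial_capacity         # capacity of the first sub filter
    sub_capacity = initial_capacity  # capacity of the most recent sub filter
    scale_outs = 0
    while total < target_items:
        sub_capacity *= expansion
        total += sub_capacity
        scale_outs += 1
    return scale_outs


# Reaching one million items from an initial capacity of 10000:
print(scale_outs_needed(10000, 2, 1_000_000))      # 6 scale-outs with expansion 2
print(scale_outs_needed(10000, 4, 1_000_000))      # 4 scale-outs with expansion 4
print(scale_outs_needed(1_000_000, 2, 1_000_000))  # 0 when the capacity is large enough up front
```

Under this assumption, a larger initial capacity or a higher expansion rate both reduce the number of scale-outs, at the cost of allocating more memory earlier.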
241228

242-
You can modify these default values using the CONFIG SET command:
229+
You can modify these default values using the CONFIG SET command. Note that a configuration change only applies to bloom objects created after the change, and only for properties that the creating command does not specify explicitly.
243230

244231
Example usage of changing all the different properties:
245232
```bash
246-
CONFIG SET bf.bloom-fp-rate 0.001
247-
CONFIG SET bf.bloom-capacity 1000
248-
CONFIG SET bf.bloom-expansion 4
249-
CONFIG SET bf.bloom-tightening-ratio 0.6
250-
CONFIG SET bf.bloom-use-random-seed false
233+
127.0.0.1:6379> CONFIG SET bf.bloom-fp-rate 0.001
234+
127.0.0.1:6379> CONFIG SET bf.bloom-capacity 1000
235+
127.0.0.1:6379> CONFIG SET bf.bloom-expansion 4
236+
127.0.0.1:6379> CONFIG SET bf.bloom-tightening-ratio 0.6
237+
127.0.0.1:6379> CONFIG SET bf.bloom-use-random-seed false
251238
```
252239

253240
### Memory usage limit
@@ -256,19 +243,21 @@ The `bf.bloom-memory-usage-limit` configuration (default 128MB) controls the max
256243

257244
Example usage of increasing the limit:
258245
```bash
259-
CONFIG SET bf.bloom-memory-usage-limit 256mb
246+
127.0.0.1:6379> CONFIG SET bf.bloom-memory-usage-limit 268435456
260247
```
261248

262-
This setting is particularly important for production environments for several reasons:
249+
Having a limit on the max memory per bloom object prevents them from growing unbounded, thus maintaining server performance during serialization.
263250

264-
- **Resource protection**: Prevents a single bloom filter from consuming excessive memory
265-
- **Denial of service prevention**: Protects against attacks that might try to create enormous filters
266-
- **Predictable scaling**: Ensures bloom filters have a known upper bound on resource usage
267-
268-
If your use case requires exceptionally large bloom filters, you can increase this limit. However, be aware that very large bloom filters might impact overall system performance and memory availability for other operations.
251+
If your use case requires bloom filters with capacity beyond what this limit supports, you can increase this configuration value. However, be aware that doing so can impact overall system performance and memory availability for other operations.
269252

270253
When a bloom filter reaches this memory limit, any operation that would cause it to exceed the limit will fail with an error message indicating that the memory limit would be exceeded.
271254

255+
Example of error returned:
256+
```bash
257+
127.0.0.1:6379> bf.add full_filter ne_item
258+
1) (error) ERR operation exceeds bloom object memory limit
259+
```
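
To get a feel for when a filter might approach the limit, the following Python sketch estimates the size of a single bloom filter from its capacity and false positive rate using the textbook formula `m = -n * ln(p) / (ln 2)^2`. This is only an approximation: valkey-bloom's actual memory accounting may differ (for example, object overhead and multiple sub filters), and the helper names are hypothetical.

```python
import math

# Default bf.bloom-memory-usage-limit of 128MB, expressed in bytes.
DEFAULT_LIMIT_BYTES = 128 * 1024 * 1024


def estimated_filter_bytes(capacity: int, fp_rate: float) -> int:
    """Textbook bit-array size for one bloom filter, rounded up to whole bytes."""
    bits = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
    return math.ceil(bits / 8)


def fits_limit(capacity: int, fp_rate: float, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Return True if the estimated size stays within the per-object limit."""
    return estimated_filter_bytes(capacity, fp_rate) <= limit_bytes


print(estimated_filter_bytes(10000, 0.001))  # roughly 18 KB
print(fits_limit(10000, 0.001))              # True
print(fits_limit(10**9, 0.001))              # False: the estimate is far above 128MB
```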
260+
272261
## Performance
273262

274263
The bloom commands which add items or check the existence of items have a time complexity of O(N * K), where N is the number of hash functions used by the bloom filter and K is the number of elements being inserted or checked. This means that BF.ADD and BF.EXISTS are both O(N), as they only operate on one item.
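
The complexity above can be illustrated with a toy bloom filter in Python. This is a minimal sketch for intuition only, not valkey-bloom's implementation: the class `TinyBloom`, the SHA-256 based double hashing, and the method names are all assumptions made for this example.

```python
import hashlib


class TinyBloom:
    """Toy bloom filter: each add/exists touches exactly num_hashes bit positions."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # "N" in the O(N * K) notation above
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> list:
        # Derive num_hashes positions from two base hashes (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        # O(N): set one bit per hash function.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item: str) -> bool:
        # O(N): check one bit per hash function.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def madd(self, items: list) -> None:
        # O(N * K): K items, N bit positions each.
        for item in items:
            self.add(item)


bf = TinyBloom(num_bits=1024, num_hashes=7)
bf.madd(["alice", "bob"])
print(bf.exists("alice"))  # True: added items are always reported as present
```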
@@ -336,6 +325,13 @@ There are two notable validations bloom filters faces.
336325

337326
The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected. This config is modifiable and can be increased as needed.
338327

328+
The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.
329+
330+
```bash
331+
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
332+
(integer) 384
333+
```
334+
339335
2. Number of sub filters (in case of scalable bloom filters):
340336

341337
When a bloom filter scales out, a new sub filter is added. The limit on the number of sub filters depends on the false positive rate and tightening ratio. Each sub filter has a stricter false positive rate, and this is controlled by the tightening ratio. If a command attempting a scale out would result in a sub filter with a false positive rate of 0, the command is rejected.
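
The following Python sketch illustrates why this limit exists. It assumes that the false positive rate of the i-th sub filter is the configured rate multiplied by the tightening ratio raised to the power i, and it uses 64-bit floats. The exact formula and numeric type used by valkey-bloom are not stated here, so treat both function names and the resulting counts as illustrative only.

```python
def sub_filter_fp_rates(fp_rate: float, tightening_ratio: float, count: int) -> list:
    """False positive rate of each of the first `count` sub filters (assumed formula)."""
    return [fp_rate * tightening_ratio ** i for i in range(count)]


def sub_filters_before_zero(fp_rate: float, tightening_ratio: float) -> int:
    """Count the sub filters whose rate is still above 0 in float64 arithmetic."""
    if not 0.0 < tightening_ratio < 1.0:
        raise ValueError("tightening_ratio must be between 0 and 1 (exclusive)")
    count = 0
    rate = fp_rate
    while rate > 0.0:
        count += 1
        rate *= tightening_ratio  # each scale-out tightens the rate further
    return count


# Each sub filter is stricter than the previous one.
print(sub_filter_fp_rates(0.001, 0.5, 4))
# The rate eventually underflows to 0, so the number of sub filters is finite.
print(sub_filters_before_zero(0.001, 0.5))
```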
@@ -360,10 +356,3 @@ The `BF.INFO` command's `MAXSCALEDCAPACITY` field can be used to find out the ma
360356
127.0.0.1:6379> BF.INFO validate_scale_valid MAXSCALEDCAPACITY
361357
(integer) 26214300
362358
```
363-
364-
The `BF.INFO` command's `SIZE` field can be used to find out the current size of a bloom filter.
365-
366-
```bash
367-
172.31.45.25:6379> BF.INFO validate_scale_valid SIZE
368-
(integer) 384
369-
```
