
Commit

Add chunking of binary dump
Signed-off-by: Jacob Murphy <[email protected]>
murphyjacob4 committed Jan 28, 2025
1 parent 89e136a commit 81cf8a7
Showing 1 changed file with 128 additions and 49 deletions.
177 changes: 128 additions & 49 deletions rfc/rdb-format.md
@@ -34,45 +34,47 @@ Our existing RDB format is a good start, but it also is fairly rigid, not suppor
Below is a diagram of the proposed payload design:

```
Example Header
┌──────────────────────────────────┐
Unknown types │ Type Enc. Header │
are skipped │ (enum) Version Content │
│ │┌───────┐┌───────┐┌──────────────┐│
│ ││ Index ││ ││E.g, attribute││ Example Supplemental Content
└──────────►│Content││ 1 ││name... ││ ┌───────────────────────────────────────────────┐
│└───────┘└───────┘└──────────────┘│ │ Header Binary Header Binary │
└────────────┬─────────────────────┘ │ Proto 1 Dump 1 Proto 2 Dump 2 │
│ │┌────────┐┌───────────┐┌────────┐┌───────────┐ │
└────────────────────────►│ ││ ... ││ ││ ... │ │
│└────────┘└───────────┘└────────┘└───────────┘ │
└───────────────────────────────────────────────┘
RDB File │
┌──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────┐
│ Module Type OpCode When Private Module Data │ │
│┌───────────┐ ┌────┐ ┌────┐ ┌─────────────────────────┼───────────────────────────────────────────────────────┐│
││ │ │ │ │ │ │Section RDBSection │ Supplemental RDBSection Supplemental ││
││ │ │ │ │ │ │ Count Proto Payload 1 │ Content for #1 Proto Payload 2 Content for #2 ││
││"SchMgr-VS"│ │ 2 │ │ 2 │ │┌─────┐┌───────────────┐┌▼─────────────────┐┌───────────────┐┌──────────────────┐││
││ │ │ │ │ │ ││ 2 ││ ││ ││ ││ │││
││ │ │ │ │ │ │└─────┘└──────▲────────┘└──────────────────┘└───────────────┘└──────────────────┘││
│└───────────┘ └────┘ └────┘ └──────────────┼──────────────────────────────────────────────────────────────────┘│
└───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┘
Example RDBSection
┌───────────────────────────────────────────────────┐
│ Type Supplemental │
│ (enum) Count │
│ ┌─────────┐┌────────────────────────┐┌──────────┐ │
┌─┼─► Schema ││ <schema contents> ││ 2 │ │
│ │ └─────────┘└────────────────────────┘└──────────┘ │
│ └───────────────────────────────────────────────────┘
Unknown types
are skipped
Example Binary Dump
┌─────────────────────────────────────┐
Example Header │ Chunk 1 Chunk 2 Chunk 3 EOF │
┌───────────────────────────────────────────┐ │┌───────┐┌───────┐┌───────┐┌────────┐│
Unknown types│ Type Required Enc. Header │ ││... ││... ││... ││ ││
are skipped │ (enum) Version Content │ ┌──────────│└───────┘└───────┘└───────┘└────────┘│
│ │ ┌───────┐┌──────┐┌───────┐┌──────────────┐│ │ └─────────────────────────────────────┘
│ │ │ Index ││ ││ ││E.g, attribute││ │
└───────┼►│Content││ True ││ 1 ││name... ││ ┌──────────────────┼────────────────────────────┐
│ └───────┘└──────┘└───────┘└──────────────┘│ │ Header Binary │ Header Binary │
└─────────────────────┬─────────────────────┘ │ Proto 1 Dump 1 ▼ Proto 2 Dump 2 │
│ │┌────────┐┌───────────┐┌────────┐┌───────────┐ │
└────────────────────────►│ ││ ... ││ ││ ... │ │
│└────────┘└───────────┘└────────┘└───────────┘ │
└───────────────────────────────────────────────┘
Example Supplemental Content
RDB Aux Section │
┌───────────────────────────────────────────────────┼────────────────────────────────────────────────────────┐
│ Module Type OpCode When Private Module Data │ │
│┌───────────┐┌────┐┌────┐┌─────────────────────────┼───────────────────────────────────────────────────────┐│
││ ││ ││ ││Section VSRDBSection │ Supplemental VSRDBSection Supplemental ││
││ ││ ││ ││ Count Proto Payload 1 │ Content for #1 Proto Payload 2 Content for #2 ││
││"SchMgr-VS"││ 2 ││ 2 ││┌─────┐┌───────────────┐┌▼─────────────────┐┌───────────────┐┌──────────────────┐││
││ ││ ││ │││ 2 ││ ││ ││ ││ │││
││ ││ ││ ││└─────┘└──────▲────────┘└──────────────────┘└───────────────┘└──────────────────┘││
│└───────────┘└────┘└────┘└──────────────┼──────────────────────────────────────────────────────────────────┘│
└────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┘
Example VSRDBSection
┌───────────────────────────────────────────────────┐
│ Type Required Enc. Supplemental │
│ (enum) Version Count │
│ ┌────────┐┌──────┐┌──────┐┌─────────┐┌──────────┐ │
┌─┼─► Schema ││ True ││ 1 ││<content>││ 2 │ │
│ │ └────────┘└──────┘└──────┘└─────────┘└──────────┘ │
│ └───────────────────────────────────────────────────┘
Unknown types
are skipped
```

#### RDBSection
@@ -81,7 +83,7 @@ The primary unit of the RDB Payload is the RDBSection, which will have a proto d

```
enum RDBSectionType {
RDB_SECTION_INDEX_SCHEMA;
RDB_SECTION_INDEX_SCHEMA,
...
}
@@ -133,6 +135,20 @@ The supplemental header will allow differing versions of the module to identify

When loading supplemental content, the content will be skipped if the type is unknown or the encoding version is higher than we understand, provided `required` is not true. If `required` is true, we must return an error when we cannot understand the contents.
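
A minimal C++ sketch of that load-time decision follows; the parameter names and the idea of a registry of supported supplemental types are illustrative, not part of the proposal.

```cpp
#include <cstdint>

// Sketch only: decide what to do with a SupplementalContentHeader before
// loading its body. `type_known` and `max_supported_enc_version` would come
// from the module's registry of supported supplemental types (hypothetical).
enum class LoadAction { kLoad, kSkip, kError };

LoadAction ClassifySupplementalHeader(bool required, bool type_known,
                                      uint32_t enc_version,
                                      uint32_t max_supported_enc_version) {
  if (type_known && enc_version <= max_supported_enc_version) {
    return LoadAction::kLoad;  // Fully understood: load it normally.
  }
  // Unknown type or newer encoding: safe to skip only when not required.
  return required ? LoadAction::kError : LoadAction::kSkip;
}
```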

#### Binary Dump

With the current Valkey RDB APIs, modules can only perform a complete read or write of a given type to the RDB; there are no streaming capabilities. If a module were to write a gigabyte of data, the full gigabyte would need to be serialized in memory and then passed to the RDB APIs to be saved into the RDB.

To avoid this memory overhead for large binary dumps, we will chunk binary data to reduce the size of individual RDB write API calls. We will use a simple protocol buffer with the following format to represent a chunk in a binary dump:

```
message SupplementalContentChunk {
bytes binary_content = 1;
}
```

To preserve previous versions' ability to skip binary contents contained in supplemental content sections, the end of a binary dump is marked by a single SupplementalContentChunk with no data. This signals EOF, so the loading procedure knows that the next item is either the next SupplementalContentHeader, or the next RDBSection if no more SupplementalContentHeaders exist for the current RDBSection.
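
As a concrete illustration of the save and load loops, here is a minimal C++ sketch. It assumes, as discussed in the comments below, that each SupplementalContentChunk is serialized and written with a single ValkeyModule_SaveStringBuffer call; the generated proto header name and the callback parameters are hypothetical.

```cpp
#include <functional>
#include <string>

#include "valkeymodule.h"          // Valkey module RDB I/O API
#include "rdb_serialization.pb.h"  // hypothetical generated header for SupplementalContentChunk

// Sketch only: write an arbitrarily large binary payload as a sequence of
// chunk protos, each saved with one ValkeyModule_SaveStringBuffer call.
// `next_piece` returns the next piece of the dump, or "" when done.
void SaveBinaryDump(ValkeyModuleIO *io,
                    const std::function<std::string()> &next_piece) {
  std::string piece;
  while (!(piece = next_piece()).empty()) {
    SupplementalContentChunk chunk;
    chunk.set_binary_content(piece);
    const std::string serialized = chunk.SerializeAsString();
    ValkeyModule_SaveStringBuffer(io, serialized.data(), serialized.size());
  }
  // A chunk with no data marks EOF, so no chunk count is needed up front.
  const std::string eof = SupplementalContentChunk().SerializeAsString();
  ValkeyModule_SaveStringBuffer(io, eof.data(), eof.size());
}

// Sketch only: read chunks until the empty EOF chunk is seen.
void LoadBinaryDump(ValkeyModuleIO *io,
                    const std::function<void(const std::string &)> &consume) {
  while (true) {
    size_t len = 0;
    char *buf = ValkeyModule_LoadStringBuffer(io, &len);
    SupplementalContentChunk chunk;
    chunk.ParseFromArray(buf, static_cast<int>(len));
    ValkeyModule_Free(buf);
    if (chunk.binary_content().empty()) break;  // EOF marker
    consume(chunk.binary_content());
  }
}
```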

#### Example: Adding Vector Quantization

With the above design, suppose that we are substantially changing the index to support a vector quantization option on `FT.CREATE`. For simplicity, suppose this is just a boolean "on" or "off" flag.
@@ -142,10 +158,10 @@ On the old version, in the RDB, we would output something like the following:
```
RDBSection {
type: RDB_SECTION_INDEX_SCHEMA,
required: true,
encoding_version: 1,
index_schema_contents: {
name: "my_index",
required: true,
encoding_version: 1,
attributes: [
{
identifier: "my_vector",
@@ -177,8 +193,20 @@ SupplementalContentHeader {
type: SUPPLEMENTAL_KEY_TO_ID,
required: true,
enc_version: 1,
key_to_id_header: {
attribute_name: "my_vector"
}
}
SupplementalContentChunk {
contents: <key_to_id_dump_1>
}
SupplementalContentChunk {
contents: <key_to_id_dump_2>
}
...
SupplementalContentChunk {
contents: ""
}
<key_to_id_mapping_dump>
SupplementalContentHeader {
type: SUPPLEMENTAL_INDEX_CONTENTS,
required: true,
@@ -187,7 +215,16 @@ SupplementalContentHeader {
attribute_name: "my_vector",
}
}
<my_vector_index_contents>
SupplementalContentChunk {
contents: <my_vector_contents_1>
}
SupplementalContentChunk {
contents: <my_vector_contents_2>
}
...
SupplementalContentChunk {
contents: ""
}
```

Suppose that the new version introduces a new field in VectorIndex - `bool quantize`. Protocol buffers initialize default values to a "zero-like" value, so this will be `false` if not previously set. We could also add it as `optional bool quantize` and explicitly check whether the VectorIndex proto has the `quantize` field set. On the upgrade path, we will default-initialize the value of `quantize` to false (or handle the default case as we see fit, if we use `optional`).
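
A small C++ sketch of that upgrade-path check, assuming the field is added as `optional bool quantize` so the generated code exposes a presence test; the surrounding names and the generated header are illustrative.

```cpp
#include "index_schema.pb.h"  // hypothetical generated header for VectorIndex

// Sketch only: resolve `quantize` when loading an index schema written by an
// older version that never set the field.
bool ResolveQuantize(const VectorIndex &index_proto) {
  // With plain `bool quantize`, an old payload simply decodes as false.
  // With `optional bool quantize`, presence can be checked explicitly and the
  // upgrade path can choose its own default.
  if (index_proto.has_quantize()) {
    return index_proto.quantize();
  }
  return false;  // default-initialize on the upgrade path
}
```
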
@@ -233,8 +270,20 @@ SupplementalContentHeader {
type: SUPPLEMENTAL_KEY_TO_ID,
required: true,
enc_version: 1,
key_to_id_header: {
attribute_name: "my_vector"
}
}
SupplementalContentChunk {
contents: <key_to_id_dump_1>
}
SupplementalContentChunk {
contents: <key_to_id_dump_2>
}
...
SupplementalContentChunk {
contents: ""
}
<key_to_id_mapping_dump>
SupplementalContentHeader {
type: SUPPLEMENTAL_INDEX_CONTENTS,
required: true,
@@ -243,10 +292,19 @@ SupplementalContentHeader {
attribute_name: "my_vector",
}
}
<my_quantized_vector_index_contents>
SupplementalContentChunk {
contents: <my_vector_contents_1>
}
SupplementalContentChunk {
contents: <my_vector_contents_2>
}
...
SupplementalContentChunk {
contents: ""
}
```

On the new version, when the new feature `quantize` is used, we will bump the encoding version of the RDBSection containing the index schema definition (it now contains the `quantize` field, which will be lost on downgrade). Similarly, we will also bump the encoding version of the SupplementalContentHeader for the index contents, as the format has changed in a way that will not be understood by older versions. On loading this on the previous version, we will fail fast with a useful error message:

```
ValkeySearch RDB contents contain definitions for RDB sections that are not supported by this version. If you are downgrading, ensure all feature usage on the new version of ValkeySearch is supported by this version and retry.
@@ -293,8 +351,20 @@ SupplementalContentHeader {
type: SUPPLEMENTAL_KEY_TO_ID,
required: true,
enc_version: 1,
key_to_id_header: {
attribute_name: "my_vector"
}
}
SupplementalContentChunk {
contents: <key_to_id_dump_1>
}
SupplementalContentChunk {
contents: <key_to_id_dump_2>
}
...
SupplementalContentChunk {
contents: ""
}
<key_to_id_mapping_dump>
SupplementalContentHeader {
type: SUPPLEMENTAL_INDEX_CONTENTS,
required: true,
@@ -303,7 +373,16 @@ SupplementalContentHeader {
attribute_name: "my_vector",
}
}
<my_vector_index_contents>
SupplementalContentChunk {
contents: <my_vector_contents_1>
}
SupplementalContentChunk {
contents: <my_vector_contents_2>
}
...
SupplementalContentChunk {
contents: ""
}
```

Upon retry, the RDB load will succeed.

3 comments on commit 81cf8a7

@allenss-amazon (Member) commented on 81cf8a7, Jan 28, 2025:

I finally figured out why this is so confusing to me. This describes all of the activities as saving/restoring protobufs. But Valkey doesn't know anything about protobufs and certainly doesn't know how to save/restore them. It may seem obvious, but to Valkey developers, RDB I/O has a set of functions that operate on a totally different set of datatypes, i.e., ValkeyModule_SaveString, ValkeyModule_SaveUnsigned, etc., not protobufs.

I think the description needs to be written in the language of the Valkey RDB I/O functions.

Also, I don't think it makes any sense for individual chunks of a supplemental section to be required to be encoded as protobufs. The entire point of the chunking is that you can't afford to store the uber-object as a protobuf, so the chunk is only a piece of what you're storing. Why force it to be a protobuf when we know that we're going to discard the protobuf-ness immediately? Totally unnecessary run-time overhead, not to mention extra coding time, etc.

I can't quite tell if this proposal supports multiple supplemental data sections per RDB section? If not, I think that's a serious problem.

@murphyjacob4 (Collaborator, Author) commented:

> This describes all of the activities as saving/restoring protobufs. But Valkey doesn't know anything about protobufs and certainly doesn't know how to save/restore them. It may seem obvious, but to Valkey developers, RDB I/O has a set of functions that operate on a totally different set of datatypes, i.e., ValkeyModule_SaveString, ValkeyModule_SaveUnsigned, etc., not protobufs.

Thanks for figuring out the disconnect. Let me see if I can add some clarity by closing the gap between the two. TL;DR - each protobuf is serialized as a single `ValkeyModule_SaveString` call.

> Why force it to be a protobuf when we know that we're going to discard the protobuf-ness immediately?

We need some EOF marker for the payload. Originally, I considered `VM_SaveString("")` as the EOF, but I was worried about accidentally emitting the EOF when saving. E.g., imagine you are serializing a key-value-pair dict and one of the values is an empty string; you might just iterate over the key-value pairs and do `VM_SaveString(key); VM_SaveString(val);`. Doing `VM_SaveString(val)` could accidentally match the EOF marker in the case where `val == ""`.

With protobuf, I thought it was a creative way to signal EOF by having a protobuf with empty content at the end: "not present" is an explicit state different from "empty". Overhead per chunk would be <10 bytes (a byte for the field tag, a variable number of bytes for the length of the chunk). Alternatively, if you have an idea for a better EOF marker, let me know and I'm happy to update it.

The goal with the EOF marker is to avoid forcing precomputation of the number of chunks ahead of time, which will simplify the saving logic. Otherwise, we need to know how much we are saving ahead of time in order to emit a length, which may be difficult for complicated data types.

@allenss-amazon (Member) commented:

I agree there are times when not having to pre-compute the number of things makes the code easier. For those cases, an EOF marker isn't a bad way to handle it. Conversely, there are many cases where the pre-computation is trivial and having that value in hand greatly simplifies the code (like handling a vector -- a large vector probably REQUIRES you to pre-compute so that we're not encountering the dynamic vector-expansion penalty on large data structures).

I feel like we ought to be able to handle both cases without much complexity. I don't see why some clever encoding in the header can't solve both cases.

Suppose we have three kinds of headers:

  1. I'm an array of blobs with the length "up front".
  2. I'm a blob that might be part of a post-compute array of blobs.
  3. I'm an end-marker.

We could easily use a single signed number for an opcode. Positive numbers are type 1, negative numbers are type 2, and zero is type 3.

For easy pre-computation, you use type 1, which always has a length value as the next item, followed by that many blobs.

For hard pre-computation, you just keep dumping type 2s into the output, followed by a type 3. It would be illegal for the opcode of the 2s to change midstream.
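
A minimal C++ sketch of that opcode scheme, purely illustrative (the function names, and the choice of -1 as the streaming opcode, are not part of the proposal):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

#include "valkeymodule.h"  // Valkey module RDB I/O API

// Opcode convention from the comment above:
//   > 0 : type 1 - a length follows, then that many blobs
//   < 0 : type 2 - one blob of a stream whose length was not precomputed
//  == 0 : type 3 - end marker for that stream

void SaveWithLengthUpFront(ValkeyModuleIO *io,
                           const std::vector<std::string> &blobs) {
  ValkeyModule_SaveSigned(io, 1);  // type 1 opcode
  ValkeyModule_SaveSigned(io, static_cast<int64_t>(blobs.size()));
  for (const auto &blob : blobs) {
    ValkeyModule_SaveStringBuffer(io, blob.data(), blob.size());
  }
}

void SaveWithoutPrecomputedLength(ValkeyModuleIO *io,
                                  const std::function<std::string()> &next) {
  std::string piece;
  while (!(piece = next()).empty()) {
    ValkeyModule_SaveSigned(io, -1);  // type 2 opcode, must not change midstream
    ValkeyModule_SaveStringBuffer(io, piece.data(), piece.size());
  }
  ValkeyModule_SaveSigned(io, 0);  // type 3: end marker
}
```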
