@@ -29,13 +29,15 @@ discuss aspects of the wire format.
 The Protoscope tool can also dump encoded protocol buffers as text. See
 https://github.com/protocolbuffers/protoscope/tree/main/testdata for examples.
 
+All examples in this topic assume that you are using Edition 2023 or later.
+
 ## A Simple Message {#simple}
 
 Let's say you have the following very simple message definition:
 
 ```proto
 message Test1 {
-  optional int32 a = 1;
+  int32 a = 1;
 }
 ```
 
@@ -241,7 +243,7 @@ Consider this message schema:
 
 ```proto
 message Test2 {
-  optional string b = 2;
+  string b = 2;
 }
 ```
 
@@ -275,7 +277,7 @@ an embedded message of our original example message, `Test1`:
 
 ```proto
 message Test3 {
-  optional Test1 c = 3;
+  Test1 c = 3;
 }
 ```
 
@@ -293,36 +295,49 @@ and a length of 3, exactly the same way as strings are encoded.
 In Protoscope, submessages are quite succinct. `` `1a03089601` `` can be written
 as `3: {1: 150}`.
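
To make the byte layout concrete, here is a minimal Python sketch that decodes `1a03089601` by hand. The helper names `read_varint` and `read_records` are illustrative only and are not part of any protobuf library; the sketch handles just the `VARINT` and `LEN` wire types that this example needs and does no bounds checking.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # a clear high bit ends the varint
            return result, pos
        shift += 7


def read_records(buf):
    """Split a message into (field_number, wire_type, payload) records."""
    records = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # LEN: a length prefix, then that many bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        records.append((field_number, wire_type, value))
    return records


outer = read_records(bytes.fromhex("1a03089601"))  # field 3, LEN, 3 payload bytes
inner = read_records(outer[0][2])                  # field 1, VARINT, value 150
```

The outer pass yields a single `LEN` record for field 3, and feeding its payload back through the same function yields field 1 with the value 150, matching `3: {1: 150}`.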
 
-## Optional and Repeated Elements {#optional}
+## Missing Elements {#optional}
 
-Missing `optional` fields are easy to encode: we just leave out the record if
+Missing fields are easy to encode: we just leave out the record if
 it's not present. This means that "huge" protos with only a few fields set are
 quite sparse.
 
-`repeated` fields are a bit more complicated. Ordinary (not [packed](#packed))
-repeated fields emit one record for every element of the field. Thus, if we have
+<span id="packed"></span>
+
+## Repeated Elements {#repeated}
+
+Starting in Edition 2023, `repeated` fields of a primitive type
+(any [scalar type](/programming-guides/proto2#scalar)
+that is not `string` or `bytes`) are ["packed"](/editions/features#repeated_field_encoding) by default.
+
+Packed `repeated` fields, instead of being encoded as one
+record per entry, are encoded as a single `LEN` record that contains each
+element concatenated. To decode, elements are decoded from the `LEN` record one
+by one until the payload is exhausted. The start of the next element is
+determined by the length of the previous, which itself depends on the type of
+the field. Thus, if we have:
 
 ```proto
 message Test4 {
-  optional string d = 4;
-  repeated int32 e = 5;
+  string d = 4;
+  repeated int32 e = 6;
 }
 ```
 
 and we construct a `Test4` message with `d` set to `"hello"`, and `e` set to
-`1`, `2`, and `3`, this *could* be encoded as `` `220568656c6c6f280128022803`
-``, or written out as Protoscope,
+`3`, `270`, and `86942`, this *could* be encoded as
+`` `220568656c6c6f3206038e029ea705` ``, or written out as Protoscope,
 
 ```proto
 4: {"hello"}
-5: 1
-5: 2
-5: 3
+6: {3 270 86942}
 ```
 
-However, records for `e` do not need to appear consecutively, and can be
-interleaved with other fields; only the order of records for the same field with
-respect to each other is preserved. Thus, this could also have been encoded as
+However, if the repeated field is set to expanded (overriding the default packed
+state) or is not packable (strings and messages) then an entry for each
+individual value is encoded. Also, records for `e` do not need to appear
+consecutively, and can be interleaved with other fields; only the order of
+records for the same field with respect to each other is preserved. Thus, this
+could look like the following:
 
 ```proto
 5: 1
@@ -331,6 +346,24 @@ respect to each other is preserved. Thus, this could also have been encoded as
 5: 3
 ```
 
+Only repeated fields of primitive numeric types can be declared "packed". These
+are types that would normally use the `VARINT`, `I32`, or `I64` wire types.
+
+Note that although there's usually no reason to encode more than one key-value
+pair for a packed repeated field, parsers must be prepared to accept multiple
+key-value pairs. In this case, the payloads should be concatenated. Each pair
+must contain a whole number of elements. The following is a valid encoding of
+the same message above that parsers must accept:
+
+```proto
+6: {3 270}
+6: {86942}
+```
+
+Protocol buffer parsers must be able to parse repeated fields that were compiled
+as `packed` as if they were not packed, and vice versa. This permits adding
+`[packed=true]` to existing fields in a forward- and backward-compatible way.
+
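
The parsing rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used by any protobuf runtime: `read_varint` and `decode_repeated_int32` are hypothetical helper names, only the `VARINT` and `LEN` wire types are understood, and the sign handling that negative `int32` values would need is left out. The function collects the elements of one repeated field and accepts a single packed record, several packed records, or expanded records.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_repeated_int32(buf, field_number):
    """Collect the elements of one repeated varint field, packed or expanded."""
    values = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # VARINT: one expanded element
            value, pos = read_varint(buf, pos)
            if number == field_number:
                values.append(value)
        elif wire_type == 2:  # LEN: a packed run, or some other field to skip
            length, pos = read_varint(buf, pos)
            end = pos + length
            if number == field_number:
                # Read elements until the payload is exhausted.
                while pos < end:
                    value, pos = read_varint(buf, pos)
                    values.append(value)
                if pos != end:
                    raise ValueError("packed record must hold whole elements")
            pos = end
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
    return values


packed = bytes.fromhex("3206038e029ea705")            # 6: {3 270 86942}
split = bytes.fromhex("3203038e02" "32039ea705")      # 6: {3 270} then 6: {86942}
expanded = bytes.fromhex("3003" "308e02" "309ea705")  # 6: 3, 6: 270, 6: 86942
```

Calling `decode_repeated_int32(..., 6)` on `packed`, `split`, and `expanded` produces the same three elements in the same order, which is the interchangeability that the text requires of parsers.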
 ### Oneofs {#oneofs}
 
 [`Oneof` fields](/programming-guides/proto2#oneof) are
@@ -368,53 +401,6 @@ message.MergeFrom(message2);
 This property is occasionally useful, as it allows you to merge two messages (by
 concatenation) even if you do not know their types.
 
-### Packed Repeated Fields {#packed}
-
-Starting in v2.1.0, `repeated` fields of a primitive type
-(any [scalar type](/programming-guides/proto2#scalar)
-that is not `string` or `bytes`) can be declared as "packed". In proto2 this is
-done using the field option `[packed=true]`. In proto3 it is the default.
-
-Instead of being encoded as one record per entry, they are encoded as a single
-`LEN` record that contains each element concatenated. To decode, elements are
-decoded from the `LEN` record one by one until the payload is exhausted. The
-start of the next element is determined by the length of the previous, which
-itself depends on the type of the field.
-
-For example, imagine you have the message type:
-
-```proto
-message Test5 {
-  repeated int32 f = 6 [packed=true];
-}
-```
-
-Now let's say you construct a `Test5`, providing the values 3, 270, and 86942
-for the repeated field `f`. Encoded, this gives us `` `3206038e029ea705` ``, or
-as Protoscope text,
-
-```proto
-6: {3 270 86942}
-```
-
-Only repeated fields of primitive numeric types can be declared "packed". These
-are types that would normally use the `VARINT`, `I32`, or `I64` wire types.
-
-Note that although there's usually no reason to encode more than one key-value
-pair for a packed repeated field, parsers must be prepared to accept multiple
-key-value pairs. In this case, the payloads should be concatenated. Each pair
-must contain a whole number of elements. The following is a valid encoding of
-the same message above that parsers must accept:
-
-```proto
-6: {3 270}
-6: {86942}
-```
-
-Protocol buffer parsers must be able to parse repeated fields that were compiled
-as `packed` as if they were not packed, and vice versa. This permits adding
-`[packed=true]` to existing fields in a forward- and backward-compatible way.
-
 ### Maps {#maps}
 
 Map fields are just a shorthand for a special kind of repeated field. If we have
@@ -430,8 +416,8 @@ this is actually the same as
 ```proto
 message Test6 {
   message g_Entry {
-    optional string key = 1;
-    optional int32 value = 2;
+    string key = 1;
+    int32 value = 2;
   }
   repeated g_Entry g = 7;
 }