feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722

Hen1ng · 2024-07-08T12:12:57Z

What does this PR do?

Implement chunk based map serialization in #925. This pr doesn't provide JIT support, it will be implemented in later PR.

Related issues

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?
Does this PR introduce any binary protocol compatibility change?

Benchmark

chaokunyang · 2024-07-31T11:17:08Z

java/fury-core/src/main/java/org/apache/fury/serializer/collection/AbstractMapSerializer.java

-      }
-    }
-  }
+    protected MethodHandle constructor;


Could you format code by mvn spotless:apply? Those changes can then be removed

done!
Spotless version 2.41.1 does not support JDK 8, at least JDK version 11 is required

Yes, spotless needs jdk11 now

chaokunyang · 2024-08-06T13:35:45Z

Could we add license header for MapChunkWriter?

chaokunyang · 2024-08-22T13:49:12Z

Hi @Hen1ng , I noticed we created a MapChunkWriter for serialization/deserialization everytime when we serialize a map. This will introduce extra object allocation cose.

And I found that we use fields in MapChunkWriter for maintain state across different elements serialization. This is good for code readability, but not efficient for execution, we will read and update fields frequently, this will introduce extra indirection and read/write cost. It would be faster if we keep state loca variables. In such ways, all state updates happens on stack, which will be much faster. And we used many small methods, we may need to inline it in the caller to reduce method cost of the method body is small.

chaokunyang · 2024-08-22T13:50:42Z

Another thing I found is that we seems not support predict same type in this PR. For example:

  public static void main(String[] args) {
    Map<String, Integer> map =new HashMap<>(20);
    for (int i = 0; i < 20; i++) {
      map.put("Key"+i, i);
    }
    Fury fury = Fury.builder().withChunkSerializeMapEnable(true).build();
    byte[] result = null;
    for (int i = 0; i < 1000000000; i++) {
      result = fury.serialize(map);
    }
    fury.deserialize(result);
  }

With this PR, we still needs to write key type and value type for every element, could we optimize this in current PR?

Hen1ng · 2024-09-10T10:00:23Z

Benchmark (size) (tracking) Mode Cnt Score Error Units
HnBenchmark.testGeneralChunkWrite 64 true avgt 15 986.457 ± 64.592 ns/op
HnBenchmark.testGeneralChunkWrite 64 false avgt 15 924.420 ± 201.541 ns/op
HnBenchmark.testGeneralChunkWrite 128 true avgt 15 2029.786 ± 95.281 ns/op
HnBenchmark.testGeneralChunkWrite 128 false avgt 15 1992.186 ± 173.750 ns/op
HnBenchmark.testGeneralChunkWrite 256 true avgt 15 4305.362 ± 326.161 ns/op
HnBenchmark.testGeneralChunkWrite 256 false avgt 15 3808.830 ± 120.738 ns/op
HnBenchmark.testGeneralChunkWrite 512 true avgt 15 10309.996 ± 264.954 ns/op
HnBenchmark.testGeneralChunkWrite 512 false avgt 15 9160.058 ± 1075.136 ns/op
HnBenchmark.testGeneralWrite 64 true avgt 15 1384.715 ± 388.240 ns/op
HnBenchmark.testGeneralWrite 64 false avgt 15 1522.562 ± 617.019 ns/op
HnBenchmark.testGeneralWrite 128 true avgt 15 2718.254 ± 666.551 ns/op
HnBenchmark.testGeneralWrite 128 false avgt 15 2916.637 ± 838.792 ns/op
HnBenchmark.testGeneralWrite 256 true avgt 15 6086.414 ± 349.734 ns/op
HnBenchmark.testGeneralWrite 256 false avgt 15 5871.677 ± 672.658 ns/op
HnBenchmark.testGeneralWrite 512 true avgt 15 11769.064 ± 995.204 ns/op
HnBenchmark.testGeneralWrite 512 false avgt 15 12268.517 ± 1140.401 ns/op

generalWrite benchmark compare, benchmark code

@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {

  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

    @Param({"true", "false"})
  boolean tracking = false;


  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
        map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  @Benchmark
  public void testGeneralChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  @Benchmark
  public void testGeneralWrite() {
    final byte[] serialize = fury.serialize(map);
  }
}

Hen1ng · 2024-09-10T11:24:25Z

Benchmark (size) (tracking) Mode Cnt Score Error Units
HnBenchmark.testFinalChunkWrite 64 true avgt 15 1513.621 ± 266.821 ns/op
HnBenchmark.testFinalChunkWrite 64 false avgt 15 1548.258 ± 374.848 ns/op
HnBenchmark.testFinalChunkWrite 128 true avgt 15 2587.836 ± 263.128 ns/op
HnBenchmark.testFinalChunkWrite 128 false avgt 15 2411.845 ± 51.029 ns/op
HnBenchmark.testFinalChunkWrite 256 true avgt 15 5246.138 ± 444.736 ns/op
HnBenchmark.testFinalChunkWrite 256 false avgt 15 5413.328 ± 1192.068 ns/op
HnBenchmark.testFinalChunkWrite 512 true avgt 15 12151.494 ± 954.000 ns/op
HnBenchmark.testFinalChunkWrite 512 false avgt 15 11942.548 ± 775.937 ns/op
HnBenchmark.testFinalWrite 64 true avgt 15 1814.819 ± 491.802 ns/op
HnBenchmark.testFinalWrite 64 false avgt 15 1878.988 ± 795.618 ns/op
HnBenchmark.testFinalWrite 128 true avgt 15 3626.284 ± 1315.089 ns/op
HnBenchmark.testFinalWrite 128 false avgt 15 3224.561 ± 402.316 ns/op
HnBenchmark.testFinalWrite 256 true avgt 15 7173.662 ± 671.952 ns/op
HnBenchmark.testFinalWrite 256 false avgt 15 6983.448 ± 663.652 ns/op
HnBenchmark.testFinalWrite 512 true avgt 15 13694.154 ± 940.861 ns/op
HnBenchmark.testFinalWrite 512 false avgt 15 13999.483 ± 1102.000 ns/op

finalWrite benchmark compare, benchmark code is below

@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {

  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

    @Param({"true", "false"})
  boolean tracking = false;


  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
    }
    bean = new Bean();
    bean.setMap(map);
    
  }



    @Benchmark
    public void testFinalChunkWrite() {
        final byte[] serialize = furyMapChunk.serialize(bean);
    }

  @Benchmark
  public void testFinalWrite() {
    final byte[] serialize = fury.serialize(bean);
  }
}

Hen1ng · 2024-09-11T01:44:06Z

HnBenchmark.testNoFinalGenericChunkWrite 64 true avgt 15 5173.244 ± 551.374 ns/op
HnBenchmark.testNoFinalGenericChunkWrite 64 false avgt 15 2549.857 ± 213.828 ns/op
HnBenchmark.testNoFinalGenericChunkWrite 128 true avgt 15 6468.278 ± 309.073 ns/op
HnBenchmark.testNoFinalGenericChunkWrite 128 false avgt 15 4765.756 ± 192.533 ns/op
HnBenchmark.testNoFinalGenericChunkWrite 256 true avgt 15 12581.604 ± 460.256 ns/op
HnBenchmark.testNoFinalGenericChunkWrite 256 false avgt 15 10055.729 ± 586.683 ns/op
HnBenchmark.testNoFinalGenericWrite 64 true avgt 15 5140.018 ± 158.204 ns/op
HnBenchmark.testNoFinalGenericWrite 64 false avgt 15 3775.049 ± 190.314 ns/op
HnBenchmark.testNoFinalGenericWrite 128 true avgt 15 9481.691 ± 310.432 ns/op
HnBenchmark.testNoFinalGenericWrite 128 false avgt 15 7771.257 ± 442.158 ns/op
HnBenchmark.testNoFinalGenericWrite 256 true avgt 15 17014.393 ± 995.254 ns/op
HnBenchmark.testNoFinalGenericWrite 256 false avgt 15 15370.666 ± 809.569 ns/op

testNoFinalGenericWrite benchmark compare, benchmark code is below

@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {

  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

    @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }


  @Benchmark
    public void testNoFinalGenericChunkWrite() {
        final byte[] serialize = furyMapChunk.serialize(mapBean);
    }

  @Benchmark
  public void testNoFinalGenericWrite() {
    final byte[] serialize = fury.serialize(mapBean);
  }
}

Hen1ng · 2024-09-11T02:55:57Z

Benchmark (size) (tracking) Mode Cnt Score Error Units
HnBenchmark.testGeneralChunkWriteWithNull 64 true avgt 15 1228.736 ± 153.330 ns/op
HnBenchmark.testGeneralChunkWriteWithNull 64 false avgt 15 1150.012 ± 83.370 ns/op
HnBenchmark.testGeneralChunkWriteWithNull 128 true avgt 15 2393.119 ± 168.874 ns/op
HnBenchmark.testGeneralChunkWriteWithNull 128 false avgt 15 2607.314 ± 463.271 ns/op
HnBenchmark.testGeneralChunkWriteWithNull 256 true avgt 15 4554.671 ± 91.605 ns/op
HnBenchmark.testGeneralChunkWriteWithNull 256 false avgt 15 4977.650 ± 1019.474 ns/op
HnBenchmark.testGeneralWriteWithNull 64 true avgt 15 1935.727 ± 1269.534 ns/op
HnBenchmark.testGeneralWriteWithNull 64 false avgt 15 1704.501 ± 604.193 ns/op
HnBenchmark.testGeneralWriteWithNull 128 true avgt 15 2741.738 ± 664.690 ns/op
HnBenchmark.testGeneralWriteWithNull 128 false avgt 15 2431.270 ± 65.081 ns/op
HnBenchmark.testGeneralWriteWithNull 256 true avgt 15 5496.098 ± 117.704 ns/op
HnBenchmark.testGeneralWriteWithNull 256 false avgt 15 5845.516 ± 90.525 ns/op

@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {

  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

    @Param({"true", "false"})
  boolean tracking = false;


  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      if ( i == 0) {
        map.put(null, i);
        beanBBeanBMap.put(null, new BeanB());
        continue;
      }
      if (i == 10) {
        map.put(i, null);
        beanBBeanBMap.put(new BeanB(), null);
        continue;
      }
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

    @Benchmark
  public void testGeneralChunkWriteWithNull() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  @Benchmark
  public void testGeneralWriteWithNull() {
    final byte[] serialize = fury.serialize(map);
  }
}

chaokunyang · 2024-09-11T03:33:38Z

The performance are basically same, it's a little unexpected. Could you share some profiling data here?

Hen1ng · 2024-09-11T11:41:18Z

The performance are basically same, it's a little unexpected. Could you share some profiling data here?

Hen1ng · 2024-09-11T11:44:18Z

generalJavaChunkWrite profiling data

Hen1ng · 2024-09-11T11:45:03Z

current method generalJavaWrite profiling data

chaokunyang · 2024-09-11T12:12:08Z

java/fury-core/src/main/java/org/apache/fury/serializer/collection/AbstractMapSerializer.java

+    }
+  }
+
+  private void javaChunkWriteWithKeySerializers(


this method is too big, jvm jit can't inline such big method, could we split it into multiple smaller methods? especially for less frequent code path?

chaokunyang · 2024-10-27T09:54:50Z

java/fury-core/src/main/java/org/apache/fury/Fury.java

@@ -1380,6 +1389,10 @@ public Class<? extends Serializer> getDefaultJDKStreamSerializerType() {
    return config.getDefaultJDKStreamSerializerType();
  }

+  public boolean isChunkSerializeMapEnabled() {


In future this will be the only way to serialize map, we'd better not make this as an option. We can add an option in AbstractMapSerializer directly, and remove that option when we support chunk based serialization in JIT mode

map general write

8ce5f13

Hen1ng requested a review from chaokunyang as a code owner July 8, 2024 12:12

hening added 11 commits July 10, 2024 10:00

map serializer chunk by chunk, key is not allowed null

6e45eb6

preserveByteForHeaderAndChunkSize and extract method

7fd2dc6

checkChunkSize and modify max chunk size to 127

9242f76

use Preconditions check chunk size

7fbe54d

save code

07d7911

update

1e03081

update

b382f1d

debug

ca2d221

remove useless commit

82cb1fb

delete useless code

c492e8e

添加配置isChunkSerializeMapEnabled，是否开启mapchunk序列化

77c6364

chaokunyang reviewed Jul 31, 2024

View reviewed changes

mvn spotless apply

fe2bb76

chaokunyang changed the title ~~feat(java):Chunk by chunk predictive map serialization protocol (WIP)~~ feat(java): Chunk by chunk predictive map serialization protocol (WIP) Jul 31, 2024

hening added 2 commits August 2, 2024 11:07

reset key and value serializer is null

55f8440

modify code

3752339

add apache license

48b8a75

use local variable and bug fix

c06fdc7

use local variable and big method

aa364f5

Hen1ng closed this Sep 11, 2024

Hen1ng reopened this Sep 11, 2024

chaokunyang reviewed Sep 11, 2024

View reviewed changes

chaokunyang reviewed Oct 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722

feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722

Hen1ng commented Jul 8, 2024 •

edited by chaokunyang

Loading

chaokunyang Jul 31, 2024

Hen1ng Jul 31, 2024

chaokunyang Jul 31, 2024

chaokunyang commented Aug 6, 2024

chaokunyang commented Aug 22, 2024

chaokunyang commented Aug 22, 2024

Hen1ng commented Sep 10, 2024 •

edited

Loading

Hen1ng commented Sep 10, 2024

Hen1ng commented Sep 11, 2024 •

edited

Loading

Hen1ng commented Sep 11, 2024

chaokunyang commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

chaokunyang Sep 11, 2024 •

edited

Loading

chaokunyang Oct 27, 2024

feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722

Are you sure you want to change the base?

feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722

Conversation

Hen1ng commented Jul 8, 2024 • edited by chaokunyang Loading

What does this PR do?

Related issues

Does this PR introduce any user-facing change?

Benchmark

chaokunyang Jul 31, 2024

Choose a reason for hiding this comment

Hen1ng Jul 31, 2024

Choose a reason for hiding this comment

chaokunyang Jul 31, 2024

Choose a reason for hiding this comment

chaokunyang commented Aug 6, 2024

chaokunyang commented Aug 22, 2024

chaokunyang commented Aug 22, 2024

Hen1ng commented Sep 10, 2024 • edited Loading

Hen1ng commented Sep 10, 2024

Hen1ng commented Sep 11, 2024 • edited Loading

Hen1ng commented Sep 11, 2024

chaokunyang commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

Hen1ng commented Sep 11, 2024

chaokunyang Sep 11, 2024 • edited Loading

Choose a reason for hiding this comment

chaokunyang Oct 27, 2024

Choose a reason for hiding this comment

Hen1ng commented Jul 8, 2024 •

edited by chaokunyang

Loading

Hen1ng commented Sep 10, 2024 •

edited

Loading

Hen1ng commented Sep 11, 2024 •

edited

Loading

chaokunyang Sep 11, 2024 •

edited

Loading