refactor array filtering#212

Merged
bodewig merged 6 commits into apache:master from esaulpaugh:master
Jun 5, 2025
Conversation

@esaulpaugh
Contributor

Prefer array truncation via <T> java.util.Arrays.copyOf(T[],int) to ArrayList::toArray because it is faster.
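A minimal sketch of the pattern being proposed (hypothetical helper, not the merged DirectoryScanner code): filter into a fixed-size scratch array sized to the input, then truncate once with Arrays.copyOf instead of accumulating into an ArrayList and calling toArray.

```java
import java.util.Arrays;

public class FilterDemo {
    // Hypothetical helper illustrating the pattern: write survivors into a
    // scratch array that is at most input.length long, then truncate once.
    static String[] compact(String[] input) {
        String[] tmp = new String[input.length];
        int n = 0;
        for (String s : input) {
            if (s != null) {
                tmp[n++] = s;
            }
        }
        // One allocation plus one System.arraycopy, versus ArrayList's
        // growth bookkeeping followed by toArray's copy.
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compact(new String[] { "a", null, "b" }))); // prints [a, b]
    }
}
```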

@esaulpaugh esaulpaugh changed the title improve DirectoryScanner filtering performance improve filtering performance Sep 5, 2024
@bodewig
Member

bodewig commented May 31, 2025

Thank you. Have you faced a performance problem that has been eased by this pull request or is it based on "this arraylist could and should be avoided"?

I'm not opposed to merging the change but wouldn't label it as performance improvement if the difference is likely unnoticeable for most projects.

@esaulpaugh
Contributor Author

I have not faced a performance issue but I discovered this trick while optimizing a String parser in one of my projects. It performs better in microbenchmarks compared to building an ArrayList. I suggested it to gson as well: google/gson#2734

On corretto-11.0.27 aarch64:

Benchmark     Mode  Cnt   Score    Error  Units
Measure.arr   avgt   14  17.527 ±  0.435  ns/op
Measure.list  avgt   14  44.013 ± 27.030  ns/op

On graalvm-jdk-24+36.1 (24.2.0) aarch64:

Benchmark     Mode  Cnt   Score   Error  Units
Measure.arr   avgt   14   9.795 ± 0.173  ns/op
Measure.list  avgt   14  41.321 ± 3.544  ns/op

import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@Fork(2)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 7, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 7, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class Measure {

  private static final String[] STRINGS = { null, "", "00", "\0", "...", null, "H", "ab", "+", "UUU", null, null, "5\t5" };
  
  @Benchmark
  public void list(Blackhole blackhole) {
      ArrayList<String> list = new ArrayList<>(/* STRINGS.length */);
      for (String s : STRINGS) {
          if (s != null) {
              list.add(s);
          }
      }
      blackhole.consume(list.toArray(new String[0]));
  }
  
  @Benchmark
  public void arr(Blackhole blackhole) {
      String[] arr = new String[STRINGS.length];
      int i = 0;
      for (String s : STRINGS) {
          if (s != null) {
              arr[i++] = s;
          }
      }
      blackhole.consume(Arrays.copyOf(arr, i));
  }
}

It's unlikely that this change alone would make a noticeable difference to users, but small changes can add up eventually, in my experience.

@bodewig
Member

bodewig commented Jun 1, 2025

As I said, I'm not opposed to merging it; it will mostly be "a refactoring".

I wondered whether an extra check for whether we've used all slots of the temporary array, avoiding copyOf altogether in some cases, might be an additional improvement. In very many cases DirectoryScanner will not hit any symlinks during the scan at all. Then again, the performance of DirectoryScanner is very likely dominated by filesystem I/O, so there really is not much of a difference anyway. The extra check would make the code more difficult to read for little real gain.
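The extra check described above could look like this (a hypothetical sketch, not the merged Ant code): when no elements were filtered out, the scratch array is already the exact size, so the copyOf is skipped and the scratch array is returned directly.

```java
import java.util.Arrays;

public class FilterNoCopy {
    // Hypothetical variant: skip Arrays.copyOf when nothing was filtered,
    // which is the common case when DirectoryScanner hits no symlinks.
    static String[] compact(String[] input) {
        String[] tmp = new String[input.length];
        int n = 0;
        for (String s : input) {
            if (s != null) {
                tmp[n++] = s;
            }
        }
        // If every slot was used, tmp is already exact; no second copy needed.
        return n == tmp.length ? tmp : Arrays.copyOf(tmp, n);
    }
}
```

As noted, the saved copy is tiny next to filesystem I/O, so the readability cost may outweigh the gain.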

@esaulpaugh esaulpaugh changed the title improve filtering performance refactor array filtering Jun 2, 2025
@bodewig
Member

bodewig commented Jun 4, 2025

Thank you @esaulpaugh. If you want to be credited in our contributors file, please add yourself to CONTRIBUTORS and contributors.xml.

@bodewig bodewig merged commit b3f64f0 into apache:master Jun 5, 2025
@bodewig
Member

bodewig commented Jun 5, 2025

Thank you
