RFC: JuliaFolds interfaces #43

tkf · 2021-05-14T10:48:29Z

This patch implements two JuliaFolds interfaces on ChainedVector: the sequential iteration protocol (aka foldl), written with FGenerators.jl syntax, and the SplittablesBase.jl interface for parallel reductions.
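To make the parallel-reduction side concrete, here is a minimal Base-only sketch of the split-and-combine idea that a "splittable" collection enables. It does not use SplittablesBase.jl or the code in this patch: `ChunkList`, `halve_chunks`, and `parallel_sum` are hypothetical names chosen for illustration, and splitting at chunk boundaries is an assumption about how a chained vector might reasonably be halved.

```julia
# Hypothetical stand-in for ChainedVector: a plain list of chunks.
struct ChunkList{T}
    arrays::Vector{Vector{T}}
end

# Split the collection into two halves at a chunk boundary.
function halve_chunks(A::ChunkList)
    mid = length(A.arrays) ÷ 2
    return ChunkList(A.arrays[1:mid]), ChunkList(A.arrays[mid+1:end])
end

# Divide-and-conquer reduction: keep halving until a single chunk is left,
# reduce the right half on a separate task, then combine both results.
function parallel_sum(A::ChunkList{T}) where {T}
    n = length(A.arrays)
    n == 0 && return zero(T)
    n == 1 && return sum(A.arrays[1])
    left, right = halve_chunks(A)
    task = Threads.@spawn parallel_sum(right)
    return parallel_sum(left) + fetch(task)
end

A = ChunkList([ones(Int, 2^8) for _ in 1:2^8])
total = parallel_sum(A)
println(total)
```

Start Julia with more than one thread (for example `julia -t 4`) for the tasks to actually run in parallel; with a single thread the sketch still returns the same result.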

Arguably, the dependency tree pulled in via FGenerators.jl, especially Transducers.jl, is rather large. I'm not sure if you want to pull this in at this stage (i.e., you'd probably want to wait until I extract it out as FoldsBase.jl). But I thought it'd be interesting to demonstrate that JuliaFolds' iteration facility can benefit not only parallel reductions but also sequential iteration. For example, maybe this can make some parts of optimizations like #42 easier.

This patch uses FGenerators.jl, which is syntactic sugar for Transducers.__foldl__. This is mainly because writing __foldl__ by hand is slightly tedious, and also because I may need to tweak that interface at some point to solve some subtle problems in parallel reduction. I expect the syntax sugar provided by FGenerators.jl to be more stable.
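As a rough illustration of why a foldl-style protocol suits chained arrays, the following Base-only sketch contrasts it with `iterate`. It is not the Transducers.jl or FGenerators.jl API: `MiniChained`, `chunked_foldl`, and `sum_by_iterate` are hypothetical names. With a fold, the container drives the traversal as two plain nested loops, while `iterate` has to rebuild its position from an explicit `(chunk, index)` state on every call.

```julia
# Hypothetical stand-in for ChainedVector: a plain list of chunks.
struct MiniChained{T}
    arrays::Vector{Vector{T}}
end

# foldl-style traversal: the container owns the loops and calls `op`
# once per element, so the inner loop runs over one contiguous chunk.
function chunked_foldl(op, init, A::MiniChained)
    acc = init
    for array in A.arrays
        for x in array
            acc = op(acc, x)
        end
    end
    return acc
end

# iterate-style traversal: the same walk expressed as a state machine
# that must locate the next element from the (chunk, index) state.
function Base.iterate(A::MiniChained, state = (1, 1))
    i, j = state
    while i <= length(A.arrays)
        array = A.arrays[i]
        if j <= length(array)
            return (array[j], (i, j + 1))
        end
        i, j = i + 1, 1  # chunk exhausted: move to the next one
    end
    return nothing
end

# Summation through the iterate protocol, for comparison.
function sum_by_iterate(A::MiniChained{T}) where {T}
    acc = zero(T)
    for x in A
        acc += x
    end
    return acc
end

A = MiniChained([ones(Int, 2^8) for _ in 1:2^8])
fold_total = chunked_foldl(+, 0, A)
iter_total = sum_by_iterate(A)
println((fold_total, iter_total))
```

Both traversals visit the same elements in the same order; the difference is only in who controls the loop, which is what gives the compiler a simpler inner loop to optimize in the fold version.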

Microbenchmark

A simple summation of ChainedVector{Int} is more than 4x faster with @floop, which uses foldl as the iteration mechanism. Looking at the generated LLVM IR, the @floop version is vectorized but the iterate version is not.

julia> using FLoops, BenchmarkTools, SentinelArrays

julia> function sum_iter(xs)
           acc = zero(eltype(xs))
           for x in xs
               acc += x
           end
           acc
       end
sum_iter (generic function with 1 method)

julia> function sum_foldl(xs)
           @floop begin
               acc = zero(eltype(xs))
               for x in xs
                   acc += x
               end
           end
           acc
       end
sum_foldl (generic function with 1 method)

julia> A = ChainedVector([ones(Int, 2^8) for _ in 1:2^8]);

julia> @btime sum_iter(A)
  43.279 μs (1 allocation: 16 bytes)
65536

julia> @btime sum_foldl(A)
  9.500 μs (1 allocation: 16 bytes)
65536

Note: I'm using Int as the element type so that vectorization can be triggered easily. Supporting @simd for floats is possible, but at the moment it requires a rather ugly macro.

I think it's a big win, also considering that the @yield-based syntax is much simpler than the complex iterate implementation:

SentinelArrays.jl/src/folds.jl

Lines 4 to 10 in ec17e62

    
           @fgenerator(A::ChainedVector) do 
        
               for array in A.arrays 
        
                   for x in array 
        
                       @yield x 
        
                   end 
        
               end 
        
           end

quinnj · 2021-05-25T04:38:16Z

Woohoo! This is awesome! It's indeed quite painful to eke out as much performance as possible using the standard iteration protocols from Base. So in #42, I basically have to overload every custom array operation from Base to avoid iteration and sequential indexing.

I am worried about the current dependency tree here; this package has become a "foundational" package of the data ecosystem, so it's hard to allow adding such heavy dependencies. I love the idea of a FoldsBase.jl though, which would allow a lightweight "hook" into all the folds/transducers magic.

RFC: Support JuliaFolds interfaces

ec17e62
