RFC: JuliaFolds interfaces #43

tkf commented May 14, 2021

This patch implements two JuliaFolds interfaces on ChainedVector: the sequential iteration protocol (aka foldl), written with FGenerators.jl syntax, and the SplittablesBase.jl interface for parallel reductions.
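The parallel half rests on the divide-and-conquer idea behind SplittablesBase.jl: a collection that can be cheaply split in two can be reduced in parallel. Below is a minimal sketch of that idea in plain Julia. It assumes a hypothetical `Chunks` type standing in for `ChainedVector` and hypothetical helpers (`reduce_chunks`, `parallel_reduce`), and it does not use the actual SplittablesBase.jl API. Because both halves start from `init`, `init` must be an identity element of `op` (e.g. `0` for `+`).

```julia
# Hypothetical stand-in for ChainedVector: a vector of contiguous chunks.
struct Chunks{T}
    arrays::Vector{Vector{T}}
end

# Sequential reduction over chunks lo:hi (inclusive).
function reduce_chunks(op, init, A::Chunks, lo::Int, hi::Int)
    acc = init
    for k in lo:hi
        for x in A.arrays[k]
            acc = op(acc, x)
        end
    end
    return acc
end

# Hypothetical divide-and-conquer reduction: split the chunk range in half
# (the role a "halve" step plays in SplittablesBase.jl), reduce the right
# half on another task, reduce the left half here, then combine.
# Assumes `init` is an identity element of `op`.
function parallel_reduce(op, init, A::Chunks, lo::Int = 1,
                         hi::Int = length(A.arrays); basesize::Int = 2)
    if hi - lo + 1 <= basesize
        return reduce_chunks(op, init, A, lo, hi)
    end
    mid = (lo + hi) >>> 1
    right = Threads.@spawn parallel_reduce(op, init, A, mid + 1, hi; basesize = basesize)
    left = parallel_reduce(op, init, A, lo, mid; basesize = basesize)
    return op(left, fetch(right))
end
```

Splitting on chunk boundaries keeps every leaf task a plain loop over contiguous arrays; a real ChainedVector splitter would presumably also need to split inside a single large chunk.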

Admittedly, the dependency tree pulled in via FGenerators.jl, especially Transducers.jl, is rather large. I'm not sure you want to take that on at this stage (you'd probably prefer to wait until I extract the interface into FoldsBase.jl). But I thought it would be interesting to demonstrate that JuliaFolds' iteration facility can benefit not only parallel reductions but also sequential iteration. For example, it might make some parts of optimizations like #42 easier.

This patch uses FGenerators.jl, which is syntax sugar for Transducers.__foldl__. This is mainly because writing __foldl__ by hand is slightly tedious, and I may need to tweak that interface at some point to solve some subtle problems in parallel reduction. I expect the syntax sugar provided by FGenerators.jl to be more stable.
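Conceptually, a foldl-based generator inverts control: the container runs the loop and hands each element to a reducing function supplied by the consumer, so each `@yield x` corresponds to a step `acc = rf(acc, x)`. The hand-written sketch below illustrates that idea without FGenerators.jl or Transducers.jl; the `Chunks` type and `chained_foldl` function are hypothetical, and this is not the code FGenerators.jl actually generates.

```julia
# Hypothetical stand-in for ChainedVector.
struct Chunks{T}
    arrays::Vector{Vector{T}}
end

# The container owns the loop. Where an @fgenerator body would say
# `@yield x`, this sketch feeds x to the consumer's reducing function `rf`.
function chained_foldl(rf, init, A::Chunks)
    acc = init
    for array in A.arrays
        for x in array          # ordinary loop over one contiguous Vector
            acc = rf(acc, x)    # corresponds to `@yield x`
        end
    end
    return acc
end
```

For example, `chained_foldl(+, 0, A)` sums the elements and `chained_foldl((v, x) -> push!(v, x), Int[], A)` collects them. The inner loop is a plain loop over a single array, the kind of loop LLVM is generally able to vectorize.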

Microbenchmark

A simple summation over a ChainedVector{Int} is about 4.5x faster (43.3 μs vs. 9.5 μs) with @floop, which uses foldl as the iteration mechanism. Inspecting the LLVM IR shows that the @floop version is vectorized while the iterate version is not.

```julia
julia> using BenchmarkTools, FLoops, SentinelArrays

julia> function sum_iter(xs)
           acc = zero(eltype(xs))
           for x in xs
               acc += x
           end
           acc
       end
sum_iter (generic function with 1 method)

julia> function sum_foldl(xs)
           @floop begin
               acc = zero(eltype(xs))
               for x in xs
                   acc += x
               end
           end
           acc
       end
sum_foldl (generic function with 1 method)

julia> A = ChainedVector([ones(Int, 2^8) for _ in 1:2^8]);

julia> @btime sum_iter(A)
  43.279 μs (1 allocation: 16 bytes)
65536

julia> @btime sum_foldl(A)
  9.500 μs (1 allocation: 16 bytes)
65536
```

Note: I'm using Int as the element type so that vectorization is triggered easily. Supporting @simd for floats is possible, but at the moment it requires a rather ugly macro.
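For floating-point elements the compiler will not reorder additions on its own, because floating-point `+` is not associative; `@simd` grants that permission. When the container owns the loop, the annotation can sit directly on the inner per-array loop. A minimal hand-written sketch, using a hypothetical `sum_chunks_simd` over a plain vector of vectors (this is not the macro mentioned above):

```julia
# Hypothetical helper: sum Float64 chunks, allowing the compiler to
# reassociate (and thus vectorize) the additions within each chunk.
function sum_chunks_simd(arrays::Vector{Vector{Float64}})
    acc = 0.0
    for array in arrays
        s = 0.0
        @simd for i in eachindex(array)
            @inbounds s += array[i]
        end
        acc += s
    end
    return acc
end
```

Because of the reassociation, the result may differ in the last bits from a strictly left-to-right sum.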

I think it's a big win, especially considering that the @yield-based syntax is much simpler than the complex iterate implementation:

```julia
@fgenerator(A::ChainedVector) do
    for array in A.arrays
        for x in array
            @yield x
        end
    end
end
```
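For comparison, an `iterate`-based implementation has to encode the same nested loop as an explicit state machine. A minimal sketch, using a hypothetical `Chunks` type rather than the actual ChainedVector source: the state carries both the chunk index and the position within the chunk, and every call re-checks whether to move on to the next chunk. That per-element branch is a plausible reason the iterate version in the benchmark was not vectorized.

```julia
# Hypothetical stand-in for ChainedVector.
struct Chunks{T}
    arrays::Vector{Vector{T}}
end

Base.eltype(::Type{Chunks{T}}) where {T} = T
Base.length(A::Chunks) = mapreduce(length, +, A.arrays; init = 0)

# State = (chunk index, index within that chunk).
Base.iterate(A::Chunks) = iterate(A, (1, 1))

function Base.iterate(A::Chunks, state)
    k, i = state
    # Skip exhausted or empty chunks; this check runs on every step.
    while k <= length(A.arrays)
        arr = A.arrays[k]
        if i <= length(arr)
            return (arr[i], (k, i + 1))
        end
        k, i = k + 1, 1
    end
    return nothing
end
```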

quinnj (Member) commented May 25, 2021

Woohoo! This is awesome! It's indeed quite painful to eke out as much performance as possible using the standard iteration protocols from Base. So in #42, I basically have to overload every custom array operation from Base to avoid iteration and sequential indexing.
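Specializing a Base reduction so it never goes through element-wise `iterate` might look like the following. This is a hypothetical sketch on a stand-in `Chunks` type, not the code in #42: `mapreduce` delegates to Base's own `mapreduce` on each contiguous chunk and combines the partial results. It assumes `init` is an identity element of `op`, since it is reused for every chunk.

```julia
# Hypothetical stand-in for ChainedVector.
struct Chunks{T}
    arrays::Vector{Vector{T}}
end

# Reduce each contiguous chunk with Base's optimized array mapreduce,
# then fold the per-chunk results together; no element-wise `iterate`.
function Base.mapreduce(f, op, A::Chunks; init)
    acc = init
    for array in A.arrays
        acc = op(acc, mapreduce(f, op, array; init = init))
    end
    return acc
end

Base.sum(A::Chunks{T}) where {T} = mapreduce(identity, +, A; init = zero(T))
```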

I am worried about the current dependency tree here; this package has become a "foundational" package of the data ecosystem, so it's hard to justify adding such heavy dependencies. I love the idea of a FoldsBase.jl, though, which would provide a lightweight "hook" into all the folds/transducers magic.
