In FileTrees you start with a `FileTree` where some nodes in the tree are `Thunk`s, and we'd like to compute them in parallel. The current approach is to collect them in an array and splat them into a single delayed call, but this is not good for performance, as the tree can have a lot of values.
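For contrast, here is a minimal sketch (not from the issue) of how the per-node work could be expressed without an arity-dependent splat, using Dagger's eager `Dagger.@spawn` API. The function name `testfun_spawn` is hypothetical; the point is that no call signature here depends on `n`, so there is no `n`-ary `vcat` to recompile for each new tree size:

```julia
using Dagger

# Hypothetical alternative: one eager task per node, fetched into a vector.
# The compiled signatures are independent of n, unlike delayed(vcat)(thunks...).
function testfun_spawn(n)
    thunks = [Dagger.@spawn i + 1 for i in 1:n]  # one task per element
    return fetch.(thunks)                        # collect results; fixed-arity calls only
end
```

Whether this matches FileTrees' scheduling semantics is a separate question; it only illustrates avoiding the splat.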
```julia
# Note: using 0.14.1 because 0.14.3 is a bit different
(Dagger) pkg> status
# Project Dagger v0.14.1
# Status `E:\Programs\julia\.julia\dev\Dagger\Project.toml`

using Dagger, Distributed
addprocs(; exeflags="--project");
@everywhere using Dagger, Distributed

function testfun(n)
    @time daggerres = collect(delayed(vcat)([delayed(+)(i, 1) for i in 1:n]...))
    @time distres = pmap(i -> i + 1, 1:n)
    return daggerres, distres
end

# First call: compilation overhead for both Dagger and pmap
testfun(4)
# 16.981486 seconds (18.41 M allocations: 997.734 MiB, 3.62% gc time, 60.71% compilation time)
# 2.639290 seconds (918.68 k allocations: 49.862 MiB, 0.77% gc time, 40.18% compilation time)
# ([2, 3, 4, 5], [2, 3, 4, 5])

# Now there is no compilation time
testfun(4)
# 0.034817 seconds (14.17 k allocations: 834.211 KiB)
# 0.855902 seconds (695 allocations: 34.938 KiB)
# ([2, 3, 4, 5], [2, 3, 4, 5])

# But if we change the size, the Dagger version needs to be recompiled :(
testfun(5)
# 0.583187 seconds (731.57 k allocations: 40.110 MiB, 5.19% gc time, 92.81% compilation time)
# 0.024653 seconds (415 allocations: 22.359 KiB)
# ([2, 3, 4, 5, 6], [2, 3, 4, 5, 6])

# And it scales kinda badly
testfun(1000);
# 14.335327 seconds (9.30 M allocations: 502.337 MiB, 1.45% gc time, 51.53% compilation time)
# 0.342966 seconds (62.32 k allocations: 2.515 MiB)

# The operation itself is really fast ofc; not sure why @time thought ~50% above was compilation
testfun(1000);
# 0.804885 seconds (2.19 M allocations: 103.171 MiB, 7.19% gc time, 12.20% compilation time)
# 0.237457 seconds (62.56 k allocations: 2.455 MiB)

testfun(2000);
# 76.037216 seconds (18.18 M allocations: 1019.668 MiB, 0.68% gc time, 97.74% compilation time)
# 0.464703 seconds (124.33 k allocations: 4.741 MiB)

testfun(2000);
# 1.996002 seconds (4.40 M allocations: 215.335 MiB, 13.17% gc time)
# 0.875414 seconds (147.93 k allocations: 5.685 MiB)

# So close, yet so far :)
testfun(2001);
# 78.877148 seconds (18.27 M allocations: 1.001 GiB, 0.64% gc time, 97.54% compilation time)
# 0.436966 seconds (160.21 k allocations: 6.884 MiB, 14.65% gc time)

testfun(2001);
# 1.649533 seconds (4.36 M allocations: 206.494 MiB, 3.84% gc time)
# 0.957735 seconds (147.96 k allocations: 5.424 MiB)
```
It was a much bigger difference on the other machine I tried (the one used in shashi/FileTrees.jl#63 (comment)), so perhaps this is not something worth bothering about:
```julia
testfun(4);
# 19.902917 seconds (19.41 M allocations: 1.029 GiB, 2.37% gc time, 54.35% compilation time)
# 2.539799 seconds (919.27 k allocations: 49.906 MiB, 1.09% gc time, 34.90% compilation time)

testfun(4)
# 3.941361 seconds (39.94 k allocations: 2.120 MiB, 1.43% compilation time)
# 0.805246 seconds (535 allocations: 29.375 KiB)

testfun(1000);
# 11.674108 seconds (9.43 M allocations: 532.586 MiB, 1.67% gc time, 50.98% compilation time)
# 0.743769 seconds (51.65 k allocations: 2.042 MiB)

testfun(1000);
# 1.773273 seconds (1.70 M allocations: 99.279 MiB, 3.36% gc time)
# 0.352158 seconds (61.14 k allocations: 2.329 MiB)

testfun(2000);
# 71.023676 seconds (17.19 M allocations: 1011.429 MiB, 0.89% gc time, 96.29% compilation time)
# 0.521013 seconds (113.83 k allocations: 4.296 MiB)

testfun(2000);
# 3.602684 seconds (3.44 M allocations: 206.856 MiB, 7.15% gc time)
# 0.498256 seconds (118.14 k allocations: 4.507 MiB)
```
Thanks to the glorious magic of SnoopCompile, I've found a ton of expensive inference triggers that are trivial to fix (mostly due to calling into broadcast). I'll try to crush as many as possible with this MWE, and then will post a PR. Thanks for the report and excellent reproducer!
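For anyone wanting to reproduce that analysis, a sketch of the SnoopCompile workflow for finding inference triggers might look like the following (the exact workload is an assumption; `@snoopi_deep` and `inference_triggers` are real SnoopCompile APIs):

```julia
# Profile inference while running the reproducer, then list what triggered it.
using SnoopCompileCore
tinf = @snoopi_deep testfun(4)   # capture inference timing for the workload

using SnoopCompile
itrigs = inference_triggers(tinf)                    # individual triggers
mtrigs = accumulate_by_source(Method, itrigs)        # grouped by triggering method
```

Inspecting `mtrigs` points at the methods (e.g. broadcast internals) whose calls keep re-entering inference.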
As asked for in #331.