
Conversation


@mhauru mhauru commented Jul 31, 2025

Work in progress, though most tests now pass. Current known issues are calling varargs functions, and more allocations than on v1.11.

@mhauru mhauru force-pushed the mhauru/julia1.12 branch from 2dfd832 to 0755577 on July 31, 2025 10:54

github-actions bot commented Aug 5, 2025

Libtask.jl documentation for PR #196 is available at:
https://TuringLang.github.io/Libtask.jl/previews/PR196/


mhauru commented Aug 7, 2025

The test suite now passes except for the test cases called "nested with args (dynamic)" and "nested with args (dynamic + used)", which segfault due to JuliaLang/julia#59222. Waiting on a response there to see whether that's a Julia bug or something we need to adapt to.


yebai commented Aug 7, 2025

JuliaLang/julia#59222

These tests look slightly weird. They might be violating opaque closure assumptions, e.g. dynamic dispatch of g(xs...) depends on inputs to the opaque closure itself. Do we need them here?


yebai commented Aug 7, 2025

This works; it doesn't answer why JuliaLang/julia#59222 fails, though.

julia> g(xs::Tuple{Vararg{T}}) where T = 1

julia> (t::Tuple{typeof(g)})(xs::Tuple{Vararg{T}}) where T = t[1](xs)

julia> ir = Base.code_ircode_by_type(Tuple{Tuple{typeof(g)}, Tuple{Vararg{Symbol}}})[1][1]

julia> ir.argtypes[1] = Tuple{typeof(g)}

julia> oc = Core.OpaqueClosure(ir, g; isva=true, do_compile=true)

julia> oc(:a, :b)  # works on 1.12 and 1.11

EDIT: I suspect that g(xs...) is type unstable, though I still don't understand why that leads to a segfault on Julia 1.12 despite working fine on Julia 1.11.


mhauru commented Aug 8, 2025

The case where this comes up is when, in a TapedTask, you have a dynamic call to a varargs function that might contain a produce statement. In that case Libtask.DynamicCallable calls it with the explicit arguments known at runtime, and hence the code_ircode_by_type call sees the concrete argument types rather than a Vararg type. I could maybe work around that with some hacking of ir.argtypes. However, I couldn't figure out a nice way to do it, whereas the segfault seems to me like either a bug in Julia or something that has changed about OCs that I need to understand.
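For what it's worth, the mismatch can be sketched like this (a hypothetical toy example, not Libtask's actual code; `g` is illustrative):

```julia
# A toy varargs function standing in for the user's function.
g(xs...) = 1

# At a dynamic call site only the runtime arguments are available, so the
# reflection query is made with their concrete types. The resulting IR's
# argtypes then list Symbol, Symbol explicitly...
ir_concrete = Base.code_ircode_by_type(Tuple{typeof(g), Symbol, Symbol})[1][1]

# ...whereas querying with g's own varargs signature yields IR whose last
# argument slot is the gathered tuple, matching an isva calling convention.
ir_vararg = Base.code_ircode_by_type(Tuple{typeof(g), Vararg{Symbol}})[1][1]

# Building an isva OpaqueClosure from the concrete-typed IR is the mismatch
# discussed above; patching ir_concrete.argtypes after the fact is nontrivial
# because the IR body references the individual argument slots.
```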

Your comment about the callable being in the captures is good though, because it made me realise that I don't need that to replicate the problem. This, too, segfaults:

module MWE

g(xs...) = 1

ir = Base.code_ircode_by_type(Tuple{typeof(g), Symbol, Symbol})[1][1]
ir.argtypes[1] = Tuple{}

oc = Core.OpaqueClosure(ir; isva=true, do_compile=true)
oc(:a, :b)  # segfault on 1.12

end


yebai commented Sep 18, 2025

Extra allocations might be due to: JuliaLang/julia#58780


mhauru commented Oct 9, 2025

With the workaround from JuliaLang/julia#59222 (comment) in place, tests now pass. However, something absolutely horrible has happened to performance and allocations. The benchmark suite is failing because it hasn't been updated to the new version of Turing (this happens on main too, so it's not a problem of this PR), but the benchmarks that do run before it crashes show the following (run on my laptop):

On v1.11.7:

benchmarking rosenbrock...
  Run Original Function:  186.625 μs (24 allocations: 6.25 MiB)
  Run TapedTask: #produce=1;   7.161 ms (299155 allocations: 10.83 MiB)
benchmarking ackley...
  Run Original Function:  1.161 ms (0 allocations: 0 bytes)
  Run TapedTask: #produce=100000;   29.137 ms (899584 allocations: 21.36 MiB)
benchmarking matrix_test...
  Run Original Function:  94.958 μs (18 allocations: 576.47 KiB)
  Run TapedTask: #produce=1;   453.541 μs (546 allocations: 594.52 KiB)
benchmarking neural_net...
  Run Original Function:  444.444 ns (8 allocations: 576 bytes)
  Run TapedTask: #produce=1;   2.940 μs (54 allocations: 2.17 KiB)

On v1.12.0:

benchmarking rosenbrock...
  Run Original Function:  182.625 μs (24 allocations: 6.25 MiB)
  Run TapedTask: #produce=1;   3.261 s (15156390 allocations: 274.15 MiB)
benchmarking ackley...
  Run Original Function:  1.151 ms (0 allocations: 0 bytes)
  Run TapedTask: #produce=100000;   914.884 ms (4394003 allocations: 80.78 MiB)
benchmarking matrix_test...
  Run Original Function:  95.458 μs (18 allocations: 576.47 KiB)
  Run TapedTask: #produce=1;   137.134 ms (266275 allocations: 6.93 MiB)
benchmarking neural_net...
  Run Original Function:  437.712 ns (8 allocations: 576 bytes)
  Run TapedTask: #produce=1;   40.125 μs (226 allocations: 7.52 KiB)

That's a 20–1000× slowdown. This makes Libtask essentially useless on v1.12. These benchmarks also pass, and show similar results, without the aforementioned workaround. Note that the fix for JuliaLang/julia#58780 is in 1.12.0, so that should no longer be the issue.

I'll see if I can boil this down to a simple example and use that to ask for advice from Julia devs.


yebai commented Oct 9, 2025

This might be the cause: chalk-lab/Mooncake.jl#714 (comment)


mhauru commented Oct 9, 2025

Here's a quick snippet to check this:

module MWE

using Libtask, BenchmarkTools

function f(x)
    i = x .+ 1
    j = x .- 1
    ret = (i - j .^ 2)
    produce(ret)
    return ret
end

function wrap(x)
    tt = TapedTask(nothing, f, x)
    consume(tt)
    return nothing
end

@btime wrap(randn(100))

end

On 1.11.7: 32.166 μs (394 allocations: 17.55 KiB); on 1.12.0: 1.576 ms (4207 allocations: 115.39 KiB).

Since @serenity4 is working on a fix that might help, I'll wait for that before putting more effort into figuring out what is going on here.
