
ENH: lazy_apply #86

Open · crusaderky wants to merge 15 commits into main from the apply branch
Conversation

@crusaderky (Contributor) commented Jan 9, 2025

First draft of a wrapper around jax.pure_callback.
Untested.
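
For reference, a minimal sketch of the underlying mechanism, assuming a single array input and a known output shape/dtype (apply_numpy_func and its parameters are illustrative names, not this PR's API):

import numpy as np
import jax
import jax.numpy as jnp

def apply_numpy_func(func, x, *, shape, dtype):
    # Illustrative sketch only: run a NumPy-only `func` on a JAX array,
    # even inside jax.jit, by delegating to jax.pure_callback.
    out_spec = jax.ShapeDtypeStruct(shape, dtype)
    # pure_callback materializes `x` as a NumPy array at run time, calls
    # `func` eagerly, and feeds the result back into the traced graph.
    return jax.pure_callback(lambda a: np.asarray(func(a)), out_spec, x)

@jax.jit
def f(x):
    return apply_numpy_func(np.sort, x, shape=x.shape, dtype=x.dtype)

f(jnp.asarray([3.0, 1.0, 2.0]))  # Array([1., 2., 3.])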

The enormous amount of extra complication needed to make it work with Dask makes me uncomfortable.
@lucascolley you've been working on Dask support for Scipy; what's your vision for it?

CC @rgommers

Comment on lines 121 to 122
The dask graph won't be computed. As a special limitation, `func` must return
exactly one output.
@crusaderky (Contributor Author) Jan 9, 2025

This limitation is straightforward to fix in Dask (at the cost of API duplication).
Until then, however, I suspect it will be a major roadblock for Dask support in scipy.

It can also be hacked outside of Dask, but I'm hesitant to do that for the sake of robustness, as it would rely on deliberately triggering key collisions between diverging graph branches.


Comment on lines 130 to 132
- ``output_indices[0]`` maps to the ``out_ind`` parameter
- ``adjust_chunks[0]`` maps to the ``adjust_chunks`` parameter
- ``new_axes[0]`` maps to the ``new_axes`` parameter
@crusaderky (Contributor Author)

These are all lists for forward compatibility with a yet-to-be-written da.blockwise variant that supports multiple outputs.

Comment on lines 57 to 58
If `func` returns a single (non-sequence) output, this must be a sequence
with a single element.
@crusaderky (Contributor Author)

I tried overloading but it was a big headache when validating inputs. I found this approach much simpler.

@lucascolley (Member) left a comment

I think I am missing some context here. Why are we wrapping arbitrary NumPy functions, instead of, e.g., considering the individual functions we need one by one?

@crusaderky (Contributor Author) commented Jan 9, 2025

I think I am missing some context here. Why are we wrapping arbitrary NumPy functions, instead of, e.g., considering the individual functions we need one by one?

There are many points in scipy that look like this:

x = np.asarray(x)
y = np.asarray(y)
z = some_cython_kernel(x, y)
z = xp.asarray(z)

None of them will ever work with arrays on GPU devices, of course, and they'll either need a pure array API alternative or a dispatch to cupyx.scipy etc.

None of them work with jitted, CPU-based JAX either, because jax.jit doesn't support abrupt materialization of the graph on np.asarray in the middle of jitting. Notably, this differs from torch.compile, which does exactly that.

With Dask, they technically work because there is no materialization guard, but most of the time you would prefer it if there were one.
In the best case, they will be exceptionally slow to run once you reach production-sized data: if there are multiple inputs, chances are that large parts of the graph will be computed multiple times, and they trigger massive transfers between the client and the workers, which are very likely to kill off the client and the scheduler too (at least in the default configuration direct_to_workers=False, where all client<->worker traffic transits through the scheduler).

However, it is possible to make these pieces of Cython code work thanks to this PR.
For JAX, this is straightforward - all you need to know is the output shape(s) and dtype(s).
For Dask, it is, well, the very opposite of straightforward. You can run arbitrary Cython kernels in Dask, on the workers, but with the very big caveat that any axis they reduce over can't be chunked, or you'll get incorrect results, as explained in the docstring. Additionally, in order to function, Dask needs to know how each axis of the input maps to each axis of the output and, if the size along an axis changes in any way, how that translates to chunk sizes.

There are two competing functions in Dask to achieve this, with different APIs (see the sketch below):

  • da.blockwise requires spelling out, via index annotations, how each axis of every input maps to each axis of the output.
  • map_blocks is a variant of blockwise with a simplified API, which can only work on broadcastable inputs.
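
As an illustration of the index bookkeeping involved (plain dask.array usage, not this PR's code; the kernel is made up):

import dask.array as da

x = da.random.random((8, 6), chunks=(4, 6))  # axis "j" is a single chunk

def kernel(block):
    # Stand-in for an arbitrary NumPy/Cython kernel reducing the last axis.
    return block.sum(axis=-1)

# blockwise must be told how input axes map to output axes: "ij" -> "i".
# The kernel consumes axis "j" whole, so "j" must reach it unchunked
# (here via a single chunk plus concatenate=True).
y = da.blockwise(kernel, "i", x, "ij", dtype=x.dtype, concatenate=True)
print(y.compute().shape)  # (8,)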

This problem has already been dealt with by xarray, with https://docs.xarray.dev/en/latest/generated/xarray.apply_ufunc.html. Note that xarray API is more user-friendly thanks to each dimension being labelled at all times, so apply_ufunc can do fancy tricks like auto-transposing the inputs and pushing the dimensions that func doesn't know about to the left.
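
For comparison, a minimal apply_ufunc sketch of the same kind of reduction (the dimension names and data here are made up):

import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.random.rand(4, 6), dims=("sample", "feature")
).chunk({"sample": 2})  # chunked along "sample" only

out = xr.apply_ufunc(
    lambda a: a.sum(axis=-1),       # plain NumPy callable
    arr,
    input_core_dims=[["feature"]],  # core dim: handed whole to the callable, must not be chunked
    dask="parallelized",
    output_dtypes=[arr.dtype],
)
print(out.compute().shape)  # (4,)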

What I tried to implement here is equivalent to xarray.apply_ufunc(..., dask="parallelized"), which under the hood calls da.blockwise.
dask="allowed" makes no sense without an additional wrapper around the numpy-like API of dask.

@lucascolley (Member)

Okay, thanks. When you say

any Array API compliant arrays

in the docstring, that isn't strictly true, right? It relies on np.asarray working on xp arrays and xp.asarray working on np arrays, if I have read the code correctly.

I understand the utility of this PR now.

what's your vision for it?

I hadn't envisioned tackling this yet. In my mind, getting Cython kernels working with Dask / JAX jit has been in the same "for later" basket as handling device transfers or writing new implementations to delegate to. But if the implementation works, it makes sense to tackle it.

@crusaderky (Contributor Author)

Okay, thanks. When you say

any Array API compliant arrays

in the docstring, that isn't strictly true, right? It relies on np.asarray working on xp arrays and xp.asarray working on np arrays, if I have read the code correctly.

Correct. Nominally it will fail when densifying sparse arrays and when moving data from GPU to CPU. An end user can, however, force their way through if they want to, by deliberately suppressing the transfer/densification guards for the time necessary to run the scipy function. Either that, or perform an explicit device-to-CPU transfer / to_dense() call ahead of the function.

There is, however, nothing a JAX or Dask user can do today, short of completely exiting the graph-generation phase.

@rgommers (Member) left a comment

Thanks @crusaderky!

The enormous amount of extra complication needed to make it work with Dask makes me uncomfortable.

Yes indeed, it does. That looks like way too much. I don't think we would want to use all those extra keywords and Dask-specific functions in SciPy. If you dropped those, would it make Dask completely non-working, or is there a subset of functionality that would still work? I'd say that JAX shows that it can be straightforward, and a similar callback mechanism could be used for PyTorch/MLX/ndonnx as well - if it were to exist in those libraries.

Sparse
By default, sparse prevents implicit densification through ``np.asarray``.
`This safety mechanism can be disabled
<https://sparse.pydata.org/en/stable/operations.html#package-configuration>`_.
@rgommers (Member)

Fine to leave as is for now I'd say. Once sparse adds a better API for this (an env var doesn't work), it seems reasonable to add a force=False option to this function. There are various reasons why one may want to force an expensive conversion; that kind of thing should always be opt-in on a case-by-case basis.

@crusaderky (Contributor Author) Jan 14, 2025

Shouldn't this be tackled by a more general design pattern?

with disable_guards():
    y = scipy.somefunc(x)

where disable_guards is backend-specific.
Applies to torch/cupy/jax arrays on GPU, sparse arrays, etc.

@rgommers (Member)

Do you have a SciPy branch with this function being used @crusaderky? I'd be interested in playing with it.

@crusaderky (Contributor Author)

If you dropped those, would it make Dask completely non-working, or is there a subset of functionality that would still work?

I could make it work by rechunking all the inputs to a single chunk. In other words, the whole calculation would need to fit in memory at once on a single worker.
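
A minimal sketch of what that fallback could look like for a single array input (apply_numpy_func and its keyword names are illustrative, not this PR's API):

import numpy as np
import dask.array as da

def apply_numpy_func(func, x, *, shape, dtype):
    # Illustrative sketch only: collapse the input to a single chunk so that
    # `func` sees the whole array at once, on a single worker.
    x = x.rechunk(x.shape)
    # One input block -> one output block spanning the whole output shape.
    return x.map_blocks(func, chunks=shape, dtype=dtype)

x = da.arange(10, chunks=3)
y = apply_numpy_func(np.sort, x, shape=(10,), dtype=x.dtype)
y.compute()  # array([0, 1, ..., 9])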

@crusaderky (Contributor Author)

Do you have a SciPy branch with this function being used @crusaderky? I'd be interested in playing with it.

Not yet

@crusaderky (Contributor Author)

I could make it work by rechunking all the inputs to a single chunk. In other words, the whole calculation would need to fit in memory at once on a single worker.

@rgommers I rewrote it to do exactly this and now it's a lot cleaner. I'll keep my eyes open for patterns in scipy that we can leverage to improve Dask support (e.g. frequent elementwise functions that could be trivially served by map_blocks).

@lucascolley marked this pull request as draft on January 13, 2025
@lucascolley added the enhancement and new function labels on Jan 13, 2025
@lucascolley (Member)

@allcontributors, please add @crusaderky for bug

let me just try this once from this PR...

@rgommers (Member) left a comment

I haven't tried to test this, but I did go through it in a bit more detail now - overall it looks good, a few comments. Looking forward to trying it out!

@crusaderky (Contributor Author)

I've reworked the design a bit.

  • Renamed the function to apply_lazy and added an as_numpy=False optional parameter. This allows
    • array API compliant eager functions to be wrapped by Dask and applied to Dask arrays with a non-NumPy _meta, and
    • eager-only JAX operations (e.g. with an output size that's not predictable) to be executed on lazy arrays, which is particularly beneficial on GPU.
  • Added support for unknown output size in Dask and eager JAX. This enables Dask support for functions such as scipy.cluster.leaders.

FYI, when I move _lazywhere from scipy, I intend to call it apply_where, which I think makes a good pair with apply_lazy, as it conveys that they're both about applying a callback.

if any(s is None for shape in shapes for s in shape):
    # Unknown output shape. Won't work with jax.jit, but it
    # can work with eager jax.
    # Raises jax.errors.TracerArrayConversionError if we're inside jax.jit.
@crusaderky (Contributor Author) Jan 15, 2025

Offline conversation:

@crusaderky:

how do you see scipy functions with unknown output size working with jax.jit? e.g. scipy.cluster.leaders? Should we do as in jnp.unique_all and add a size=None optional parameter, which becomes mandatory when the jit is on?

@rgommers:

I'm not sure that we should add support for those functions. My assumption is that it's only a few functions, and that those are inherently pretty clunky with JAX. I don't really want to think about extending the signatures (yet at least), because the current jax.jit support is experimental and behind a flag, and adding keywords is public and not reversible.
Perhaps making a note on the tracking issue about this being an option, but not done because of the reason above (could be done in the future, if JAX usage takes off)?

If in the future we want to support these functions, we'll have to modify this point to catch jax.errors.TracerArrayConversionError and reraise a backend-agnostic exception, so that scipy.cluster.leaders and similar functions can then catch it and raise an informative error message about size= being mandatory.
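
A rough sketch of that reraise pattern, with a hypothetical helper name and error message (not part of this PR):

import jax
import array_api_extra as xpx

def checked_lazy_apply(func, x, *, shape, dtype):
    # Hypothetical wrapper: translate JAX's tracer error into a
    # backend-agnostic exception about a data-dependent output shape.
    try:
        return xpx.lazy_apply(func, x, shape=shape, dtype=dtype)
    except jax.errors.TracerArrayConversionError as exc:
        raise ValueError(
            "output shape is data-dependent; pass size= explicitly "
            "when calling this function inside jax.jit"
        ) from exc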

@crusaderky (Contributor Author) Jan 22, 2025

scipy.cluster.leaders is a function which, in the near future, will work in eager JAX but can't work in jax.jit short of a public API change, because its output arrays' shape is xp.unique_values(input).shape.

@pearu @vfdev-5 a while ago you asked offline how we can run a function such as this inside jax.jit.
It will be possible for an end user to do so, on the condition that they consume its output and reduce it back to a known shape, for example:

import jax
import array_api_extra as xpx
from array_api_compat import array_namespace
from scipy.cluster.hierarchy import leaders

def _eager(x):
    a, b = leaders(x)  # shapes = (None, ), (None, )
    xp = array_namespace(a, b)
    # silly example; probably won't make sense functionally
    return xp.max(a), xp.max(b)

# This is just an example that makes little sense;
# in practice @jax.jit will be much higher in the call stack
@jax.jit
def f(x):
    return xpx.lazy_apply(
        _eager, x, shape=((), ()), dtype=(x.dtype, x.dtype)
    )

Comment on lines 169 to 171
jax.errors.TracerArrayConversionError
When `xp=jax.numpy`, `shape` is unknown (it contains None on one or more axes)
and this function was called inside `jax.jit`.
@crusaderky (Contributor Author)

See the comments above on unknown output shapes.

@lucascolley changed the title from WIP apply_numpy_func to WIP apply_lazy on Jan 15, 2025
@crusaderky (Contributor Author) commented Jan 16, 2025

I think I want to have some evidence that the whole thing works in practice before I finalize this PR.
See scipy/scipy#22342

@crusaderky force-pushed the apply branch 4 times, most recently from 794a35a to 2d007e0 on January 28, 2025 16:09
report.exclude_also = [
    '\.\.\.',
    'if TYPE_CHECKING:',
]
@crusaderky (Contributor Author)

This was causing coverage to skip the whole _lazy.py for some reason

Comment on lines +277 to +278
# jax.pure_callback calls jax.jit under the hood, but without the chance of
# passing static_argnames / static_argnums.
@crusaderky (Contributor Author)

This was a very unpleasant discovery.

Comment on lines +283 to +285
    lazy_kwargs[k] = v
else:
    eager_kwargs[k] = v
@crusaderky (Contributor Author) Feb 6, 2025

The original design simply stated that there cannot be arrays in the kwargs.
However, I later found out that there is severely obfuscated code in scipy that breaks this assumption.
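
For context, a minimal sketch of the idea behind the snippet above (the helper name and the use of array_api_compat.is_array_api_obj are my own assumptions, not necessarily what the PR does):

from array_api_compat import is_array_api_obj

def split_kwargs(kwargs):
    # Array-valued kwargs must flow through the lazy machinery alongside the
    # positional arrays; everything else is passed through to func eagerly.
    lazy_kwargs, eager_kwargs = {}, {}
    for k, v in kwargs.items():
        if is_array_api_obj(v):
            lazy_kwargs[k] = v
        else:
            eager_kwargs[k] = v
    return lazy_kwargs, eager_kwargs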

Comment on lines +246 to +247
metas = [arg._meta for arg in args if hasattr(arg, "_meta")] # pylint: disable=protected-access
meta_xp = array_namespace(*metas)
@crusaderky (Contributor Author)

FYI, not being able to infer the meta for argument-less generator functions is what made me ditch support for them. Alternatively, I could have added a meta_xp parameter, but I really disliked it as it would only be meaningful for Dask.

# Block until the graph materializes and reraise exceptions. This allows
# `pytest.raises` and `pytest.warns` to work as expected. Note that this would
# not work on scheduler='distributed', as it would not block.
return dask.persist(out, scheduler="threads")[0] # type: ignore[no-any-return,attr-defined,no-untyped-call,func-returns-value,index] # pyright: ignore[reportPrivateImportUsage]
@crusaderky (Contributor Author)

This is the important change in this module; the rest is cosmetic.

@crusaderky changed the title from WIP lazy_apply to ENH: lazy_apply on Feb 6, 2025
@crusaderky marked this pull request as ready for review on February 6, 2025
@crusaderky (Contributor Author)

This is ready for final review and merge.
