Commit ff30a26

ikrommyd, pfackeldey, pre-commit-ci[bot], ianna, and lgray authored Mar 19, 2025
feat: implement virtual arrays (#3364)
* first try to implement virtual arrays at (least close to) the buffer level
* lint
* prepare virtual arrays for Iason (now really at the buffer level)
* fix .T for virtual arrays
* minimal changes to get an example of a listoffsetarray with virtual buffers working in a notebook
* what if we start like this?
* add repr
* restore _data <-> data to original
* an attempt at implementing some methods
* style: pre-commit fixes
* tiny changes
* add ones_like and full_like to play around a bit
* no need to implement ufuncs separately. They are covered by apply_ufunc
* add materialization in array_module.py
* same repr for placeholders
* adhere to ArrayLike interface
* add TODO comment
* try kernels
* never create new virtual arrays for now, let them materialize
* ruff fix
* bring back creation of virtual arrays with a local cache
* forgot this
* maybe rapidjson submodule fix
* adjust _get_item_at_placeholder for virtual arrays as well
* new _get_item_at_virtual function and different VirtualValue vs PlaceholderValue
* remove some comments and docstrings that we decided to use in a different manner
* let iterators of virtual buffers materialize
* inherit from NDArrayOperatorsMixin as well
* first attempt at cupy nplike for virtual arrays
* we should not instantiate cupy for everyone :)
* I wanna try like this first
* add preliminary implementations of ak.materialize and ak.dematerialize
* duplicate import
* whoops
* delete dematerialize
* make views return VirtualArray's only
* numpy metadata are unused
* we need .ctypes
* better use classes instead of strings
* materialize should keep the parameters in the new NumpyArray
* add ak.from_virtual
* remove extra length argument from from_virtual
* use ak.materialize inside to_something calls
* docstring for materialize
* add comment to not forget
* some changs to contiguousness and backends
* forgot a breakpoint
* Do not use recursively_apply for ak.materialize. Write our own recursive function
* add is_materialized property to layouts
* is_materialized it should work with records to
* some touches
* ruff fix
* update ak.from_virtual
* style: pre-commit fixes
* allow unknown shape slices in virtual arrays
* remove ak.from_virtual; make ak.from_buffers compatible with VirtualArrays
* style: pre-commit fixes
* remove ak.from_virtual
* drop support for unknown_length
* style: pre-commit fixes
* make reprs of placeholder and virtual consistent
* better make those typeerros
* make ak.from_buffers materialize offsets and then free the memory again immediately
* add tolist() to typetracer array
* remove form_key from virtual arrays
* address some comments
* add is_any_materialized and is_all_materialized
* fix integer check
* was this a bug all along? The .data are Virtual/Placeholder, not the masks/index
* pre-commit
* similar errors for both nplikes
* if it's virtual, the virtual.nplike.ndarray is going to go in there so it's fine
* pre-commit fixes
* remove from_virtual from docs
* recursive from_buffers for virtual arrays
* raise error if non-fully materialized arrays try to get into an arraybuilder kernel
* add class VirtualArray tests
* specify dtypes in the test
* maybe this fixes windows tests?
* array module tests
* we should be able to convert virtual to typetracer
* to_rdataframe should materialize
* allow changing dtype of virtual arrays in asarray
* style: pre-commit fixes
* mypy fix
* .max() will not work for virtual arrays
* numpyarray tests
* listoffsetarray tests
* listarray tests
* pre-commit fixes
* recordarray tests
* add copying
* fix pylint?
* reset _is_getitem_at_placeholder
* Do not use XX for placeholders
* remove tolist from array_like
* address comments regarding virtual array un-materialization during ak.from_buffers
* materialize_if_virtual is not needed in *_like(..) operations
* add dlpack stuff
* mypy
* skip dlpack if numpy is old
* reset ak_to_something.py functions to main
* revert array_module typing changes as close to original typing as possible
* fix import
* array_str typo
* maybe make ak_to_something operations work?
* fix the repr test
* what if raw materialized?
* Revert "what if raw materialized?" This reverts commit 6168ff9.
* Reapply "what if raw materialized?" This reverts commit c7eb280.
* what if we do this?
* make frombuffer errors consistent with main
* rename unmaterialize to dematerialize
* add internal docstrings and comments
* document from_buffer changes
* fix english
* fix test
* style: pre-commit fixes
* change if to elif
* address some comments
* directly test _array for internal methods
* add supports_virtual_arrays to nplikes
* make native_to_byteorder, do not let raw materialize, attempt to fix to_cudf, to_arrow, to_backendarray
* a few more places where we need to materialize raw's output
* some comments from Angus
* calculate nbytes as size * itemsize

---------

Co-authored-by: pfackeldey <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ianna Osborne <[email protected]>
Co-authored-by: Lindsey Gray <[email protected]>
1 parent 703533b commit ff30a26

36 files changed: +7359 -53 lines changed
 

‎docs/reference/toctree.txt

+5
@@ -73,6 +73,11 @@
     generated/ak.to_packed
     generated/ak.copy

+.. toctree::
+    :caption: Materializing virtual arrays
+
+    generated/ak.materialize
+
 .. toctree::
     :caption: Validity checking


‎src/awkward/_backends/dispatch.py

+22
@@ -4,9 +4,11 @@

 from collections.abc import Collection

+import awkward as ak
 from awkward._backends.backend import Backend
 from awkward._nplikes.numpy import Numpy
 from awkward._nplikes.numpy_like import NumpyLike, NumpyMetadata
+from awkward._nplikes.virtual import VirtualArray
 from awkward._typing import Callable, TypeAlias, TypeVar, cast
 from awkward._util import UNSET, Sentinel

@@ -70,6 +72,7 @@ def common_backend(backends: Collection[Backend]) -> Backend:


 def backend_of_obj(obj, default: D | Sentinel = UNSET) -> Backend | D:
+    # the backend of virtual arrays will be determined via the `find_virtual_backend` lookup
     cls = type(obj)
     try:
         lookup = _type_to_backend_lookup[cls]
@@ -129,3 +132,22 @@ def regularize_backend(backend: str | Backend) -> Backend:
         return _name_to_backend_cls[backend].instance()
     else:
         raise ValueError(f"No such backend {backend!r} exists.")
+
+
+@register_backend_lookup_factory
+def find_virtual_backend(obj: type):
+    """
+    Implements a lookup for finding the backends of virtual arrays.
+    This is necessary to avoid calling `isinstance` inside `backend_of_obj` which may cause slowdown.
+    """
+    if issubclass(obj, VirtualArray):
+
+        def finder(obj: VirtualArray):
+            if isinstance(obj.nplike, ak._nplikes.numpy.Numpy):
+                return _name_to_backend_cls["cpu"].instance()
+            elif isinstance(obj.nplike, ak._nplikes.cupy.Cupy):
+                return _name_to_backend_cls["cuda"].instance()
+            else:
+                raise TypeError("A virtual array can only have numpy or cupy backends")
+
+        return finder

‎src/awkward/_kernels.py

+5
@@ -12,6 +12,7 @@
 from awkward._nplikes.numpy import Numpy
 from awkward._nplikes.numpy_like import NumpyMetadata
 from awkward._nplikes.typetracer import try_touch_data
+from awkward._nplikes.virtual import materialize_if_virtual
 from awkward._typing import Protocol, TypeAlias

 KernelKeyType: TypeAlias = tuple  # Tuple[str, Unpack[Tuple[metadata.dtype, ...]]]
@@ -88,6 +89,8 @@ def _cast(cls, x, t):
     def __call__(self, *args) -> None:
         assert len(args) == len(self._impl.argtypes)

+        args = materialize_if_virtual(*args)
+
         return self._impl(
             *(self._cast(x, t) for x, t in zip(args, self._impl.argtypes))
         )
@@ -138,6 +141,8 @@ def _cast(self, x, type_):
     def __call__(self, *args) -> None:
         import awkward._connect.cuda as ak_cuda

+        args = materialize_if_virtual(*args)
+
         cupy = ak_cuda.import_cupy("Awkward Arrays with CUDA")
         maxlength = self.max_length(args)
         grid, blocks = self.calc_grid(maxlength), self.calc_blocks(maxlength)
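
Note: both kernel `__call__` methods now materialize their arguments before handing them to the compiled implementation. The sketch below illustrates that idea with plain Python objects. `LazyBuffer`, `materialize_if_lazy`, `kernel_sum`, and `call_kernel` are hypothetical names, not the helpers added by this commit, and the real `materialize_if_virtual` may differ in signature and return type.

```python
class LazyBuffer:
    """Hypothetical lazy buffer: data is produced on first use, then cached."""

    def __init__(self, generator):
        self._generator = generator
        self._data = None
        self.calls = 0  # how many times the generator actually ran

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    def materialize(self):
        if self._data is None:
            self.calls += 1
            self._data = self._generator()
        return self._data


def materialize_if_lazy(*args):
    """Replace every LazyBuffer by its data; pass other arguments through."""
    return tuple(a.materialize() if isinstance(a, LazyBuffer) else a for a in args)


def kernel_sum(out, values, length):
    """Stand-in for a compiled kernel that only understands concrete buffers."""
    out[0] = sum(values[:length])


def call_kernel(*args):
    # Same shape as the change above: materialize first, then dispatch.
    args = materialize_if_lazy(*args)
    return kernel_sum(*args)


out = [0]
lazy = LazyBuffer(lambda: [1, 2, 3, 4])
call_kernel(out, lazy, 3)  # triggers the generator
call_kernel(out, lazy, 3)  # reuses the cached data
```

Doing this at the kernel boundary means individual call sites do not need to know whether a buffer is virtual; the cost is that any kernel call forces the buffers it receives to be loaded.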

‎src/awkward/_lookup.py

+5
@@ -30,6 +30,11 @@ def arrayptr(x):


 def tolookup(layout, positions):
+    if not layout.is_all_materialized:
+        raise TypeError(
+            "Only fully materialized arrays can be passed into lookups. Use ak.materialize to materialize the array before passing it to the kernel."
+        )
+
     if isinstance(layout, ak.contents.EmptyArray):
         return tolookup(layout.to_NumpyArray(np.dtype(np.float64)), positions)

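
Note: the guard in `tolookup` relies on a layout-level `is_all_materialized` property. One plausible way such a property could be computed is a recursive check over a layout tree, sketched below. `Buffer`, `Node`, and `to_lookup` are hypothetical stand-ins; this commit's actual implementation on Awkward layouts is not shown in this excerpt.

```python
class Buffer:
    """Hypothetical buffer that only records whether its data is loaded."""

    def __init__(self, is_materialized: bool):
        self.is_materialized = is_materialized


class Node:
    """Hypothetical layout node: its own buffers plus nested child nodes."""

    def __init__(self, buffers, children=()):
        self.buffers = list(buffers)
        self.children = list(children)

    @property
    def is_all_materialized(self) -> bool:
        # True only if every buffer here and in all descendants is loaded.
        return all(b.is_materialized for b in self.buffers) and all(
            c.is_all_materialized for c in self.children
        )


def to_lookup(layout: Node) -> str:
    # Same guard as in the diff: refuse partially virtual layouts up front.
    if not layout.is_all_materialized:
        raise TypeError("Only fully materialized arrays can be passed into lookups.")
    return "lookup"


full = Node([Buffer(True)], [Node([Buffer(True)])])
partial = Node([Buffer(True)], [Node([Buffer(False)])])
```

Here `to_lookup(full)` succeeds, while `to_lookup(partial)` raises `TypeError` because one nested buffer is still virtual.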
3540

‎src/awkward/_nplikes/__init__.py

+19
@@ -6,6 +6,7 @@
 import awkward._nplikes.jax
 import awkward._nplikes.numpy
 import awkward._nplikes.typetracer
+import awkward._nplikes.virtual
 from awkward._nplikes.dispatch import nplike_of_obj
 from awkward._typing import TYPE_CHECKING

@@ -27,6 +28,24 @@ def to_nplike(
     if from_nplike is nplike:
         return array

+    # We can always convert virtual arrays to typetracers
+    # but can only convert virtual arrays to other backends with known data if they are intentionally materialized
+    # Only numpy and cupy nplikes are allowed for virtual arrays
+    if isinstance(array, awkward._nplikes.virtual.VirtualArray):
+        if not array.is_materialized and nplike.known_data:
+            raise TypeError(
+                "Cannot convert a VirtualArray to a different nplike with known data without materializing it first. Use ak.materialize on the array to do so."
+            )
+        else:
+            if nplike.supports_virtual_arrays:
+                array = array.materialize()
+            elif not nplike.known_data:
+                pass
+            else:
+                raise TypeError(
+                    f"Can only convert a VirtualArray to numpy, cupy or typetracer nplikes. Received {type(nplike)}"
+                )
+
     if nplike.known_data and not from_nplike.known_data:
         raise TypeError(
             "Converting from an nplike without known data to an nplike with known data is not supported"
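
Note: the branch structure added to `to_nplike` amounts to a small decision table. The sketch below restates it as a pure function so the outcomes are easy to enumerate. `TargetNplike` and `plan_virtual_conversion` are hypothetical names, and the flag values assigned to `NUMPY`, `TYPETRACER`, and `JAX` are assumptions inferred from the comments and error messages in the diff, not read from the nplike classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetNplike:
    """Hypothetical summary of the two nplike flags the diff consults."""

    name: str
    known_data: bool
    supports_virtual_arrays: bool


def plan_virtual_conversion(is_materialized: bool, target: TargetNplike) -> str:
    """Mirror the branches of the diff for a virtual array going to `target`."""
    if not is_materialized and target.known_data:
        # Unmaterialized data must not be loaded implicitly.
        raise TypeError("materialize the virtual array first")
    if target.supports_virtual_arrays:
        return "materialize"  # already materialized, so this returns the loaded data
    if not target.known_data:
        return "pass-through"  # e.g. a typetracer only needs shape and dtype
    raise TypeError(f"cannot convert a virtual array to {target.name}")


# Assumed flag values, for illustration only.
NUMPY = TargetNplike("numpy", known_data=True, supports_virtual_arrays=True)
TYPETRACER = TargetNplike("typetracer", known_data=False, supports_virtual_arrays=False)
JAX = TargetNplike("jax", known_data=True, supports_virtual_arrays=False)

materialized_to_numpy = plan_virtual_conversion(True, NUMPY)
virtual_to_typetracer = plan_virtual_conversion(False, TYPETRACER)
```

Under these assumptions, an unmaterialized virtual array can only go to a typetracer, and a materialized one can go to an nplike that supports virtual arrays or to a typetracer; every other combination raises `TypeError`.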
