@obenjiro obenjiro commented Aug 19, 2025

Summary

  • implement stride-based reshape for faster nested array construction
  • add tensor tests for 4D tolist

Fixes #1191

The reshape-based approach

The old implementation built the nested structure through a series of reshapes:

  • For each dimension, the flat array was chunked into smaller arrays.
  • Each chunk was then reshaped again for the next dimension.
  • This meant multiple passes over the data: one to create the first level, another for the second, and so on.
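To make the cost concrete, here is a minimal sketch of what a chunk-then-reshape approach can look like. The function name `reshapeByChunking` is hypothetical and this is not the library's actual code; it only illustrates the pattern described above, in which every level copies its slice of the data before handing it to the next level.

```javascript
// Hypothetical illustration of a chunk-per-dimension reshape.
// Assumes `flat` is a typed array whose length equals the product of `dims`.
function reshapeByChunking(flat, dims) {
  if (dims.length === 1) {
    // Innermost level: copy the chunk into a plain array.
    return Array.from(flat);
  }
  const [first, ...rest] = dims;
  const chunkSize = flat.length / first;
  const out = [];
  for (let i = 0; i < first; i++) {
    // slice() on a typed array allocates and copies a new typed array,
    // so the same elements are copied once per dimension.
    const chunk = flat.slice(i * chunkSize, (i + 1) * chunkSize);
    out.push(reshapeByChunking(chunk, rest));
  }
  return out;
}

const nested = reshapeByChunking(new Float32Array([0, 1, 2, 3, 4, 5]), [2, 3]);
console.log(JSON.stringify(nested)); // [[0,1,2],[3,4,5]]
```

In this sketch each element is copied once per dimension, which is where the repeated passes and intermediate allocations come from.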

Each reshape step:

  • Allocates new arrays for every chunk.
  • Invokes slice or map operations, producing copies or calling callbacks for every element.
  • Re-reads the same data multiple times. For an n‑dimensional tensor, the data might be iterated n separate times.
  • Incurs significant JavaScript function‑call overhead, because reduce, map, and friends invoke callbacks for each element.

This costs substantial CPU time (repeated passes and per-element callback dispatch) and causes memory churn (intermediate arrays are allocated and then discarded). The total work grows roughly with the number of elements multiplied by the number of dimensions, rather than with the number of elements alone.

Recursive construction with precomputed strides

The optimized version precomputes the strides (for each dimension, the number of flat-buffer elements to skip when the index along that dimension increases by one) and builds the nested array recursively:

  • Starting at dimension 0, allocate the output array for that level.
  • For each index at this level, compute the offset into the flat buffer as the parent offset plus the index times this dimension's stride.
  • Recurse into the next dimension until the innermost dimension is reached, where elements are read directly from the typed array.
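The steps above can be sketched as follows. The names `computeStrides` and `tolistWithStrides` are hypothetical, and the code assumes a row-major (C-order) flat buffer whose length equals the product of the dimensions; it is an illustration of the technique, not the code from this PR.

```javascript
// Hypothetical helper: row-major strides for a shape, e.g. [2, 3, 4] -> [12, 4, 1].
function computeStrides(dims) {
  const strides = new Array(dims.length);
  let stride = 1;
  for (let d = dims.length - 1; d >= 0; d--) {
    strides[d] = stride;
    stride *= dims[d];
  }
  return strides;
}

// Hypothetical sketch: build the nested array in a single pass over `data`.
function tolistWithStrides(data, dims) {
  if (dims.length === 0) return data[0]; // scalar tensor
  const strides = computeStrides(dims);
  const last = dims.length - 1;

  function build(dim, offset) {
    const size = dims[dim];
    const out = new Array(size); // one allocation per output array
    if (dim === last) {
      // Innermost dimension: read elements straight from the typed array.
      for (let i = 0; i < size; i++) out[i] = data[offset + i];
    } else {
      const stride = strides[dim];
      for (let i = 0; i < size; i++) {
        out[i] = build(dim + 1, offset + i * stride);
      }
    }
    return out;
  }

  return build(0, 0);
}

console.log(computeStrides([2, 3, 4])); // [ 12, 4, 1 ]
const list = tolistWithStrides(Float32Array.from({ length: 24 }, (_, i) => i), [2, 3, 4]);
console.log(list[1][2][3]); // 23
```

Every element is read exactly once, and the only arrays allocated are the ones that appear in the final result, so no intermediate copies are created.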

| Shape | Current tolist (ms) | Optimized tolist (ms) | Speedup |
| --- | --- | --- | --- |
| [16, 768] | 1.003 | 0.319 | 3.15× |
| [32, 768] | 1.948 | 0.187 | 10.42× |
| [8, 16, 64] | 0.645 | 0.049 | 13.14× |
| [8, 32, 64] | 1.792 | 0.259 | 6.92× |
| [4, 8, 16, 32] | 1.797 | 0.096 | 18.73× |
| [2, 4, 8, 16, 32] | 3.537 | 0.203 | 17.43× |
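
The PR does not show how these timings were collected. For anyone who wants to run a comparable measurement, a minimal timing harness might look like the sketch below. The helper name `meanTimeMs`, the iteration counts, and the use of `Array.from` as a stand-in workload are all assumptions; substitute the function under test.

```javascript
// Hypothetical helper: mean wall-clock time of fn() in milliseconds.
// Relies on the global `performance` object (browsers and recent Node.js).
function meanTimeMs(fn, iterations = 100, warmup = 10) {
  // Warm-up runs give the JIT a chance to optimize before timing starts.
  for (let i = 0; i < warmup; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

const shape = [4, 8, 16, 32];
const data = new Float32Array(shape.reduce((a, b) => a * b, 1));

// Placeholder workload; replace with the tolist implementation being measured.
const ms = meanTimeMs(() => Array.from(data));
console.log(`[${shape.join(", ")}]: ${ms.toFixed(3)} ms per call`);
```

Absolute numbers depend heavily on the machine, runtime version, and warm-up, so the ratio between two implementations measured in the same run is more meaningful than any single timing.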

@obenjiro
Author

@xenova sorry for bothering you. Would you kindly look into this PR?

Development

Successfully merging this pull request may close these issues.

Speeding up Tensor.tolist()