
Use tensor accessors in C++ to avoid going through the dispatcher #281

Open
NicolasHug opened this issue Oct 22, 2024 · 0 comments
https://pytorch.org/cppdocs/notes/tensor_basics.html#efficient-access-to-tensor-elements

When using Tensor-wide operations, the relative cost of dynamic dispatch is very small. However, there are cases, especially in your own kernels, where efficient element-wise access is needed, and the cost of dynamic dispatch inside the element-wise loop is very high. ATen provides accessors that are created with a single dynamic check that a Tensor is the type and number of dimensions. Accessors then expose an API for accessing the Tensor elements efficiently.

We should use these accessors whenever possible, typically when we do things like output.ptsSeconds[f] = singleOut.ptsSeconds.
I don't think it will lead to a dramatic speedup, because we're usually not decoding enough frames for the dispatcher cost to be visible (I suspect).

In previous work (torchvision image decoders, the torch.nn.interpolate C++ implementation), I did observe that using accessors instead of plain indexing led to dramatic speedups (but that involved accessing every single pixel value, and pixel counts are orders of magnitude larger than frame counts).

Anyway, we should probably still do it, since it's good practice.
