
Use tensor accessors in C++ to avoid going through the dispatcher #281

Open
NicolasHug opened this issue Oct 22, 2024 · 0 comments
https://pytorch.org/cppdocs/notes/tensor_basics.html#efficient-access-to-tensor-elements

When using Tensor-wide operations, the relative cost of dynamic dispatch is very small. However, there are cases, especially in your own kernels, where efficient element-wise access is needed, and the cost of dynamic dispatch inside the element-wise loop is very high. ATen provides accessors that are created with a single dynamic check that a Tensor is the type and number of dimensions. Accessors then expose an API for accessing the Tensor elements efficiently.

We should use these accessors whenever possible, typically when we do things like output.ptsSeconds[f] = singleOut.ptsSeconds.
I don't think it will lead to a dramatic speedup, because we're usually not decoding enough frames for the dispatcher cost to be visible (I suspect).

In previous work (torchvision image decoders, the torch.nn.interpolate C++ implementation), I did observe that using accessors instead of plain indexing led to dramatic speedups (but that involved accessing every single pixel value, and pixel counts are orders of magnitude larger than frame counts).

Anyway, we should probably still do it, since it's good practice.
