Fix int32 overflow issues for large tensor support in paddle/phi/kernels/impl #76107
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open source project!
Pull Request Overview
This PR updates tensor dimension and indexing calculations from `int` to `int64_t` to support large tensors whose element counts exceed INT32_MAX. The changes include adding runtime validation checks where legacy implementations still use `int` internally, and fixing potential integer overflow issues in CUDA kernel index calculations.
- Type promotions from `int` to `int64_t` for tensor dimensions, batch sizes, and element counts
- Addition of runtime checks with TODO comments where underlying implementations still use `int`
- CUDA kernel index calculation fixes to prevent overflow, with explicit casts to `int64_t`
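As a rough illustration of this pattern (a hypothetical helper, not code from the PR): sizes and loop variables are kept in `int64_t` so products of large dimensions cannot wrap, and a runtime check protects call sites that still go through `int`-indexed helpers.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

void ProcessTensor(const std::vector<int64_t>& dims) {
  // Promote: accumulate the element count in 64 bits.
  int64_t numel = 1;
  for (int64_t d : dims) numel *= d;

  // Guard: a legacy routine that still indexes with int cannot handle more
  // than INT32_MAX elements (stand-in for the PADDLE_ENFORCE_LE checks in the PR).
  if (numel > std::numeric_limits<int>::max()) {
    throw std::invalid_argument("element count exceeds INT32_MAX");
  }

  // Loop variables match the promoted size type.
  for (int64_t i = 0; i < numel; ++i) {
    // ... per-element work ...
  }
}
```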
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| paddle/phi/kernels/impl/unstack_kernel_impl.h | Changed total_num and post to int64_t, added validation for StackGradFunctorForRange int limitation |
| paddle/phi/kernels/impl/unfold_kernel_impl.h | Changed batch_size and loop variable to int64_t |
| paddle/phi/kernels/impl/unfold_grad_kernel_impl.h | Changed batch_size and loop variable to int64_t |
| paddle/phi/kernels/impl/svdvals_grad_kernel_impl.h | Changed rows, cols, and batches to int64_t, removed static_cast |
| paddle/phi/kernels/impl/svd_grad_kernel_impl.h | Changed helper function parameters and dimension variables to int64_t |
| paddle/phi/kernels/impl/stft_kernel_impl.h | Changed n_frames and seq_length from int to size_t |
| paddle/phi/kernels/impl/stft_grad_kernel_impl.h | Changed n_frames and seq_length from int to size_t |
| paddle/phi/kernels/impl/spectral_norm_kernel_impl.h | Changed h and w dimension variables to int64_t |
| paddle/phi/kernels/impl/spectral_norm_grad_kernel_impl.h | Changed h and w dimension variables to int64_t |
| paddle/phi/kernels/impl/renorm_impl.h | Changed grid and grid2 to int64_t, added max grid size checks, fixed kernel parameter |
| paddle/phi/kernels/impl/qr_grad_kernel_impl.h | Changed m and n dimensions to int64_t |
| paddle/phi/kernels/impl/lstsq_kernel_impl.h | Changed m and n to int64_t, added explanatory comment |
| paddle/phi/kernels/impl/lstm_kernel_impl.h | Changed frame_size to int64_t, removed static_cast |
| paddle/phi/kernels/impl/lrn_kernel_impl.h | Changed N, C, H, W to int64_t, added validation and include for std::max, updated functor signature |
| paddle/phi/kernels/impl/kldiv_loss_kernel_impl.h | Changed n to int64_t |
| paddle/phi/kernels/impl/kldiv_loss_grad_kernel_impl.h | Changed numel and expand to int64_t |
| paddle/phi/kernels/impl/isclose_kernel_impl.h | Fixed CUDA index calculations with explicit int64_t/unsigned int casts |
| paddle/phi/kernels/impl/gumbel_softmax_kernel_impl.h | Changed axis_dim to int64_t, added validation and iostream include |
| paddle/phi/kernels/impl/gumbel_softmax_grad_kernel_impl.h | Changed axis_dim to int64_t, added validation |
| paddle/phi/kernels/impl/frame_kernel_impl.h | Changed n_frames and seq_length to int64_t |
| paddle/phi/kernels/impl/frame_grad_kernel_impl.h | Changed n_frames and seq_length to int64_t |
| paddle/phi/kernels/impl/fold_kernel_impl.h | Changed batch_size, n_input_plane, n_output_plane and loop variable to int64_t |
| paddle/phi/kernels/impl/fold_grad_kernel_impl.h | Changed batch_size, n_input_plane, n_output_plane and loop variable to int64_t |
| paddle/phi/kernels/impl/elementwise_grad_kernel_impl.h | Changed loop variables and kernel parameters to int64_t, fixed CUDA index calculations |
| paddle/phi/kernels/impl/conv_kernel_impl.h | Changed batch_size, in_step, out_step and loop variable to int64_t |
| paddle/phi/kernels/impl/conv_grad_kernel_impl.h | Changed batch_size, in_step, out_step and loop variables to int64_t |
| paddle/phi/kernels/impl/accuracy_check_kernel_impl.h | Changed kernel parameters and variables to int64_t, fixed CUDA index calculations |
```cpp
        total_num,
        std::numeric_limits<int>::max(),
        common::errors::InvalidArgument(
            "The total number of elements in UnStack is %d, which exceeds the "
```
Copilot AI (Oct 31, 2025), on paddle/phi/kernels/impl/unstack_kernel_impl.h:
The format specifier `%d` is used for an `int64_t` value `total_num`. This should be `%ld` or `%lld`, depending on platform, to correctly format `int64_t` values.
| "The total number of elements in UnStack is %d, which exceeds the " | |
| "The total number of elements in UnStack is %lld, which exceeds the " |
```cpp
#pragma once

#include <iostream>
```
Copilot AI (Oct 31, 2025), on paddle/phi/kernels/impl/gumbel_softmax_kernel_impl.h:
The `<iostream>` header is included but does not appear to be used in this file. This include should be removed unless it is needed for debugging purposes that were accidentally left in.
Suggested change:
```diff
-#include <iostream>
```
```cpp
RenormKernelFunc3<T><<<grid2, block2, 0, stream>>>(
    dimension_each, dim_value_data, p, max_norm);
```
Copilot AI (Oct 31, 2025), on paddle/phi/kernels/impl/renorm_impl.h:
The first parameter was changed from `numel` to `dimension_each`. This appears to be a logic change beyond a pure type conversion and should be verified as correct. If it fixes a bug, that should be documented in the commit message or in a code comment.
```cpp
        std::max({N, C, H, W}),
        std::numeric_limits<int>::max(),
        common::errors::InvalidArgument(
            "One or more tensor dimensions (N=%ld, C=%ld, H=%ld, W=%ld) exceeds "
```
Copilot AI (Oct 31, 2025), on paddle/phi/kernels/impl/lrn_kernel_impl.h:
The format specifier `%ld` is used for `int64_t` values. On some platforms (e.g., Windows), `int64_t` is `long long` rather than `long`, so `%lld` should be used instead, or a portable format such as `%" PRId64 "` from `<cinttypes>`.
```cpp
        axis_dim,
        std::numeric_limits<int>::max(),
        common::errors::InvalidArgument(
            "The axis dimension (%ld) exceeds the maximum value that int can "
```
Copilot AI (Oct 31, 2025):
The format specifier `%ld` is used for an `int64_t` value. This should use `%lld` or a portable format such as `%" PRId64 "` from `<cinttypes>` for cross-platform compatibility.
wanghuancoder
left a comment
LGTM
XiaoguangHu01
left a comment
LGTM
…els/impl (PaddlePaddle#76107) * Fix int32 overflow in svd_grad and conv kernel impl * fix
…els/impl (#76107) (#76276) * Fix int32 overflow in svd_grad and conv kernel impl * fix Co-authored-by: Zhan Rongrui <[email protected]>
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
Audited the Paddle/paddle/phi/kernels/impl directory for potential large-tensor problems and fixed them. The main changes, file by file:
1. elementwise_grad_kernel_impl.h (+8, -8)
- `int i` → `int64_t i`
- `int numel` → `int64_t numel`
- `int tid` → `int64_t tid`, and the index computation was rewritten to avoid overflow (see the sketch below)
- `int x_index, y_index, ...` → `int64_t ...`
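A minimal CUDA-style sketch of that index fix (the kernel and its body are illustrative, not the actual elementwise grad kernel): the multiply is performed in 64 bits so the linear index cannot wrap even when `numel` exceeds `INT32_MAX`.

```cpp
#include <cstdint>

template <typename T>
__global__ void ScaleKernel(const T* x, T* out, int64_t numel, T alpha) {
  // Cast before the multiply: blockIdx.x * blockDim.x alone is 32-bit math
  // and wraps for very large launches.
  int64_t tid = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  // Grid-stride loop; the stride is also computed in 64 bits.
  int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x;
  for (int64_t i = tid; i < numel; i += stride) {
    out[i] = alpha * x[i];
  }
}
```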
2. accuracy_check_kernel_impl.h (+11, -11)
- `int num` → `int64_t num`
- `unsigned int idx` → `int64_t idx`, and the index computation was fixed
- `int i` → `int64_t i`
3. isclose_kernel_impl.h (+7, -5)
- added `static_cast` to avoid overflow in the `blockIdx.x * blockDim.x` multiplication
4. renorm_impl.h (+11, -7)
- `int grid` → `int64_t grid`
- clamped the launch grid with `std::min(grid, max_grid_dimx)` (see the sketch below)
- changed a kernel argument from `numel` to `dimension_each`
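A sketch of the grid-clamping idea, assuming `max_grid_dimx` holds the device limit (for example `cudaDeviceProp::maxGridSize[0]`); the helper below is illustrative, not the actual renorm_impl.h code, and it pairs with a grid-stride loop in the kernel so the clamped grid still covers all elements.

```cpp
#include <cuda_runtime.h>

#include <algorithm>
#include <cstdint>

// Compute the launch grid in 64 bits, then clamp it to the device limit so
// the value handed to the <<<grid, block>>> launch stays representable.
inline dim3 ClampedGrid(int64_t numel, int threads_per_block, int max_grid_dimx) {
  int64_t grid = (numel + threads_per_block - 1) / threads_per_block;
  grid = std::min<int64_t>(grid, max_grid_dimx);
  return dim3(static_cast<unsigned int>(grid));
}
```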
5. unstack_kernel_impl.h (+16, -2)
- `int total_num` → `int64_t total_num`
- `int post` → `int64_t post`
- `StackGradFunctorForRange` still indexes with `int`, so a `PADDLE_ENFORCE_LE` was added to ensure the element count does not exceed INT32_MAX
6. kldiv_loss_grad_kernel_impl.h (+2, -2)
- `int n` → `int64_t n`
7. kldiv_loss_kernel_impl.h (+1, -1)
- `int batch_size` → `int64_t batch_size`
8. svdvals_grad_kernel_impl.h (+3, -3)
- `int batch_count` → `int64_t batch_count`
9. gumbel_softmax_kernel_impl.h (+14, -1)
- `int axis_dim` → `int64_t axis_dim`
- the underlying implementation still uses `int`, so an upper-bound check on the axis dimension was added
10. gumbel_softmax_grad_kernel_impl.h (+15, -1)
- `int axis_dim` → `int64_t axis_dim`
11. lrn_kernel_impl.h (+43, -12)
- `int N, C, H, W` → `int64_t N, C, H, W`
- added `#include <algorithm>` for `std::max`
- the underlying code still uses `int`, so a check was added that none of the dimensions exceeds INT32_MAX (see the sketch below)
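A self-contained sketch of that dimension guard, with hypothetical variable names rather than the actual lrn_kernel_impl.h code; `std::max` over an initializer list (hence the `<algorithm>` include) checks every dimension at once, while derived sizes such as `H * W` stay in 64 bits because the product can exceed INT32_MAX even when each factor fits in `int`.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <stdexcept>

void CheckLrnDims(int64_t N, int64_t C, int64_t H, int64_t W) {
  // Every dimension must individually fit in int for the int-indexed parts
  // of the implementation (stand-in for the PADDLE_ENFORCE_LE in the PR).
  if (std::max({N, C, H, W}) > std::numeric_limits<int>::max()) {
    throw std::invalid_argument("tensor dimension exceeds INT32_MAX");
  }
  // Derived sizes are still accumulated in 64 bits.
  int64_t one_img_size = H * W;
  (void)one_img_size;
}
```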
12. frame_kernel_impl.h (+3, -2)
- `int n_frames` → `int64_t n_frames`
- `int seq_length` → `int64_t seq_length`
13. frame_grad_kernel_impl.h (+3, -2)
- `int n_frames` → `int64_t n_frames`
- `int seq_length` → `int64_t seq_length`
14. stft_kernel_impl.h (+2, -2)
- `int n_frames` → `int64_t n_frames`
- `int seq_length` → `int64_t seq_length`
15. stft_grad_kernel_impl.h (+2, -2)
- `int n_frames` → `int64_t n_frames`
- `int seq_length` → `int64_t seq_length`
16. fold_kernel_impl.h (+4, -4)
- `int batch_size` → `int64_t batch_size`
- `int input_planes` → `int64_t input_planes`
17. fold_grad_kernel_impl.h (+4, -4)
- `int batch_size` → `int64_t batch_size`
- `int input_planes` → `int64_t input_planes`
18. unfold_kernel_impl.h (+2, -2)
- `int batch_size` → `int64_t batch_size`
19. unfold_grad_kernel_impl.h (+2, -2)
- `int batch_size` → `int64_t batch_size`
20. lstm_kernel_impl.h (+2, -2)
- `int frame_size` → `int64_t frame_size`
21. lstsq_kernel_impl.h (+5, -2)
- `int m, n, nrhs` → `int64_t m, n, nrhs`
22. qr_grad_kernel_impl.h (+2, -2)
- `int m, n` → `int64_t m, n`
23. spectral_norm_grad_kernel_impl.h (+2, -2)
- `int h, w` → `int64_t h, w`
24. spectral_norm_kernel_impl.h (+4, -4)
- `int h, w` → `int64_t h, w`
25. svd_grad_kernel_impl.h (+11, -10)
- `int m, n, k` → `int64_t m, n, k`
- `int batch_count` → `int64_t batch_count`
26. conv_kernel_impl.h (+4, -4)
- `int batch_size` → `int64_t batch_size`
- `in_step`, `out_step`, and the batch loop variable changed to `int64_t` (see the sketch below)
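A sketch of why those step offsets matter, using illustrative names rather than the conv kernel's actual variables: with `int`, the product `i * in_step` wraps once the per-sample slice is large, so the batch offsets are computed in `int64_t`.

```cpp
#include <cstdint>

// Walk a batched, flat buffer one sample at a time. in_step / out_step are
// per-sample element counts; their products with the batch index need
// 64-bit arithmetic for very large tensors.
template <typename T>
void ForEachSample(const T* in, T* out,
                   int64_t batch_size, int64_t in_step, int64_t out_step) {
  for (int64_t i = 0; i < batch_size; ++i) {
    const T* in_slice = in + i * in_step;   // 64-bit offset, no wraparound
    T* out_slice = out + i * out_step;
    // ... per-sample work would use in_slice / out_slice ...
    (void)in_slice;
    (void)out_slice;
  }
}
```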
27. conv_grad_kernel_impl.h (+8, -8)
- `int batch_size` → `int64_t batch_size`
- `in_step`, `out_step`, and the loop variables changed to `int64_t`

pcard-93269