Open
Description
https://github.com/leimao/CUDA-GEMM-Optimization/blob/main/include/profile_utils.cuh#L238
Why is it better to use int here?
https://github.com/leimao/CUDA-GEMM-Optimization/blob/main/src/profile_cuda_gemm_fp16.cu#L14
Why is the tolerance used here so loose? The tolerances used in the apex tests and in torch.testing are considerably tighter.
https://pytorch.org/docs/stable/testing.html
https://github.com/NVIDIA/apex/blob/master/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py#L164
When I implemented my own GEMM, I found it difficult to match the cuBLAS results to the same accuracy. Why? 😢
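To make the accuracy question concrete, here is a minimal sketch of one possible cause: two FP16 dot products over the same inputs can disagree purely because of accumulation precision and summation order. This is an assumption about what may be happening, not a statement about how cuBLAS or this repository accumulates. The helper names (`to_fp16`, `dot_fp16_sequential`, `dot_fp16_blocked`, `dot_wide`, `is_close`) and the inner dimension `k = 4096` are hypothetical and chosen only for illustration. FP16 rounding is emulated on the CPU with Python's `struct` half-precision format.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16_sequential(a, b):
    """Dot product with every product and partial sum rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_fp16_blocked(a, b, block=1024):
    """Same FP16 rounding, but summed block by block (a different order)."""
    partials = [
        dot_fp16_sequential(a[i:i + block], b[i:i + block])
        for i in range(0, len(a), block)
    ]
    acc = 0.0
    for p in partials:
        acc = to_fp16(acc + p)
    return acc


def dot_wide(a, b):
    """Reference: accumulate in Python's double-precision float."""
    return sum(x * y for x, y in zip(a, b))


def is_close(actual, expected, rtol, atol):
    """Mixed tolerance check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


k = 4096          # hypothetical GEMM inner dimension
a = [1.0] * k     # all-ones inputs keep the arithmetic easy to follow
b = [1.0] * k

seq = dot_fp16_sequential(a, b)
blk = dot_fp16_blocked(a, b)
ref = dot_wide(a, b)
print(seq, blk, ref)  # prints: 2048.0 4096.0 4096.0
```

The sequential FP16 sum stalls at 2048 because the spacing between adjacent FP16 values there is 2, so adding 1 rounds back to 2048. The blocked sum reaches 4096 only because its partial sums happen to be exactly representable. With random inputs the gap is much smaller, but two kernels that tile or accumulate differently will still generally not agree bit for bit, so the tolerance has to account for the inner dimension and the accumulation type.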