Textures being created with incorrect sizeInBytes #210

torrance · 2023-06-09T08:33:57Z

The test_linalg/test_matmul_ab_beamformer_kernel_small test adds a misalign parameter to test linalg_kernels.cu handling of memory not aligned to 512 byte boundaries:

bifrost/test/test_linalg.py

Lines 224 to 231 in 30fd268

    
           def test_matmul_aa_correlator_kernel_small(self): 
        
               for nchan in range(1, 1+5): 
        
                   for ntime in [1, 2, 3, 4, 8, 12]: 
        
                       for nstand in range(1, 1+65): 
        
                           for misalign in range(0, min(2 * (nstand - 1), 3), 2): 
        
                               self.run_test_matmul_aa_correlator_kernel( 
        
                                   ntime=ntime, nstand=nstand, nchan=nchan, 
        
                                   misalign=misalign)

For example, on the third innermost iteration, this tests passes in the values (nchan = 1, ntime = 1, nstand = 3, misalign = 2). This produces an array with size (1, 1, 6), which is then truncated (using the misalign value) down to a (1, 1, 4).

After calling linalg.matmul, this eventually reaches the function bf_cherk_N() where the total elements in the array needs to be calculated:

bifrost/src/linalg_kernels.cu

Lines 539 to 540 in 30fd268

    
           size_t A_nelement_total = 
        
           	std::max(A_stride * K, A_batchstride * nbatch) + A_offset;

For this particular test, the n_element_total should be 2 (size of truncated array) + 1 (offset), however it calculates it as 3 + 1 = 4.

When this incorrect, larger size is later used to calculate the sizeInBytes of the texture, this results in the texture believe it has access to more memory than it really does. On CUDA backends this isn't actually validated, but on ROCM backends, the texture creation fails since it actually checks the backing memory.

I've spent some time with this but haven't been able to fix it, as I'm not clear what all of N, K, A_stride, nbatch, etc. are, or where they are meant to be coming from.

The text was updated successfully, but these errors were encountered:

jaycedowell · 2023-06-12T20:14:04Z

I'll try to find some time this week to look into this.

jaycedowell · 2023-06-15T01:09:48Z

I spent some time looking into this and I think I know what is happening for this particular case. I think the underlying problem is how Bifrost computes array sizes. By default it uses the product of the first array stride length with the size of the first array dimension to get the overall array size. In Python that would be:

from bifrost import ndarray
a = ndarray(shape=(1,1,6), dtype='ci8')
print(a.size*a.itemsize, 'vs', a.strides[0]*a.shape[0])

which looks ok. When misalign is non-zero we end up with:

b = a[...,2:]
print(b.size*b.itemsize, 'vs', b.strides[0]*b.shape[0])

which doesn't match. The misalign=2 causes the start address for the underlying memory and the shape to be updated but not the strides.

print(a.ctypes.data, 'vs', b.ctypes.data)
print(a.shape, 'vs', b.shape)
print(a.strides, 'vs', b.strides)

In this case the correct array size would come from something like numpy.prod(b.shape)*b.itemsize since both of those should be available in the BFarray class/struct.

I'm less sure what to do about it. In this particular case changing the calculation of A_nelement_total to use shape (somehow) would seem like the thing do to. However, I doubt that is a general solution since A_nelement_total may be only relevant for the batched size of the operation. Thinking about what is happening here we are trying to do c = α * a.a^H + β*c. Here a is a matrix with shape¹(nbatch, N, K) and c has shape (nbatch, N, N). That would make nbatch the number of independent pieces of a.a^H to run, N sets the number of elements in the last two dimensions of c, and K is the number of elements to sum over each element in c for the matrix multiplication. I think that makes A_nelement_total the maximum of "the distance between rows of a" and "the distance between batched sections of a" added to "the offset between the start of a and the texture alignment"?

The shape is technically (nbatch, K, N) because the original LinAlg.matmul() call puts a in the location of b which says what we are passing in is the Hermitian conjugate of a. ↩

torrance · 2023-06-16T07:20:38Z

Thank you for looking at this @jaycedowell . I'll re-read your comment a few times and attempt to have another look at it next week.

torrance · 2023-06-21T05:30:09Z

Hi @jaycedowell

I think the following gives the correct value for A_nelement_total:

size_t A_nelement_total =
    (A_offset + N)                  // the elements in the first row of first batch
    + (K - 1) * A_stride            // the elements in the rest of the first batch
    + (nbatch - 1) * A_batchstride; // the elements for the remaining batches

The tests pass and I no longer get memory violations in ROCM.

Do you think the logic works now for all use cases?

jaycedowell · 2023-06-30T19:19:39Z

I'm not sure that A_nelement_total should include elements from other batches (that last term in the sum). For the batched execution you are only really concerned with those elements that are used for that particular batch. Each batch should handle its own texture binding.

torrance · 2023-07-03T03:11:18Z

I think it needs to be sized to store all the batches. The texture isn't created once per batch, it's shared amongst all of them, from my reading of the code.

In any case, if I delete the last term in the sum the tests will fail again.

Apply fix for #210

jaycedowell mentioned this issue Jun 16, 2023

Intermittent bifrost.linalg test failures #187

Open

jaycedowell mentioned this issue Dec 7, 2023

Update linalg algos #224

Merged

dentalfloss1 added a commit that referenced this issue Dec 7, 2023

Update linalg_kernels.cu

af885c0

Apply fix for #210

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Textures being created with incorrect sizeInBytes #210

Textures being created with incorrect sizeInBytes #210

torrance commented Jun 9, 2023 •

edited

Loading

jaycedowell commented Jun 12, 2023

jaycedowell commented Jun 15, 2023

torrance commented Jun 16, 2023

torrance commented Jun 21, 2023 •

edited

Loading

jaycedowell commented Jun 30, 2023

torrance commented Jul 3, 2023 •

edited

Loading

Textures being created with incorrect sizeInBytes #210

Textures being created with incorrect sizeInBytes #210

Comments

torrance commented Jun 9, 2023 • edited Loading

jaycedowell commented Jun 12, 2023

jaycedowell commented Jun 15, 2023

Footnotes

torrance commented Jun 16, 2023

torrance commented Jun 21, 2023 • edited Loading

jaycedowell commented Jun 30, 2023

torrance commented Jul 3, 2023 • edited Loading

torrance commented Jun 9, 2023 •

edited

Loading

torrance commented Jun 21, 2023 •

edited

Loading

torrance commented Jul 3, 2023 •

edited

Loading