@gsitaram gsitaram commented Oct 1, 2022

Hi @Luke20000429, @joydddd, you can use this simple standalone program to measure the bandwidth achieved over PCIe. It compares the following transfer options:

  • 1 large buffer vs multiple small buffers
  • using pinned memory vs pageable memory
  • using hipMemcpy vs hipMemcpyAsync

There is a convenient run script that you can use to sweep over various parameter values.

My conclusions are the following:

  • Performance gets close to peak and is the same whether you transfer one large 128 MB buffer or 16 small 8 MB buffers.
  • Pinned memory performs better than pageable memory, even for hipMemcpy.
  • hipMemcpyAsync seems to perform better than hipMemcpy, even for a single transfer (i.e., iter=1).
  • Performance fluctuates on the GPU in our workstation; it is more stable on a GPU in a server.

YMMV, so it is best to test on your end with the cards you have access to.
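
For a quick check of the large-vs-small comparison, here is a minimal sketch of the timing loop and the bandwidth arithmetic. It is not the code in this PR: the names `bandwidth_gbps` and `time_chunked_copy` and the `USE_HIP` macro are made up for illustration. Without `USE_HIP` it falls back to a host-side `std::memcpy`, which only exercises the harness and says nothing about PCIe.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

#ifdef USE_HIP  // hypothetical opt-in macro, e.g. hipcc -DUSE_HIP
#include <hip/hip_runtime.h>
#endif

// Bandwidth in GB/s, taking 1 GB = 1e9 bytes.
double bandwidth_gbps(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Copies `total` bytes from src to dst in `chunks` equal pieces and returns
// the elapsed wall-clock time in seconds.
// With USE_HIP, dst must be device memory (hipMalloc) and src host memory
// (pageable, or pinned via hipHostMalloc). Without it, both are host buffers.
double time_chunked_copy(char* dst, const char* src,
                         std::size_t total, std::size_t chunks) {
    assert(chunks > 0 && total % chunks == 0);
    const std::size_t chunk = total / chunks;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < chunks; ++i) {
#ifdef USE_HIP
        // Blocking host-to-device copy of one chunk.
        hipError_t err = hipMemcpy(dst + i * chunk, src + i * chunk,
                                   chunk, hipMemcpyHostToDevice);
        assert(err == hipSuccess);
        (void)err;
#else
        // Host-only stand-in so the sketch builds without a GPU toolchain.
        std::memcpy(dst + i * chunk, src + i * chunk, chunk);
#endif
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_chunked_copy` on a 128 MB total with `chunks = 1` and then `chunks = 16` (8 MB each) and passing each time to `bandwidth_gbps` mirrors the first comparison above. Repeating the call several times and averaging should reduce the effect of run-to-run fluctuation.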

@ooreilly

Is this code relevant for ksw2? I don't see any dependencies on ksw2.
