- profile GPU transfers (nvprof) to see whether we're actually avoiding copies
- track statistics of sizes of objects
- modify dataflow and objtrace scripts
- feature: select CPU BLAS library dynamically where it can benefit performance
- collect examples
  - nwchem samples from repo
  - mpb samples from repo
  - meep samples from repo (v1.6)
- collect performance info
  - nwchem
  - Octave
  - cp2k
  - mpb
  - meep
- implement OpenCL backend
- abstract away CUDA/OpenCL runtimes
- abstract away cuBLAS/CLBlast BLAS runtimes
- pass all netlib tests for level 3
- Intel i7 hang: look into what happens when kernel cl_mem buffers are null
- Intel i7 hang: look into what happens when kernel