ViennaCL Roadmap
- Stabilize existing (experimental or OpenCL-only) functionality (SPAI, FFT, etc.)
- Rearrange test suite to reduce overall compilation times
- Get rid of Boost as much as possible/reasonable
- Bugfixes as needed
According to our versioning policy, updates to the 1.7.x branch will not change the public API, only fix bugs and improve performance.
- No internal use of Boost.uBLAS
- Better support for hybrid architectures (AMD APUs, etc.), as more hardware of this kind is expected to reach the market
- Fast dense solver (LU factorization) with pivoting
Currently there are no plans for a ViennaCL 1.9.0 release.
A couple of interface changes are necessary for ViennaCL 2.0.0 to allow for high performance and flexibility. The following items are in discussion:
The current header-only C++ approach has been stretched beyond its limits. Most importantly, it prevents adoption from other languages and results in significant compilation times. A shared C library with minor object-oriented features, accompanied by tutorials on how to call it from other languages such as Fortran, will open ViennaCL up to a new (large) user base.
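To make the idea of a C library with minor object-oriented features more concrete, here is a minimal sketch of the usual opaque-handle pattern. It is not the planned ViennaCL 2.0.0 interface: every name in it (`demo_vector`, `demo_status`, `demo_vector_create`, and so on) is hypothetical and chosen only for illustration. The header part is plain C and could be consumed from C or Fortran, while the implementation stays in C++ behind the handle.

```cpp
#include <cstddef>
#include <vector>

// ---- Public interface (hypothetical names, C-compatible declarations) ----
extern "C" {

// Opaque handle: callers never see the layout of the object.
typedef struct demo_vector_impl* demo_vector;

// Errors are reported through return codes, since exceptions cannot cross a C boundary.
typedef enum {
  DEMO_SUCCESS = 0,
  DEMO_ERROR_INVALID_ARGUMENT = 1,
  DEMO_ERROR_OUT_OF_MEMORY = 2
} demo_status;

demo_status demo_vector_create(demo_vector* v, std::size_t n);
demo_status demo_vector_set(demo_vector v, std::size_t i, double value);
demo_status demo_vector_dot(demo_vector x, demo_vector y, double* result);
demo_status demo_vector_destroy(demo_vector v);

}  // extern "C"

// ---- Implementation (C++, hidden inside the shared library) ----
struct demo_vector_impl {
  std::vector<double> data;
};

extern "C" demo_status demo_vector_create(demo_vector* v, std::size_t n) {
  if (v == nullptr) return DEMO_ERROR_INVALID_ARGUMENT;
  try {
    *v = new demo_vector_impl{std::vector<double>(n, 0.0)};
  } catch (...) {
    return DEMO_ERROR_OUT_OF_MEMORY;  // never let an exception escape into C code
  }
  return DEMO_SUCCESS;
}

extern "C" demo_status demo_vector_set(demo_vector v, std::size_t i, double value) {
  if (v == nullptr || i >= v->data.size()) return DEMO_ERROR_INVALID_ARGUMENT;
  v->data[i] = value;
  return DEMO_SUCCESS;
}

extern "C" demo_status demo_vector_dot(demo_vector x, demo_vector y, double* result) {
  if (x == nullptr || y == nullptr || result == nullptr) return DEMO_ERROR_INVALID_ARGUMENT;
  if (x->data.size() != y->data.size()) return DEMO_ERROR_INVALID_ARGUMENT;
  double sum = 0.0;
  for (std::size_t i = 0; i < x->data.size(); ++i) sum += x->data[i] * y->data[i];
  *result = sum;
  return DEMO_SUCCESS;
}

extern "C" demo_status demo_vector_destroy(demo_vector v) {
  delete v;  // deleting a null handle is a no-op
  return DEMO_SUCCESS;
}
```

Because only function declarations and an opaque pointer type appear in the interface, user code compiles quickly and does not need to be recompiled when the library internals change.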
Making the alignment of vectors/matrices in ViennaCL 1.x.y a template parameter was a design error, as the optimal padding is not known until runtime (device-specific). For matrices it is even necessary to distinguish row- and column-padding.
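The following sketch illustrates what runtime-selected padding could look like, with separate values for rows and columns. It is an assumption-laden illustration rather than ViennaCL code: the class `padded_matrix`, the helper `padded_size`, and the host-side `std::vector` storage are all hypothetical, and in practice the alignment values would be queried from the device at runtime.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round n up to the next multiple of alignment (alignment > 0).
inline std::size_t padded_size(std::size_t n, std::size_t alignment) {
  return ((n + alignment - 1) / alignment) * alignment;
}

// Hypothetical row-major dense matrix whose padding is a constructor argument,
// not a template parameter, so it can depend on the device found at runtime.
class padded_matrix {
public:
  padded_matrix(std::size_t rows, std::size_t cols,
                std::size_t row_alignment, std::size_t col_alignment)
      : rows_(rows),
        cols_(cols),
        internal_rows_(padded_size(rows, row_alignment)),
        internal_cols_(padded_size(cols, col_alignment)),
        data_(internal_rows_ * internal_cols_, 0.0) {}

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  std::size_t internal_rows() const { return internal_rows_; }  // rows including padding
  std::size_t internal_cols() const { return internal_cols_; }  // columns including padding

  // Element access uses the padded column count as the row stride.
  double& operator()(std::size_t i, std::size_t j) { return data_[i * internal_cols_ + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data_[i * internal_cols_ + j]; }

private:
  std::size_t rows_, cols_;
  std::size_t internal_rows_, internal_cols_;
  std::vector<double> data_;
};
```

With this layout, the same matrix type can be padded to, say, multiples of 4 rows and 16 columns on one device and to different multiples on another, without instantiating a new template.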
Heavy use of C++ templates puts a significant load on the compiler. Also, some implementations could be made more compact if the arithmetic type were not fixed at compile time. Selecting the type at runtime should reduce compilation times quite substantially.
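One way to move the arithmetic type from compile time to runtime is to tag each object with its type and dispatch once per operation. The sketch below shows that pattern under simplifying assumptions: `scalar_type`, `runtime_vector`, and `axpy` are hypothetical names, only `float` and `double` are covered, and data lives in host memory. Templates are still used internally, but they are instantiated once inside the library instead of in every user translation unit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical runtime tag for the arithmetic type.
enum class scalar_type { float32, float64 };

// Hypothetical vector whose element type is chosen at construction time.
// Only the storage matching the tag is populated; the other stays empty.
class runtime_vector {
public:
  runtime_vector(std::size_t n, scalar_type t)
      : type_(t),
        f32_(t == scalar_type::float32 ? n : 0, 0.0f),
        f64_(t == scalar_type::float64 ? n : 0, 0.0) {}

  scalar_type type() const { return type_; }
  std::size_t size() const {
    return type_ == scalar_type::float32 ? f32_.size() : f64_.size();
  }

  // Convenience accessors that convert through double; at() checks bounds.
  void set(std::size_t i, double v) {
    if (type_ == scalar_type::float32) f32_.at(i) = static_cast<float>(v);
    else f64_.at(i) = v;
  }
  double get(std::size_t i) const {
    return type_ == scalar_type::float32 ? static_cast<double>(f32_.at(i)) : f64_.at(i);
  }

  std::vector<float>& f32() { return f32_; }
  const std::vector<float>& f32() const { return f32_; }
  std::vector<double>& f64() { return f64_; }
  const std::vector<double>& f64() const { return f64_; }

private:
  scalar_type type_;
  std::vector<float> f32_;
  std::vector<double> f64_;
};

// Typed kernel: instantiated only here, for the two supported types.
template <typename T>
void axpy_kernel(T alpha, const std::vector<T>& x, std::vector<T>& y) {
  for (std::size_t i = 0; i < y.size(); ++i) y[i] += alpha * x[i];
}

// y += alpha * x, with a single runtime dispatch on the scalar type.
inline void axpy(double alpha, const runtime_vector& x, runtime_vector& y) {
  if (x.type() != y.type() || x.size() != y.size())
    throw std::invalid_argument("axpy: type or size mismatch");
  switch (y.type()) {
    case scalar_type::float32:
      axpy_kernel(static_cast<float>(alpha), x.f32(), y.f32());
      break;
    case scalar_type::float64:
      axpy_kernel(alpha, x.f64(), y.f64());
      break;
  }
}
```

The cost of the dispatch is one branch per operation, which is negligible compared to the work done inside the kernel for vectors of any meaningful size.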
Presumably without MPI, using shared memory directly instead. Many software packages already provide MPI-based parallelism across compute nodes, but they struggle with many processes per node. If ViennaCL can fully exploit a single node (or NUMA domain), this would help these other packages a lot.