Karl Rupp edited this page Aug 1, 2015 · 9 revisions

ViennaCL 1.7.x, x > 0

Planned features

  • Stabilize existing (experimental or OpenCL-only) functionality (SPAI, FFT, etc.)
  • Rearrange test suite to reduce overall compilation times
  • Get rid of Boost as much as possible/reasonable
  • Bugfixes as needed

According to our versioning policy, updates to the 1.7.x branch will not change the public API, only fix bugs and improve performance.

ViennaCL 1.8.x

Planned features

  • No internal use of Boost.uBLAS
  • Better support for hybrid architectures (AMD APUs, etc.), as more hardware of this kind is expected to reach the market
  • Fast dense solver (LU factorization) with pivoting

Currently there are no plans for a ViennaCL 1.9.0 release.

ViennaCL 2.0.0

A couple of interface changes are necessary for ViennaCL 2.0.0 to allow for high performance and flexibility. The following items are under discussion:

Shared C library

The current header-only C++ approach has been stretched beyond its limits. Most importantly, it prevents adoption from other languages and causes long compilation times. A shared C library with minor object-oriented features, accompanied by tutorials on how to call it from other languages such as Fortran, will open ViennaCL up to a new (large) user base.
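To illustrate the idea, here is a minimal sketch of a C interface built around an opaque handle, with the C++ implementation hidden behind it. All `demo_*` names and the error-code convention are hypothetical and chosen for illustration only; they do not describe an existing or planned ViennaCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <vector>

// All demo_* names are hypothetical; they are not part of ViennaCL.
extern "C" {
  typedef struct demo_vector_s * demo_vector;   // opaque handle seen by callers

  // Every function returns 0 on success and a nonzero error code otherwise.
  int demo_vector_create(demo_vector * v, std::size_t n);
  int demo_vector_destroy(demo_vector v);
  int demo_vector_set(demo_vector v, std::size_t i, double value);
  int demo_vector_get(demo_vector v, std::size_t i, double * value);
  int demo_vector_axpy(double alpha, demo_vector x, demo_vector y);  // y += alpha * x
}

// The C++ implementation stays hidden behind the handle.
struct demo_vector_s
{
  explicit demo_vector_s(std::size_t n) : data(n, 0.0) {}
  std::vector<double> data;
};

int demo_vector_create(demo_vector * v, std::size_t n)
{
  if (!v) return 1;
  try { *v = new demo_vector_s(n); }
  catch (std::exception const &) { *v = nullptr; return 2; }  // no exception crosses the C boundary
  return 0;
}

int demo_vector_destroy(demo_vector v)
{
  delete v;   // deleting a null handle is a no-op
  return 0;
}

int demo_vector_set(demo_vector v, std::size_t i, double value)
{
  if (!v || i >= v->data.size()) return 1;
  v->data[i] = value;
  return 0;
}

int demo_vector_get(demo_vector v, std::size_t i, double * value)
{
  if (!v || !value || i >= v->data.size()) return 1;
  *value = v->data[i];
  return 0;
}

int demo_vector_axpy(double alpha, demo_vector x, demo_vector y)
{
  if (!x || !y || x->data.size() != y->data.size()) return 1;
  for (std::size_t i = 0; i < x->data.size(); ++i)
    y->data[i] += alpha * x->data[i];
  return 0;
}
```

Because callers only see a pointer type and plain functions, such an interface can be bound from C, from Fortran via `ISO_C_BINDING`, or from scripting languages, and user code no longer has to compile the library's templates.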

Alignment

Making the alignment of vectors/matrices in ViennaCL 1.x.y a template parameter was a design error, because the optimal padding is device-specific and therefore not known until runtime. For matrices it is even necessary to distinguish between row padding and column padding.
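A minimal sketch of what runtime padding could look like, assuming a row-major layout. The names `padded_size` and `padded_matrix` are hypothetical and only serve to show row and column padding being chosen independently when the object is constructed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round n up to the next multiple of `alignment`.
inline std::size_t padded_size(std::size_t n, std::size_t alignment)
{
  assert(alignment > 0);
  return ((n + alignment - 1) / alignment) * alignment;
}

// Hypothetical row-major matrix whose row and column padding are
// selected at runtime, e.g. after querying the target device.
struct padded_matrix
{
  std::size_t rows, cols;                     // logical dimensions
  std::size_t internal_rows, internal_cols;   // dimensions including padding
  std::vector<double> data;

  padded_matrix(std::size_t r, std::size_t c,
                std::size_t row_alignment, std::size_t col_alignment)
    : rows(r), cols(c),
      internal_rows(padded_size(r, row_alignment)),
      internal_cols(padded_size(c, col_alignment)),
      data(internal_rows * internal_cols, 0.0) {}

  // Element access uses the padded row length as the stride.
  double & operator()(std::size_t i, std::size_t j)
  {
    return data[i * internal_cols + j];
  }
};
```

With this approach the same compiled code can serve devices with different preferred padding, since the alignment values are ordinary constructor arguments rather than part of the type.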

Make the NumericT a runtime parameter?

C++ templates become pretty heavy for the compiler. Also, some implementations could be made more compact if the arithmetic type is not fixed at compile time. This would reduce compilation times quite substantially.
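One possible shape of this idea, sketched under the assumption that only `float` and `double` need to be supported: the precision is stored as a runtime tag, and each operation dispatches once to a typed inner loop. The names `numeric_type` and `runtime_vector` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical runtime tag replacing the NumericT template parameter.
enum class numeric_type { float32, float64 };

// Hypothetical vector whose precision is chosen at construction time.
// Only the storage matching the tag is ever populated.
class runtime_vector
{
public:
  runtime_vector(std::size_t n, numeric_type t)
    : type_(t),
      f32_(t == numeric_type::float32 ? n : 0),
      f64_(t == numeric_type::float64 ? n : 0) {}

  numeric_type type() const { return type_; }

  std::size_t size() const
  {
    return type_ == numeric_type::float32 ? f32_.size() : f64_.size();
  }

  void set(std::size_t i, double value)
  {
    assert(i < size());
    if (type_ == numeric_type::float32) f32_[i] = static_cast<float>(value);
    else                                f64_[i] = value;
  }

  double get(std::size_t i) const
  {
    assert(i < size());
    return type_ == numeric_type::float32 ? static_cast<double>(f32_[i]) : f64_[i];
  }

  // The precision is dispatched once per operation, not once per element.
  void scale(double alpha)
  {
    if (type_ == numeric_type::float32) scale_impl(f32_, alpha);
    else                                scale_impl(f64_, alpha);
  }

private:
  // The typed loop is an implementation detail, invisible to user code.
  template <typename T>
  static void scale_impl(std::vector<T> & v, double alpha)
  {
    for (T & x : v) x *= static_cast<T>(alpha);
  }

  numeric_type        type_;
  std::vector<float>  f32_;
  std::vector<double> f64_;
};
```

User code then works with a single non-template type, so the typed loops are compiled once inside the library instead of being instantiated in every translation unit that uses them.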

Distributed vectors and matrices (multiple devices, possibly mixed CUDA/OpenCL/OpenMP)

Presumably without MPI, using shared memory directly instead. Many software packages already provide MPI-based parallelism across compute nodes, but they struggle with many processes per node. If ViennaCL can fully exploit a single node (or NUMA domain), this would help these other packages a lot.
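As a rough illustration of the single-node idea, the sketch below splits a vector into contiguous blocks, one per device, and reduces per-block partial results into a global inner product. The names `index_range`, `block_partition`, and `chunked_inner_prod` are hypothetical, and the per-block loop runs on the host here; in an actual implementation each block would be handled by its own CUDA, OpenCL, or OpenMP backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) owned by one device.
struct index_range { std::size_t begin; std::size_t end; };

// Hypothetical helper: split n entries into num_devices contiguous blocks
// whose sizes differ by at most one entry.
inline std::vector<index_range> block_partition(std::size_t n, std::size_t num_devices)
{
  assert(num_devices > 0);
  std::vector<index_range> ranges;
  std::size_t const base      = n / num_devices;
  std::size_t const remainder = n % num_devices;
  std::size_t start = 0;
  for (std::size_t d = 0; d < num_devices; ++d)
  {
    // The first `remainder` devices receive one extra entry.
    std::size_t const len = base + (d < remainder ? 1 : 0);
    ranges.push_back(index_range{start, start + len});
    start += len;
  }
  return ranges;
}

// Inner product assembled from per-block partial sums.
inline double chunked_inner_prod(std::vector<double> const & x,
                                 std::vector<double> const & y,
                                 std::size_t num_devices)
{
  assert(x.size() == y.size());
  double result = 0.0;
  for (index_range const & r : block_partition(x.size(), num_devices))
  {
    double partial = 0.0;   // stand-in for the result computed on one device
    for (std::size_t i = r.begin; i < r.end; ++i)
      partial += x[i] * y[i];
    result += partial;      // final reduction in shared host memory
  }
  return result;
}
```

Since all blocks live within one node, the final reduction only touches host memory and needs no message passing.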