Im2col implementation on CUDA + refactoring #6

steremma · 2018-05-21T21:55:40Z

Goal

This PR implements Im2Col in CUDA in what I consider an optimal way in terms of performance: one thread is assigned per output element. Threads therefore never share a write address, so no synchronization is required. They do share read addresses, but concurrent reads are thread safe. I complement the new functionality with a complete test suite to assert correctness.
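The one-thread-per-output-element mapping can be illustrated with a host-side C++ sketch. This is not the kernel from this PR: the output layout (one row per local view, one column per view pixel), the channel-major image layout, and the names `outputDim`, `im2colElement` and `im2col` are assumptions made for illustration. The loop in `im2col` stands in for the GPU grid, where `idx` would be derived from the block and thread indices.

```cpp
#include <cassert>
#include <vector>

// Output size along one dimension (same formula as calculateDimension in the diff).
int outputDim(int imgDim, int fltDim, int padding, int stride)
{
   assert(stride > 0);
   return (imgDim - fltDim + 2 * padding) / stride + 1;
}

// Computes ONE output element, identified by its flat index `idx`.
// On the GPU, this body would be executed by a single thread.
// Assumed layouts: img is [channel][row][col], output is [view][viewPixel].
float im2colElement(const std::vector<float> &img, int idx, int depth, int imgH, int imgW,
                    int fltH, int fltW, int stride, int padding)
{
   const int outW = outputDim(imgW, fltW, padding, stride);
   const int viewPixels = depth * fltH * fltW; // columns of the output matrix

   const int view = idx / viewPixels;  // which local view (output row)
   const int pixel = idx % viewPixels; // position inside the view (output column)

   const int channel = pixel / (fltH * fltW);
   const int fltRow = (pixel / fltW) % fltH;
   const int fltCol = pixel % fltW;

   // Image coordinates of the element; they can fall outside the image when padding > 0.
   const int row = (view / outW) * stride - padding + fltRow;
   const int col = (view % outW) * stride - padding + fltCol;

   if (row < 0 || row >= imgH || col < 0 || col >= imgW) return 0.0f; // zero padding
   return img[(channel * imgH + row) * imgW + col];
}

// Host-side stand-in for the kernel launch: every iteration writes a distinct
// out[idx] and only reads from img, so the iterations are fully independent.
std::vector<float> im2col(const std::vector<float> &img, int depth, int imgH, int imgW,
                          int fltH, int fltW, int stride, int padding)
{
   const int outH = outputDim(imgH, fltH, padding, stride);
   const int outW = outputDim(imgW, fltW, padding, stride);
   const int total = outH * outW * depth * fltH * fltW;
   std::vector<float> out(total);
   for (int idx = 0; idx < total; ++idx)
      out[idx] = im2colElement(img, idx, depth, imgH, imgW, fltH, fltW, stride, padding);
   return out;
}
```

For a single-channel 3x3 image holding 1 to 9, a 2x2 filter, stride 1 and no padding, this sketch produces four views of four pixels each: `{1,2,4,5}`, `{2,3,5,6}`, `{4,5,7,8}` and `{5,6,8,9}`. Neighbouring views read the same input pixels, but no two output indices coincide, which is why no synchronization is needed.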

Extra tasks

The tests within the CNN module suffer from extensive code duplication, as the Reference and CPU versions do exactly the same thing (adding CUDA versions would only worsen the issue). I therefore refactored the Im2Col tests using template arguments: the tests are now defined only once and instantiated independently for each architecture. This approach is also followed in the DNN module. If time allows, I plan to refactor all tests within the CNN module in a similar manner.
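The define-once, instantiate-per-architecture idea can be sketched as follows. The two back ends (`ReferenceArch`, `FlatIndexArch`), the 1D `Windows` operation and the test names are hypothetical stand-ins rather than the TMVA classes or tests; the only point is that the test body is written once and the architecture is supplied as a template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back end 1: naive nested loops over windows and positions.
struct ReferenceArch {
   static std::vector<float> Windows(const std::vector<float> &in, std::size_t width)
   {
      std::vector<float> out;
      for (std::size_t start = 0; start + width <= in.size(); ++start)
         for (std::size_t k = 0; k < width; ++k)
            out.push_back(in[start + k]);
      return out;
   }
};

// Hypothetical back end 2: one flat loop over output elements,
// mirroring a one-thread-per-element kernel.
struct FlatIndexArch {
   static std::vector<float> Windows(const std::vector<float> &in, std::size_t width)
   {
      assert(width > 0);
      if (in.size() < width) return {};
      const std::size_t nWindows = in.size() - width + 1;
      std::vector<float> out(nWindows * width);
      for (std::size_t idx = 0; idx < out.size(); ++idx)
         out[idx] = in[idx / width + idx % width];
      return out;
   }
};

// The test is defined once; each architecture only instantiates it.
template <typename Architecture>
bool testWindows()
{
   const std::vector<float> input = {1, 2, 3, 4};
   const std::vector<float> expected = {1, 2, 2, 3, 3, 4};
   return Architecture::Windows(input, 2) == expected;
}

// Returns an exit code, so that a failing test can be detected by CI.
int runAllTests()
{
   const bool ok = testWindows<ReferenceArch>() && testWindows<FlatIndexArch>();
   return ok ? 0 : 1;
}
```

Supporting a further architecture then amounts to adding one more instantiation, such as `testWindows<SomeOtherArch>()`, instead of copying the test body.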

ashlaban

Nice work! I'd prefer a bit more documentation. Other than that, looks good to me.

ashlaban · 2018-05-22T13:13:26Z

tmva/tmva/src/DNN/Architectures/Cuda/Propagation.cu

+}
+
+/**
+ * @brief A helper for image operations that rearranges image regions into


Preferred commenting style for ROOT:

////////////////////////////////////////////////////////////////////////////
/// \brief Short description
/// \param[in] varname Description of var.
///
/// [Expanded desc.]

ashlaban · 2018-05-22T13:15:14Z

tmva/tmva/src/DNN/Architectures/Cuda/Kernels.cuh

+   return ((imgDim - fltDim + 2 * padding) / stride) + 1;
+}
+
+template<typename AFloat>


A documentation comment would be nice here as well. Even if the function is private, a brief explanation can still help with maintenance. A description of the variables (plus any assumptions) and a short, high-level summary of the implemented algorithm would be nice.

ashlaban · 2018-05-22T13:16:40Z

tmva/tmva/src/DNN/Architectures/Cuda/Kernels.cuh

@@ -203,6 +203,61 @@ __device__ void ReduceSum(AFloat *result, AFloat * sdata)
   __syncthreads();
 }

+__device__ int calculateDimension(int imgDim, int fltDim, int padding, int stride)


Short documentation comment would be nice. In what context should I use this function?
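As an illustration of what such a comment could cover, here is a host-side C++ sketch of the same formula with a ROOT-style documentation block. The wording of the comment, the precondition check and the `tilesExactly` helper are assumptions added for the example and are not part of the PR; the `__device__` qualifier is omitted so that the snippet compiles without CUDA.

```cpp
#include <cassert>

////////////////////////////////////////////////////////////////////////////
/// \brief Output size along one image dimension after applying a filter.
/// \param[in] imgDim  Image height or width, in pixels.
/// \param[in] fltDim  Filter height or width, in pixels.
/// \param[in] padding Number of zero pixels added on each side.
/// \param[in] stride  Step between consecutive filter positions.
///
/// Integer division rounds down, so trailing pixels that do not fit a
/// full filter position are ignored.
int calculateDimension(int imgDim, int fltDim, int padding, int stride)
{
   // Assumed preconditions: positive stride, filter no larger than the padded image.
   assert(stride > 0 && imgDim + 2 * padding >= fltDim);
   return ((imgDim - fltDim + 2 * padding) / stride) + 1;
}

////////////////////////////////////////////////////////////////////////////
/// \brief Hypothetical helper: true if the filter positions cover the
///        padded image exactly, with no trailing pixels left over.
bool tilesExactly(int imgDim, int fltDim, int padding, int stride)
{
   return (imgDim - fltDim + 2 * padding) % stride == 0;
}
```

For example, a dimension of 32 pixels with a 5-pixel filter, padding 2 and stride 1 keeps its size of 32. With a 3-pixel filter, stride 2 and no padding, dimensions of 7 and 8 pixels both give 3 output positions; in the second case the last pixel is never covered.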


@pcanal

We had test failures in runtime nightlies such as this one: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/95/BUILDTYPE=Debug,COMPILER=gcc62,LABEL=slc6/testReport/junit/projectroot.roottest.root.math/smatrix/roottest_root_math_smatrix_testKalman/ The failures were due to what @pcanal commented in root-project#2135: some .so files in roottest do not have external linkage. (This means that if you call dlopen(libfoo.so), the Linux kernel cannot find the dependency libraries, and an "undefined symbol" error is emitted when the global variables in libfoo.so are initialized and their symbol definitions cannot be found.) With PCH, the rootmap files provided information about the dependent libraries. However, we stopped generating rootmap files in root-project#2127, which is why we got these failures. To fix this issue, I implemented a callback to TCling that gets called when DynamicLibraryManager fails. The callback passes the error message to TCling, which handles the message if it contains "undefined error".

…-project#2160)" This reverts commit 011aa82. This is a revert of a revert. I reverted the first commit because adding "." to prebuiltmodulepath was causing failures in runtime modules, but we now skip "." in TCling::LazyFunctionCreatorAutoloadForModule, so it no longer matters whether "." is present.


Instead of assigning rules that do not correspond to any specific branch (for example, rules setting transients or whole objects) to a hypothetical 'previous' branch, we now assign those orphan rules to either the non-data-bearing branches (split single objects or bases) or the collection node (since the splitting completely flattens the hierarchy in this case). This requires enhancing the list of IDs from a single list of element indices in the 'sole' current StreamerInfo to a nested list of elements that carries along the sub-StreamerInfo information.


steremma requested a review from lmoneta as a code owner May 21, 2018 21:55

ashlaban reviewed May 22, 2018


steremma mentioned this pull request May 25, 2018

Reshape Layer implementation for the GPU Architecture #8

Open

steremma closed this May 28, 2018

steremma reopened this May 28, 2018

dpiparo and others added 25 commits June 12, 2018 14:37

[RN] Add notes about the hashing of streamer infos

70b0923

cling travis: indent of comments.

f17856e

TMVA/DNN/Architectures/Cpu:CpuMatrix.h: Disable debugging macro.

3f42e76

This fixes https://sft.its.cern.ch/jira/browse/ROOT-9444.

Avoid warnings when executed in sequence via gldemos.C

63e1800

cling travis: remove leading "@"; no idea why the example had it.

9fb7900

cling travis: help find timeout:

600b3b8

Even though timeout existed, the script decided to call gtimeout on Linux, which does not exist.

cling travis: Install coreutils; Trusty has no "timeout" apt package.

ee68d37

cling travis: oclint is in the way for brew install gcc@7.

4ec6b07

See travis-ci/travis-ci#8826

cling travis: remove stray "--overwrite".

6e86caa

cling travis: add "compiler" tag to GCC-7@Mac build.

b0875a2

cling travis: travis_fold:begin: is spelled travis_fold:start:.

a52cb9d

Avoid getting confused by out-of-range request to TTreeCache

b646b9b

v6.16 RelNotes: enter date, release number; update URL.

2ba50d5

RelNotes template: use .cern TLD for url.

e9d5c9a

cling travis: travis merged two lines into one command.

b101e35

cling travis: move cpt log fold into cpt.py.

e9542e8

cling travis: teach travis to do two brew tasks. Again.

396425d

cling travis: specify osx image. Fix compiler name for a build.

e0e1c1b

cling travis: convince ccache to kick in.

c45dd40

cling travis: CMake and travis are uncooperative, cannot use ccache w…

954ec8d

…ith clang: COMPILER="ccache clang" gets lost in CMake; using ccache does not work as there is no ccache-wrapper for clang-3.9. So just use clang-3.9 without ccache.

cling travis: create a new log section when *running*...

11bf345

... not when echoing what is going to be run.

Fix typo

b5279b3

Fix undefined variable with ROOT 6

904f595

SimeonEhrig and others added 29 commits June 27, 2018 11:09

Fix doxygen error.

8042557

Include test directory using ROOT_ADD_TEST_SUBDIRECTORY

a65eabc

Simplified the APIs and reduced code duplication in ConvLayer and `…

61cd582

…MaxPoolingLayer` * Pooling is now a subclass of Convolutional Layer. As a result common functions and fields are not replicated. * Constructor arguments that can be internally computed are eliminated.

Adapted DeepNet to the new API

ed3fea9

Renamed kernel/filter fields to use the prefix "filter" rather than "…

95001ca

…frame". This is important to have the same naming convention everywhere.

Code review corrections

0a7faec

Do not exclude etc/http directory from installation

bed1568

This was an unintended side-effect of a previous commit: 9b4d0d8.

netxng: add possibility to control log verbosity with NetXNG.Debug

bc20f50

Addresses ROOT-9311

netxng: give priority to Xrd_LOGLEVEL over NetXNG.Debug

7773fc7

Avoid double write

a220f70

Warn about problematic base class StreamerElement only once

d87111e

Properly set fOnfileObject for the parent

75724e0

Use same normalization for branch, element name and later searches.

38ffbe6

Insure we call SetRead/FillSequence after sub-branches have been added.

babece3

This is needed to be able to distinguish split nodes from unsplit nodes.

Remove obsolete (commented-out) code fragement

2e6c1b2

Factor out ActionSequence creation

db24ec7

Apply new ActionSequence creation to Fill

5ef1ca0

Remove obsolete TBranchElement::fIDs

424e307

Update RDF code owners

ea26e97

CUDA implementation of Im2Col

6646031

Refactored Im2Col tests - now taking advantage of templates to elimin…

b336d96

…ate code duplication

Added CUDA specialization for the existing test suite

1277f7c

Include CUDA tests into CMake

cbfa0bb

Added more documentation as per code review comments

5efe85f

Final documentation changes in accordance with the official ROOT conv…

252268a

…ention guide

fix a typo

39bf5f8

Fixed tests, now returning error code in case of failure to enable CI…

aa16408

… checks

Fixed an indexing bug

00a62a7

steremma force-pushed the im2col branch from 97db883 to 00a62a7 June 28, 2018 11:32
