Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Im2col implementation on CUDA + refactoring #6

Open
wants to merge 344 commits into
base: master
Choose a base branch
from

Conversation

steremma
Copy link

Goal

This PR implements Im2Col in CUDA in (what I consider) an optimal way in terms of performance. I achieve that by assigning one thread per output element. This means that threads do not share their write address and therefore no synchronization is required. They do share read address which is of course thread safe. I complement the new functionality with a complete testing suite to assert correctness.

Extra tasks

The tests within the CNN module suffer from extensive code duplication as the Reference and CPU versions do exactly the same thing (the CUDA one's would just worsen the issue). Instead I refactored the Im2Col one's using templated arguments: As a result the tests are now defined only once and called independently from each architecture using templates. This approach is also followed in the DNN module. If time allows, I plan to refactor all tests within the CNN module in a similar manner.

@steremma steremma requested a review from lmoneta as a code owner May 21, 2018 21:55
Copy link

@ashlaban ashlaban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I'd prefer a bit more documentation. Other than that, looks good to me.

}

/**
* @brief A helper for image operations that rearranges image regions into

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferred commenting style for ROOT:

////////////////////////////////////////////////////////////////////////////
/// \brief Short description
/// \param[in] varname Description of var.
///
/// [Expanded desc.]

return ((imgDim - fltDim + 2 * padding) / stride) + 1;
}

template<typename AFloat>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A documentation comment would be nice here as well. If the function is private it can still be helpful for maintenance to have a brief explanation. Description of the variables (+ any assuptions) and a short, high-level summary of the implemented alg. would be nice.

@@ -203,6 +203,61 @@ __device__ void ReduceSum(AFloat *result, AFloat * sdata)
__syncthreads();
}

__device__ int calculateDimension(int imgDim, int fltDim, int padding, int stride)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short documentation comment would be nice. In what context should I use this function?

dpiparo and others added 25 commits June 12, 2018 14:37

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
Even though timeout existed, the script decided to call gtimeout on Linux - which does not exit.

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
We had test failures in runtime nightlies such as this one:
https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/95/BUILDTYPE=Debug,COMPILER=gcc62,LABEL=slc6/testReport/junit/projectroot.roottest.root.math/smatrix/roottest_root_math_smatrix_testKalman/

Failures were due to what @pcanal commented in root-project#2135, that some so files in
roottest doesn't have external linkage. (It means that if you call
    dlopen(libfoo.so), linux kernel can't find dependency libraries and it
    emits "undefined symbol" error when they try to initialize global
    variables in libfoo.so but couldn't find symbol definition)
With pch, rootmap files were providing information about the depending library.

However we stopped generating rootmap files in root-project#2127 and that's why we
got these failures. To fix this issue, I implemented a callback to
TCling which gets called when DynamicLibraryManager fails. The callback
pass error message to TCling and it handles message if it contains "undefined error".

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
…-project#2160)"

This reverts commit 011aa82.

This is a revert of revert. I reverted the first commit because adding
"." to prebuiltmodulepath was causing failure in runtime modules, but
now we're skipping "." in TCling::LazyFunctionCreatorAutoloadForModule so
doesn't matter even if we have ".".

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
…ith clang:

COMPILER="ccache clang" gets lost in CMake; using ccache does not work as there is no ccache-wrapper for clang-3.9.
So just use clang-3.9 without ccache.

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
... not when echoing what is going to be run.

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
SimeonEhrig and others added 29 commits June 27, 2018 11:09

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
…MaxPoolingLayer`

* Pooling is now a subclass of Convolutional Layer. As a result common functions and fields are not replicated.
* Constructor arguments that can be internally computed are eliminated.

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal

Verified

This commit was signed with the committer’s verified signature.
micimize Michael Joseph Rosenthal
…frame". This is important to have the same naming convention everywhere.
This was an unintended side-effect of a previous commit:
9b4d0d8.
Instead of assigning rule that do not correspond to any specific branches
(for example rules setting transients or whole objects) to an hypothetical
'previous' branch, we switch to assign those orphan rules to either the
non-data bearing branches (split single objects or bases) or the collection
node (since the splitting completely flatten the hiearchy in this case).

This requires enhancing the list of IDs from a single list of element index
in the 'sole' current StreamerInfo to a nested list of elements that carries
along the sub-StreamerInfo information.
This is need to be able to distinguish split node from unsplit node.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.