-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Im2col implementation on CUDA + refactoring #6
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I'd prefer a bit more documentation. Other than that, looks good to me.
} | ||
|
||
/** | ||
* @brief A helper for image operations that rearranges image regions into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferred commenting style for ROOT:
////////////////////////////////////////////////////////////////////////////
/// \brief Short description
/// \param[in] varname Description of var.
///
/// [Expanded desc.]
return ((imgDim - fltDim + 2 * padding) / stride) + 1; | ||
} | ||
|
||
template<typename AFloat> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A documentation comment would be nice here as well. If the function is private it can still be helpful for maintenance to have a brief explanation. Description of the variables (+ any assuptions) and a short, high-level summary of the implemented alg. would be nice.
@@ -203,6 +203,61 @@ __device__ void ReduceSum(AFloat *result, AFloat * sdata) | |||
__syncthreads(); | |||
} | |||
|
|||
__device__ int calculateDimension(int imgDim, int fltDim, int padding, int stride) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short documentation comment would be nice. In what context should I use this function?
Even though timeout existed, the script decided to call gtimeout on Linux - which does not exit.
We had test failures in runtime nightlies such as this one: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/95/BUILDTYPE=Debug,COMPILER=gcc62,LABEL=slc6/testReport/junit/projectroot.roottest.root.math/smatrix/roottest_root_math_smatrix_testKalman/ Failures were due to what @pcanal commented in root-project#2135, that some so files in roottest doesn't have external linkage. (It means that if you call dlopen(libfoo.so), linux kernel can't find dependency libraries and it emits "undefined symbol" error when they try to initialize global variables in libfoo.so but couldn't find symbol definition) With pch, rootmap files were providing information about the depending library. However we stopped generating rootmap files in root-project#2127 and that's why we got these failures. To fix this issue, I implemented a callback to TCling which gets called when DynamicLibraryManager fails. The callback pass error message to TCling and it handles message if it contains "undefined error".
…-project#2160)" This reverts commit 011aa82. This is a revert of revert. I reverted the first commit because adding "." to prebuiltmodulepath was causing failure in runtime modules, but now we're skipping "." in TCling::LazyFunctionCreatorAutoloadForModule so doesn't matter even if we have ".".
…ith clang: COMPILER="ccache clang" gets lost in CMake; using ccache does not work as there is no ccache-wrapper for clang-3.9. So just use clang-3.9 without ccache.
... not when echoing what is going to be run.
…MaxPoolingLayer` * Pooling is now a subclass of Convolutional Layer. As a result common functions and fields are not replicated. * Constructor arguments that can be internally computed are eliminated.
…frame". This is important to have the same naming convention everywhere.
This was an unintended side-effect of a previous commit: 9b4d0d8.
Addresses ROOT-9311
Instead of assigning rule that do not correspond to any specific branches (for example rules setting transients or whole objects) to an hypothetical 'previous' branch, we switch to assign those orphan rules to either the non-data bearing branches (split single objects or bases) or the collection node (since the splitting completely flatten the hiearchy in this case). This requires enhancing the list of IDs from a single list of element index in the 'sole' current StreamerInfo to a nested list of elements that carries along the sub-StreamerInfo information.
This is need to be able to distinguish split node from unsplit node.
…ate code duplication
Goal
This PR implements
Im2Col
in CUDA in (what I consider) an optimal way in terms of performance. I achieve that by assigning one thread per output element. This means that threads do not share their write address and therefore no synchronization is required. They do share read address which is of course thread safe. I complement the new functionality with a complete testing suite to assert correctness.Extra tasks
The tests within the
CNN
module suffer from extensive code duplication as theReference
andCPU
versions do exactly the same thing (the CUDA one's would just worsen the issue). Instead I refactored theIm2Col
one's using templated arguments: As a result the tests are now defined only once and called independently from each architecture using templates. This approach is also followed in theDNN
module. If time allows, I plan to refactor all tests within theCNN
module in a similar manner.