Im2col implementation on CUDA + refactoring #6
Open
steremma wants to merge 344 commits into tmvadnn:master from
ashlaban reviewed May 22, 2018
ashlaban left a comment
Nice work! I'd prefer a bit more documentation. Other than that, looks good to me.
}

/**
 * @brief A helper for image operations that rearranges image regions into
Preferred commenting style for ROOT:
////////////////////////////////////////////////////////////////////////////
/// \brief Short description
/// \param[in] varname Description of var.
///
/// [Expanded desc.]
   return ((imgDim - fltDim + 2 * padding) / stride) + 1;
}

template<typename AFloat>
A documentation comment would be nice here as well. Even if the function is private, a brief explanation can still help with maintenance. A description of the variables (plus any assumptions) and a short, high-level summary of the implemented algorithm would be nice.
   __syncthreads();
}

__device__ int calculateDimension(int imgDim, int fltDim, int padding, int stride)
Short documentation comment would be nice. In what context should I use this function?
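As an illustration of what such a comment could look like, here is a minimal sketch that applies the ROOT commenting style quoted above to `calculateDimension`. The return expression is taken from the diff context. The parameter descriptions and the two `assert` preconditions are assumptions about the intended usage, not text from the PR. The `__device__` qualifier is dropped so that the sketch compiles as plain host C++.

```cpp
#include <cassert>

////////////////////////////////////////////////////////////////////////////
/// \brief Number of filter positions along one image dimension.
/// \param[in] imgDim  Image extent along this dimension (height or width).
/// \param[in] fltDim  Filter extent along the same dimension.
/// \param[in] padding Zero padding added on each side of the image.
/// \param[in] stride  Step between two consecutive filter positions.
///
/// Uses integer division, so any remainder of the padded extent that does
/// not fit a full stride is ignored. The preconditions below are assumed
/// for this sketch and are not part of the original function.
inline int calculateDimension(int imgDim, int fltDim, int padding, int stride)
{
   assert(stride > 0);                     // assumed precondition
   assert(imgDim + 2 * padding >= fltDim); // assumed: filter fits the padded image
   return ((imgDim - fltDim + 2 * padding) / stride) + 1;
}
```

For example, a 5-pixel extent with a 3-pixel filter, no padding and stride 1 yields 3 positions, and adding a padding of 1 yields 5.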
Even though timeout existed, the script decided to call gtimeout on Linux, which does not exist.
We had test failures in runtime nightlies such as this one: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/95/BUILDTYPE=Debug,COMPILER=gcc62,LABEL=slc6/testReport/junit/projectroot.roottest.root.math/smatrix/roottest_root_math_smatrix_testKalman/ The failures were due to what @pcanal commented in root-project#2135: some .so files in roottest don't have external linkage. (This means that if you call dlopen(libfoo.so), Linux can't find the dependency libraries and emits an "undefined symbol" error when it tries to initialize global variables in libfoo.so but can't find the symbol definitions.) With pch, rootmap files were providing information about the dependent libraries. However, we stopped generating rootmap files in root-project#2127, and that's why we got these failures. To fix this issue, I implemented a callback to TCling which gets called when DynamicLibraryManager fails. The callback passes the error message to TCling, which handles the message if it contains "undefined error".
…-project#2160)" This reverts commit 011aa82. This is a revert of a revert. I reverted the first commit because adding "." to prebuiltmodulepath was causing failures in runtime modules, but now we're skipping "." in TCling::LazyFunctionCreatorAutoloadForModule, so it doesn't matter even if we have ".".
…ith clang: COMPILER="ccache clang" gets lost in CMake; using ccache does not work as there is no ccache-wrapper for clang-3.9. So just use clang-3.9 without ccache.
... not when echoing what is going to be run.
…MaxPoolingLayer` * Pooling is now a subclass of Convolutional Layer. As a result common functions and fields are not replicated. * Constructor arguments that can be internally computed are eliminated.
…frame". This is important to have the same naming convention everywhere.
This was an unintended side-effect of a previous commit: 9b4d0d8.
Addresses ROOT-9311
Instead of assigning rules that do not correspond to any specific branch (for example, rules setting transients or whole objects) to a hypothetical 'previous' branch, we switch to assigning those orphan rules to either the non-data-bearing branches (split single objects or bases) or the collection node (since the splitting completely flattens the hierarchy in this case). This requires enhancing the list of IDs from a single list of element indices in the 'sole' current StreamerInfo to a nested list of elements that carries along the sub-StreamerInfo information.
This is needed to be able to distinguish split nodes from unsplit nodes.
…ate code duplication
Goal
This PR implements Im2Col in CUDA in (what I consider) an optimal way in terms of performance. I achieve that by assigning one thread per output element. This means that threads do not share write addresses, so no synchronization is required. They do share read addresses, which is of course thread safe. I complement the new functionality with a complete test suite to assert correctness.
Extra tasks
The tests within the CNN module suffer from extensive code duplication, as the Reference and CPU versions do exactly the same thing (the CUDA ones would just worsen the issue). Instead, I refactored the Im2Col ones using template arguments: as a result, the tests are now defined only once and called independently for each architecture. This approach is also followed in the DNN module. If time allows, I plan to refactor all tests within the CNN module in a similar manner.
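The two ideas in this description can be sketched together in plain C++. This is a hedged illustration rather than the code in the PR: `ConvGeometry`, `im2colElement`, `SketchSerial` and `testIm2colSimple` are hypothetical names, and the memory layout (channel-major image, one output row per filter position) is an assumption. `im2colElement` computes exactly one output element from its flat index. In a CUDA kernel that index would be the global thread id, so each thread writes a distinct address and only reads are shared. The test is a template over the architecture, so it is written once and instantiated per backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bundle of convolution parameters (not a type from the PR).
struct ConvGeometry {
   int channels, imgH, imgW;
   int fltH, fltW;
   int strideH, strideW;
   int padH, padW;
};

inline int calculateDimension(int imgDim, int fltDim, int padding, int stride)
{
   return ((imgDim - fltDim + 2 * padding) / stride) + 1;
}

// Computes ONE output element. `idx` is the flat output index; in a CUDA
// kernel it would be the global thread index (one thread per output element).
inline float im2colElement(const std::vector<float> &img, const ConvGeometry &g, int idx)
{
   const int outW = calculateDimension(g.imgW, g.fltW, g.padW, g.strideW);
   const int cols = g.channels * g.fltH * g.fltW; // elements per filter position

   const int view = idx / cols; // output row: which filter position
   const int elem = idx % cols; // output column: which filter element

   const int c  = elem / (g.fltH * g.fltW);
   const int kh = (elem / g.fltW) % g.fltH;
   const int kw = elem % g.fltW;

   // Image coordinates read by this output element.
   const int h = (view / outW) * g.strideH - g.padH + kh;
   const int w = (view % outW) * g.strideW - g.padW + kw;

   if (h < 0 || h >= g.imgH || w < 0 || w >= g.imgW) return 0.0f; // zero padding
   return img[(c * g.imgH + h) * g.imgW + w];
}

// Hypothetical serial backend: the loop stands in for the grid of threads.
struct SketchSerial {
   static std::vector<float> Im2col(const std::vector<float> &img, const ConvGeometry &g)
   {
      assert(img.size() == static_cast<std::size_t>(g.channels * g.imgH * g.imgW));
      const int outH = calculateDimension(g.imgH, g.fltH, g.padH, g.strideH);
      const int outW = calculateDimension(g.imgW, g.fltW, g.padW, g.strideW);
      const int total = outH * outW * g.channels * g.fltH * g.fltW;
      std::vector<float> out(total);
      for (int idx = 0; idx < total; ++idx) out[idx] = im2colElement(img, g, idx);
      return out;
   }
};

// Test body defined once and instantiated for each architecture.
template <typename Architecture>
bool testIm2colSimple()
{
   // One channel, 3x3 image holding 1..9, 2x2 filter, stride 1, no padding.
   const std::vector<float> img = {1, 2, 3, 4, 5, 6, 7, 8, 9};
   const ConvGeometry g{1, 3, 3, 2, 2, 1, 1, 0, 0};
   const std::vector<float> expected = {1, 2, 4, 5, 2, 3, 5, 6,
                                        4, 5, 7, 8, 5, 6, 8, 9};
   return Architecture::Im2col(img, g) == expected;
}
```

Calling `testIm2colSimple<SketchSerial>()` runs the shared test against the serial stand-in. Another backend would only need to expose a static `Im2col` with the same signature to reuse the same test body.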